[nycbug-talk] Parallel Virtual Machine

Wed Apr 12 12:00:56 EDT 2006

I worked with PVM in the past, while I was at the university. We rebuilt 
a number of Washington University's sequential genetic sequencing tools, 
used in the Brazilian Genome efforts at UNESP (São Paulo State 
University), so that they would run in parallel.

It is not easy, but it has the potential for great results. On the other 
hand, the results depend directly on the developers' ability to stop 
thinking sequentially and start thinking in parallel. The main focus of 
the code analysis is "granularity". The most "granular" parts of the 
code are the ones that need a huge number of CPU cycles to perform a 
given task, where that task is potentially repeated many times in the 
same way. "for" and "while" loops in which each iteration takes a lot of 
processing time are the main targets for being PVM'd.

On the other hand, PVMing the wrong chunk of code may result in message 
passing over the network for tiny amounts of processing, which will 
certainly make the system underperform. Optimizing a system to work with 
PVM is a very difficult task, and it depends entirely on the source 
code. The developer must be fully aware of the PVM send and receive 
calls (pvm_send(), pvm_recv()) everywhere. The developer must also know 
how to keep track of the "control" of the parallel processes. That 
control should stay in a small number of programs, and especially on a 
small number of machines (the control itself is not parallelized).
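A back-of-the-envelope model shows why loop granularity matters so much. This is a minimal sketch under loud assumptions, not anything taken from PVM or from our genome tools:

* Each task costs t_task seconds of CPU.
* Each task needs one send plus one receive, costing t_msg seconds in total, paid one task at a time by the master.
* Tasks are spread evenly over identical workers.

The function names estimate_times and speedup are hypothetical.

```python
import math


def estimate_times(n_tasks, t_task, t_msg, n_workers):
    """Rough (serial_time, parallel_time) estimate in seconds.

    Simplifying assumptions (a toy model, not PVM behaviour):
      * every task costs t_task seconds of CPU on any host;
      * every task needs one send plus one receive, costing t_msg seconds
        in total, and the master pays that cost one task at a time;
      * tasks are spread evenly over n_workers identical workers.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    serial = n_tasks * t_task
    communication = n_tasks * t_msg  # paid serially by the master
    computation = math.ceil(n_tasks / n_workers) * t_task  # busiest worker
    return serial, communication + computation


def speedup(n_tasks, t_task, t_msg, n_workers):
    """Serial time divided by estimated parallel time (>1 means a win)."""
    serial, parallel = estimate_times(n_tasks, t_task, t_msg, n_workers)
    return serial / parallel


# Coarse-grained loop: 100 iterations of 2 s each, 10 ms of messaging per task.
print(round(speedup(100, 2.0, 0.01, 8), 2))
# Fine-grained loop: 100,000 iterations of 0.1 ms each, 1 ms of messaging per task.
print(round(speedup(100_000, 0.0001, 0.001, 8), 2))
```

With 8 workers, the coarse loop comes out roughly 7.4x faster. The fine-grained loop is about 10x slower than just running it sequentially, because the messaging cost swamps the work. Real networks and schedulers are messier, but the shape of the trade-off is the same.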

This is probably why many people choose to go down the "grid" path or to 
use other message-passing solutions such as MPI (ports/net/mpich), since 
they don't depend as heavily on rewriting the code to make it parallel.

That said, and especially considering how "non-easy" it is to PVM a 
system, the PVM framework is excellent. From the master node, the PVM 
shell lets one completely control and monitor what is going on in the 
cluster, and one can add machines to or delete machines from the PVM 
cluster at any time. The interesting part is that PVM-aware programs, 
which need to be fully available on every node of the PVM environment, 
may run with modified priority. PVM is also ready for most of the known 
architectures and operating systems. This means you can have a 
completely heterogeneous environment with, say, SGI, BSD, Linux and 
Windows machines as part of the PVM cluster. These machines may be given 
higher or lower relevance/priority in the cluster. For example, you can 
have machines doing other jobs during a certain time period without 
taking them out of the cluster (fewer PVM jobs are assigned to them), 
and later have them assume a higher priority (more PVM jobs). The PVM 
library will control it.
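To make the priority idea concrete, here is a minimal sketch of how a master could split a batch of jobs across hosts in proportion to a per-host weight. It is an illustration only, not PVM's own scheduler. The function name assign_jobs, the host names and the weights are all hypothetical. The code uses the largest-remainder method so the shares always add up to the total.

```python
import math


def assign_jobs(n_jobs, weights):
    """Split n_jobs among hosts in proportion to their non-negative weights.

    Largest-remainder method: floor every exact share, then hand the
    leftover jobs to the hosts with the largest fractional parts.
    Hypothetical helper for illustration; not part of the PVM library.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one host needs a positive weight")
    exact = {host: n_jobs * w / total for host, w in weights.items()}
    shares = {host: math.floor(x) for host, x in exact.items()}
    leftover = n_jobs - sum(shares.values())
    # Hosts with the biggest rounding loss get the remaining jobs first.
    order = sorted(exact, key=lambda h: exact[h] - shares[h], reverse=True)
    for host in order[:leftover]:
        shares[host] += 1
    return shares


# A non-dedicated Windows desktop busy with "other stuff" gets a low weight...
busy = {"sparc-solaris": 4, "i386-bsd": 4, "windows-desktop": 1}
# ...and its usual weight again once it is idle.
idle = {"sparc-solaris": 4, "i386-bsd": 4, "windows-desktop": 4}
print(assign_jobs(90, busy))  # {'sparc-solaris': 40, 'i386-bsd': 40, 'windows-desktop': 10}
print(assign_jobs(90, idle))  # {'sparc-solaris': 30, 'i386-bsd': 30, 'windows-desktop': 30}
```

Changing a host's weight changes only how many jobs it receives; the host never has to leave the cluster.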

In our case at the university, we had SPARC machines running Solaris and 
i386 BSD machines dedicated to the processing. Other machines running 
Windows or Mac OS were not dedicated: some front-end programs used to 
"map" the results of the genome-processing data were GUIs that ran only 
on Windows or Mac. So when those machines were doing "other stuff", they 
were simply given a lower priority in the PVM environment, and fewer 
jobs were assigned to them. When they were idle, we put them back at 
their usual priority, so that the cluster could take full advantage of 
their participation.

Also, I must say that the PVM people are very attentive. We had direct 
help from James Arthur Kohl himself back in 2001-2003, when we had 
problems parallelizing. Sometimes he contributed code analysis. Other 
times he sent precious documents discussing problems similar to ours, 
all of which are available on the ORNL (Oak Ridge National Laboratory) 
web site. After I graduated in 2004 I had no more contact with PVM. In 
fact, MPI caught my attention more than PVM did. Still, I must say that 
PVM is an exhausting experience, but one that can lead to better results 
than any other parallel processing approach I am aware of.

-- 
Patrick Tracanelli

FreeBSD Brasil LTDA.
(31) 3281-9633 / 3281-3547
316601 at sip.freebsdbrasil.com.br
http://www.freebsdbrasil.com.br
"Long live Hanin Elias, Kim Deal!"