[nycbug-talk] Parallel Virtual Machine
eksffa at freebsdbrasil.com.br
Wed Apr 12 12:00:56 EDT 2006
I worked with PVM in the past, while I was at the university. We rebuilt
a number of Washington University's sequential genetic sequencing tools
used in the Brazilian Genome efforts at UNESP (São Paulo State
University).

It is not easy, but it has potential for great results. On the other
hand, the results depend directly on the developers' ability to stop
thinking sequentially and start thinking in parallel. The main focus of
the code analysis is "granularity". The most "granular" parts of the
code are the ones that need a huge amount of CPU cycles to do a given
task, where that task is potentially done many times in the same way.
"for" and "while" loops that take a lot of processing time on each
iteration are the main targets to be PVM'd.

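
To make the "granularity" idea concrete, here is a minimal sketch of
cutting one expensive loop into a few coarse chunks, one per worker
task. It is plain Python and does not call PVM at all: in a real PVM
program each chunk would be sent to a spawned task and the results
received back, while here the chunks simply run one after another. The
names split_iterations, score and run_chunk are made up for this
illustration.

```python
def split_iterations(total, tasks):
    """Split range(total) into `tasks` contiguous, near-equal (start, stop) chunks."""
    base, extra = divmod(total, tasks)
    chunks = []
    start = 0
    for i in range(tasks):
        # The first `extra` chunks take one additional iteration each.
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def score(i):
    # Hypothetical stand-in for an expensive, independent loop body.
    return sum((i * k) % 7 for k in range(1000))


def run_chunk(bounds):
    # What a single worker task would do with the chunk it was handed.
    start, stop = bounds
    return [score(i) for i in range(start, stop)]


# 10 loop iterations split over 3 hypothetical workers.
chunks = split_iterations(10, 3)
# Run the chunks locally, in order, and concatenate the partial results.
results = [value for chunk in chunks for value in run_chunk(chunk)]
```

The point of the sketch is that the unit handed to a worker is a whole
range of iterations, not a single cheap operation.
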
On the other hand, PVMing the wrong chunk of code may result in messages
being passed over the network for small processing goals, which will
certainly lead the system to underperform. Optimizing a system to work
with PVM is a very difficult task, and it is completely source code
related. The developer must be completely aware of the PVM send and
receive calls everywhere, and of how to keep the "control" of the
parallel processes in a small number of programs, and especially on a
small number of machines (not parallelizing the controls).

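
A rough back-of-the-envelope model shows why message passing for small
processing goals hurts. This is only a sketch under simple assumptions
(every loop iteration travels as its own message, one send plus one
receive, the master handles messages one at a time, and the workers
compute fully in parallel). The function names and all the numbers are
invented for the illustration; they are not measurements.

```python
def parallel_time(n_iter, work_s, msg_s, workers):
    """Estimated wall time when every iteration is shipped as its own message."""
    compute = n_iter * work_s / workers   # work is spread over the workers
    messaging = n_iter * 2 * msg_s        # one send and one receive per iteration
    return compute + messaging


def speedup(n_iter, work_s, msg_s, workers):
    """Sequential time divided by the estimated parallel time."""
    sequential = n_iter * work_s
    return sequential / parallel_time(n_iter, work_s, msg_s, workers)


# Coarse grain: 1 s of work per iteration, 1 ms per message, 8 workers.
coarse = speedup(1000, 1.0, 0.001, 8)

# Fine grain: 0.1 ms of work per iteration, same network, same workers.
fine = speedup(1000, 0.0001, 0.001, 8)
```

With these made-up numbers the coarse-grained case comes out with a
speedup above 7 on 8 workers, while the fine-grained case comes out
below 1, that is, slower than not parallelizing at all.
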
This is probably why many people choose to go down the "grid" path or
to other message passing solutions such as MPI (ports/net/mpich), since
they don't depend so deeply on rewriting the code to make it parallel.

That said, and especially considering the "non-easy" aspect of PVMing a
system, the PVM framework is excellent. The PVM shell lets one
completely control and monitor, from the master node, what is going on
in the cluster. One can add or delete machines in the PVM cluster at any
time. The interesting part is that, since PVM-aware programs need to be
fully available on all nodes in the PVM environment, they may run with
modified priority. PVM is also ready for most of the known architectures
and operating systems, which means you can have a completely
heterogeneous environment with, say, SGI, BSD, Linux and Windows
machines all part of the PVM cluster. And they may be given higher or
lower relevance/priority in the cluster. For example, machines that do
other jobs during a certain time period do not need to leave the
cluster: fewer PVM jobs can be assigned to such a machine, and later it
can assume a higher prio again (more PVM jobs). The PVM lib will control
it.

In our case at the university we had SPARC machines with Solaris and
i386 BSD machines dedicated to the processing, but other machines
running Windows or Mac OS were not dedicated (some front-end programs
that "map" the results of the genome processing were GUI programs that
ran only on Windows or Mac). So when they were doing "other stuff" they
were simply given a lower prio in the PVM environment, so fewer jobs
were assigned to 'em. When they were idle, we put 'em back at their
usual prio, which took full advantage of their participation in the
cluster.

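
The effect of raising and lowering a machine's prio can be pictured with
a small sketch that shares jobs out in proportion to a per-host weight.
This is not how the PVM lib does it internally, just an illustration of
the idea; the host names, the weights and the assign_jobs helper are all
hypothetical.

```python
def assign_jobs(n_jobs, weights):
    """Share n_jobs among hosts in proportion to their weights.

    Uses largest-remainder rounding so the counts always sum to n_jobs.
    """
    total = sum(weights.values())
    shares = {host: n_jobs * w / total for host, w in weights.items()}
    alloc = {host: int(share) for host, share in shares.items()}
    leftover = n_jobs - sum(alloc.values())
    # Give the remaining jobs to the hosts with the largest fractional share.
    by_remainder = sorted(weights, key=lambda h: shares[h] - alloc[h], reverse=True)
    for host in by_remainder[:leftover]:
        alloc[host] += 1
    return alloc


# Desktops busy with "other stuff": they get a lower weight.
busy = assign_jobs(100, {"sparc1": 4, "bsd1": 4, "winpc": 1, "mac": 1})

# Desktops idle: every host is back at its usual weight.
idle = assign_jobs(100, {"sparc1": 4, "bsd1": 4, "winpc": 4, "mac": 4})
```

In the first call the two dedicated hosts take most of the jobs and the
busy desktops only a few; in the second call the same 100 jobs are
spread evenly, and no machine ever had to leave the cluster.
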
Also, I must say that the PVM people are very attentive. We had direct
help from James Arthur Kohl himself back in 2001-2003, when we had
problems parallelizing. Sometimes he contributed code analysis, other
times he sent some precious docs discussing problems similar to ours,
all of which are available on the ORNL (Oak Ridge National Lab) web
site. After I graduated in 2004 I had no more contact with PVM. In fact
MPI has caught my attention more than PVM since then, but I must say
that PVM, while an exhausting experience, potentially leads to better
results than any other parallel processing approach I am aware of.

FreeBSD Brasil LTDA.
(31) 3281-9633 / 3281-3547
316601 at sip.freebsdbrasil.com.br
"Long live Hanin Elias, Kim Deal!"