Pass The Messages Please

The idea of a homemade parallel supercomputer predates the actual Beowulf project by years, if not decades. In this column (and the next), we explore "the" message passing library that began it all and learn some important lessons that extend our knowledge of parallelism and scaling. We will do "real" parallel computing, using the message passing library that made the creation of Beowulf-style compute clusters possible: PVM. PVM stands for "Parallel Virtual Machine", and that's just what this library does -- it takes a collection of workstations or computers of pretty much any sort on a TCP/IP network and lets you "glue" them together into a parallel supercomputer.

About PVM

Although for many people PVM has been superseded by MPI (the Message Passing Interface library) for reasons of portability to and from "big iron" supercomputers, to me PVM still has a great deal of appeal. It is simple. It is roughly twelve years old and hence reasonably mature. It can be run by any user with no particular privileges. It has a nifty GUI which helps novices "visualize" its operation and which can help even experts debug certain failures.

PVM was developed at Oak Ridge by a team that contains several people that I personally revere as some of the most farsighted computer scientists alive today. Back in 1991 and 1992 they toured the country giving talks where they showed how they were able to glue collections of Sun workstations, DEC workstations, and even Cray supercomputers all together into a parallel supercomputer that on certain classes of tasks scaled performance nearly linearly with the number of nodes. Back in 1992 I heard one of those talks, and shortly thereafter I was using PVM 2.4.0 and then 3.0.0 on my own Monte Carlo work, which I had "parallelized" up to that point by just running a lot of copies on different Sun workstations on our departmental LAN. It is safe to say that without PVM there would have been no Beowulf project, and with no Beowulf project even now high performance computing would likely be dominated by supercomputer vendors at several million dollars a pop.

On the other hand, MPI was originally developed by a consortium of supercomputer vendors under duress. The government had observed much money being pi--, um, "wasted" away by labs who bought multimillion dollar supercomputers, spent years porting code to their specialized parallel interface, and then had to do it all over again when they upgraded to a new supercomputer after the vendor who sold them the first one went out of business or changed their interface. They basically told the vendors to come up with a portable standard parallel interface or lose the right to bid on government contracts, and after some teeth-gnashing over the loss of all that tremendous arm-twisting customer lock-in power, that's just what the vendors did.

At this point, MPI is perhaps a "better" interface for many people to learn, but an informal poll of friends who are old hands at beowulfish supercomputing (all three or four of them) has revealed to me that PVM is still by far the interface of choice among at least this crowd. Pick whatever reason you like -- beginning at the beginning, my personal preference, some sort of crazed view of political correctness, or the fact that there is already a regular MPI column on this site -- but we're going to start with PVM in this column.

For those who are just joining us, a brief review of where we are. So far, this column has shown how to build the simplest sort of cluster -- a NOW (Network Of Workstations). It has introduced a perl script (taskmaster) that can run a toy binary (task) that generates random numbers in parallel across a small cluster, collects the results, and displays them. We have used taskmaster and task to develop a rudimentary understanding of Amdahl's Law and related parallel performance scaling relations.

If you want to play along with the rest of us in the next few columns, you will need a small NOW or other cluster of Linux/Unix based computers upon which you have an account and login privileges. Ideally the cluster or LAN will be set up so that your home directory is NFS mounted on all the systems that will be your "compute nodes" (which can be simple, network accessible workstations and can even be in lightweight use at the time by other people working at their respective consoles).

To use PVM this cluster will also need to have both PVM and ssh (Secure Shell) installed on all the nodes. Your shell needs to be configured so that you can log in to any node from any other node without a password. This is a daunting (but essential) step for many users seeking to set up PVM or MPI.
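Assuming OpenSSH and an NFS-mounted home directory, the passwordless setup usually boils down to a few commands. This is a sketch, not a one-size-fits-all recipe; the node name "n01" below is a placeholder for one of your own nodes, and the default key path is assumed:

```shell
# Generate a key pair with an empty passphrase (-N "");
# the default key path ~/.ssh/id_rsa is assumed here.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Authorize that key for logins to your own account. With an
# NFS-mounted home directory, this single step covers every node.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# sshd is picky about permissions on these files.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

# Verify: this should print the remote hostname with no password
# prompt ("n01" is a hypothetical node name -- substitute yours).
ssh n01 hostname
```

If home directories are not NFS mounted, the public key must instead be appended to `~/.ssh/authorized_keys` on each node individually.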

Our next column will be on using PVM (in fact, redoing the same task featured in the first few columns, but much more efficiently). Many users, however, will first have to install PVM and configure ssh and their shell so that it all works. The remainder of this column will therefore focus on the initial step of getting ssh working without a password (valuable for both PVM and MPI). Next time, we will jump into installing and using PVM. As was the custom in previous columns, we will presume that you are using an RPM-based version of Linux such as Red Hat.

How PVM Works

As you can imagine, describing precisely how PVM does all of this in detail is far beyond the scope of an article. So consider this the nickel explanation; for the dollar explanation take a look at PVM -- A User's Guide and Tutorial for Networked Parallel Computing. See the Resources Sidebar for the PVM home page, which has manuals, quick reference cards, tutorials and much more.

PVM is a collection of library routines and include files, supporting tools, and an architecture-dependent directory structure. Recall that PVM was designed to permit a virtual parallel computer to be built out of systems that might well be different (running different operating systems on completely different hardware, for example). One can (with a bit of work) even glue Linux and Windows machines together into a supercomputer.

PVM provides a consistent and intuitive interface to the user for the following essential components of running a parallel task:

  • Identifying a set of resources (compute nodes) and configuring them to run a parallel task efficiently.
  • Starting up a job to run in parallel on those nodes, handling all aspects of the execution of binaries on the actual nodes.
  • Providing a message passing library to facilitate sending and receiving messages between the nodes.

All of these tasks can be handled without PVM as the "taskmaster" perl script demonstrates, but we have to do quite a bit of work using complex tools to get anywhere close to the same simplicity and performance. PVM is a poor man's "operating system" for the virtual machine and provides far more information, control, robustness and efficiency than one is likely to develop on one's own.

PVM works by starting up a daemon, pvmd, on all the nodes. The nodes can be selected interactively using the PVM console or the xpvm GUI interface, or a cluster can be specified by putting node names in a hostfile (one per line) and running pvm hostfile. Nodes can also be started up and controlled within a PVM application (the only way to achieve truly robust operation).
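As a sketch, a hostfile is just a list of node names, one per line, optionally followed by per-host options. The node names below are hypothetical, as is the `ep=` path (which tells pvmd where to look for your executables on that host):

```
# pvm hostfile: one node per line; lines starting with # are comments
n01
n02
n03 ep=$HOME/pvm3/bin/$PVM_ARCH
```

Running `pvm hostfile` then starts a pvmd on each listed node and drops you into the PVM console.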

It is only this step (starting the remote daemons) that requires a remote shell, which is why I prefer to use ssh rather than rsh in spite of its larger overhead. The benefits associated with greater security outweigh the (nearly irrelevant) "one time" cost of a few extra seconds starting up PVM on the cluster, although on many isolated clusters ("true Beowulfs") one can run rsh instead if you prefer.

Each pvmd is given a cluster-unique id called a "tid" when it is started up by the original pvmd or PVM process; this permits each node to be uniquely identified and targeted for communication or other purposes during the computation. Note that even if the program to be run has no particular "master" task, there is a "master" pvmd that keeps track of all the nodes belonging to this virtual supercomputer. In this way there can be more than one virtual supercomputer running on different systems on the same LAN, belonging to the same or different users. PVM "locks" a computer to a particular virtual machine and user when it starts up (which can lead to certain problems like leftover lockfiles when it crashes).

Once the pvmd is running on all the nodes (locking those nodes into a single virtual machine) everything else PVM does in a typical parallel application can be summarized in one phrase: send and receive messages between tid-identified nodes. It hides all details of just how the messages are sent and received from the user. It hides all architecture-related differences from the user (e.g. big-endian vs little-endian numbers, for those who know what they are). It guarantees efficient and reliable transport and delivery of sequenced, tagged messages. It does all this remarkably efficiently.
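In code, the whole send-and-receive cycle is only a handful of library calls. The sketch below is not a complete program from this series -- it assumes PVM is installed, that the program is linked with -lpvm3, and that it was spawned by a parent task; the message tag 42 and the variable names are arbitrary choices for illustration:

```c
#include <stdio.h>
#include <pvm3.h>

int main(void)
{
    int mytid = pvm_mytid();      /* enroll in PVM and get our own tid */
    int peer  = pvm_parent();     /* tid of the task that spawned us */
    int value = 1234, reply;
    const int MSGTAG = 42;        /* arbitrary application-chosen tag */

    /* Send: initialize a buffer, pack data, ship it to `peer`. */
    pvm_initsend(PvmDataDefault); /* default (XDR) encoding hides
                                     endian differences between nodes */
    pvm_pkint(&value, 1, 1);
    pvm_send(peer, MSGTAG);

    /* Receive: block until a message with this tag arrives from
       `peer`, then unpack it in the same order it was packed. */
    pvm_recv(peer, MSGTAG);
    pvm_upkint(&reply, 1, 1);
    printf("task %x got %d from %x\n", mytid, reply, peer);

    pvm_exit();                   /* leave the virtual machine */
    return 0;
}
```

The pack/unpack calls (pvm_pkint and friends) are where PVM does the architecture-independence magic mentioned above: data is converted to a common wire format on send and back to the native format on receive.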

It really does even more, but before exploring all that it can do we have to get it installed and functional. The next two sections show you how.


©2005-2023 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.