Pass The Messages Please

The idea of a homemade parallel supercomputer predates the actual Beowulf project by years if not decades. In this column (and the next), we explore "the" message passing library that began it all and learn some important lessons that extend our knowledge of parallelism and scaling. We will do "real" parallel computing, using the message passing library that made the creation of Beowulf-style compute clusters possible: PVM. PVM stands for "Parallel Virtual Machine", and that's just what this library does -- it takes a collection of workstations or computers of pretty much any sort on a TCP/IP network and lets you "glue" them together into a parallel supercomputer.

About PVM

Although for many people PVM has been superseded by MPI (the Message Passing Interface library) for reasons of portability to and from "big iron" supercomputers, to me PVM still has a great deal of appeal. It is simple. It is roughly twelve years old and hence reasonably mature. It can be run by any user with no particular privileges. It has a nifty GUI which helps novices "visualize" its operation and which can help even experts debug certain failures.

PVM was developed at Oak Ridge by a team that included several people whom I personally revere as some of the most farsighted computer scientists alive today. Back in 1991 and 1992 they toured the country giving talks where they showed how they were able to glue collections of Sun workstations, DEC workstations, and even Cray supercomputers together into a parallel supercomputer that, on certain classes of tasks, scaled performance nearly linearly with the number of nodes. Back in 1992 I heard one of those talks, and shortly thereafter I was using PVM 2.4.0 and then 3.0.0 on my own Monte Carlo work, which I had "parallelized" up to that point by just running a lot of copies on different Sun workstations on our departmental LAN. It is safe to say that without PVM there would have been no Beowulf project, and with no Beowulf project high performance computing would likely still be dominated by supercomputer vendors charging several million dollars a pop.

On the other hand, MPI was originally developed by a consortium of supercomputer vendors under duress. The government had observed much money being pi--, um, "wasted" away by labs who bought multimillion dollar supercomputers, spent years porting code to their specialized parallel interface, and then had to do it all over again when they upgraded to a new supercomputer after the vendor who sold them the first one went out of business or changed their interface. They basically told the vendors to come up with a portable standard parallel interface or lose the right to bid on government contracts, and after some teeth-gnashing over the loss of all that tremendous arm-twisting customer lock-in power, that's just what the vendors did.

At this point, MPI is perhaps a "better" interface for many people to learn, but an informal poll of friends who are old hands at beowulfish supercomputing (all three or four of them) has revealed to me that PVM is still by far the interface of choice among at least this crowd. Pick whatever reason you like -- beginning at the beginning, my personal preference, some sort of crazed view of political correctness, the fact that there is already a regular MPI column on this site -- we're going to start with PVM in this column.

For those who are just joining us, a brief review of where we are. So far, this column has indicated how to build the simplest sort of cluster -- a NOW (Network Of Workstations). It has introduced a perl script (taskmaster) that can run a toy binary (task) that generates random numbers in parallel across a small cluster, collects the results, and displays them. We have used taskmaster and task to develop a rudimentary understanding of Amdahl's Law and related parallel performance scaling relations.

If you want to play along with the rest of us in the next few columns, you will need a small NOW or other cluster of Linux/Unix based computers upon which you have an account and login privileges. Ideally the cluster or LAN will be set up so that your home directory is NFS mounted on all the systems that will be your "compute nodes" (which can be simple, network accessible workstations and can even be in lightweight use at the time by other people working at their respective consoles).

To use PVM this cluster will also need to have both PVM and ssh (Secure Shell) installed on all the nodes. Your shell needs to be configured so that you can log in to any node from any other node without a password. This is a daunting (but essential) step for many users seeking to set up PVM or MPI.

Next time, this column will cover using PVM (in fact, redoing the same task featured in the first few columns, but much more efficiently), but many users will first have to install PVM and set up ssh and their shell so that it all works. The remainder of this column therefore focuses on the initial step of getting ssh working without a password (valuable for both PVM and MPI); next time, we will jump into installing and using PVM. As was the custom in the previous columns, we will presume that you are using an RPM-based version of Linux such as Red Hat.

How PVM Works

As you can imagine, describing precisely how PVM does all of this in detail is far beyond the scope of an article. So consider this the nickel explanation; for the dollar explanation take a look at PVM -- A User's Guide and Tutorial for Networked Parallel Computing. See the Resources Sidebar for the PVM home page, which has manuals, quick reference cards, tutorials, and much more.

PVM is a collection of library routines and include files, supporting tools, and an architecture-dependent directory structure. Recall that PVM was designed to permit a virtual parallel computer to be built out of systems that might well be different -- running different operating systems on completely different hardware, for example. One can (with a bit of work) even glue Linux and Windows machines together into a supercomputer.
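
PVM's layout and tools are something we will install and explore in detail next time, but as a preview (and an assumption about a typical Red Hat-style installation -- the path and values may differ on your system), PVM finds its architecture-dependent directories through a pair of environment variables that are conventionally set in your shell startup files:

$export PVM_ROOT=/usr/share/pvm3   # where the PVM libraries and tools live (assumed path)
$export PVM_ARCH=LINUX             # selects the architecture-dependent subdirectory
$ls $PVM_ROOT/lib/$PVM_ARCH        # architecture-specific binaries such as pvmd3 live here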

PVM provides a consistent and intuitive interface to the user for the essential components of running a parallel task: assembling a set of nodes into a virtual machine, starting tasks on those nodes, passing messages and data between them, and collecting the results.

All of these tasks can be handled without PVM as the "taskmaster" perl script demonstrates, but we have to do quite a bit of work using complex tools to get anywhere close to the same simplicity and performance. PVM is a poor man's "operating system" for the virtual machine and provides far more information, control, robustness and efficiency than one is likely to develop on one's own.

PVM works by starting up a daemon, pvmd, on all the nodes. The nodes can be selected interactively using the PVM console or the xpvm GUI interface, or a cluster can be specified by putting node names in a hostfile (one per line) and running pvm hostfile. Nodes can also be started up and controlled within a PVM application (the only way to achieve truly robust operation).
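
As a concrete sketch (the hostnames here are made up, apart from lilith which reappears below), a minimal hostfile is just one node name per line. Running pvm hostfile starts the local pvmd, adds the listed hosts, and drops you into the console; at the pvm> prompt, conf lists the hosts currently in the virtual machine and halt shuts all the pvmds down when you are done:

$cat hostfile
lucifer
lilith
abel
$pvm hostfile
pvm> conf
pvm> halt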

It is only this step (starting the remote daemons) that requires a remote shell, which is why I prefer to use ssh rather than rsh in spite of its larger overhead. The benefits associated with greater security outweigh the (nearly irrelevant) "one time" cost of a few extra seconds starting up PVM on the cluster, although on many isolated clusters ("true Beowulfs") you can use rsh instead if you prefer.
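
How does PVM know to use ssh instead of rsh when it starts those remote daemons? On the PVM builds I have used, the PVM_RSH environment variable selects the remote shell (older or differently compiled builds may default to rsh), so a single line in your shell startup file does the trick -- treat the exact path as an assumption to check on your own systems:

$export PVM_RSH=/usr/bin/ssh   # tell pvmd to start remote daemons via ssh rather than rsh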

Each pvmd is given a cluster-unique id called a "tid" when it is started up by the original pvmd or PVM process; this permits each node to be uniquely identified and targeted for communication or other purposes during the computation. Note that even if the program to be run has no particular "master" task, there is a "master" pvmd that keeps track of all the nodes belonging to this virtual supercomputer. In this way there can be more than one virtual supercomputer running on different systems on the same LAN, belonging to the same or different users. PVM "locks" a computer to a particular virtual machine and user when it starts up (which can lead to certain problems like leftover lockfiles when it crashes).
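
If a virtual machine does crash and leave stale files behind, the usual cure is simply to remove them before restarting PVM. On most installations they live in /tmp and are named after your numeric user id, though the exact names can vary with PVM version, so take this as a hedged sketch rather than gospel:

$rm -f /tmp/pvmd.$(id -u) /tmp/pvml.$(id -u)   # clear leftover pvmd lock and log files after a crash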

Once the pvmd is running on all the nodes (locking those nodes into a single virtual machine) everything else PVM does in a typical parallel application can be summarized in one phrase: send and receive messages between tid-identified nodes. It hides from the user all details of just how the messages are sent and received. It hides all architecture-related differences from the user (e.g. big-endian vs little-endian numbers, for those who know what they are). It guarantees efficient and reliable transport and delivery of sequenced, tagged messages. It does all this remarkably efficiently.

It really does even more, but before exploring all that it can do we have to get it installed and functional. The next section shows you how to get ssh ready; next time we will install PVM itself.

SSH for PVM

If you are running a Red Hat-based cluster that was professionally set up as a LAN with an NFS-exported home directory, chances are very good that ssh is already installed and ready to run. You can easily check by issuing the following:

$rpm -qa | grep openssh
openssh-3.5p1-11
openssh-clients-3.5p1-11
openssh-server-3.5p1-11

If these packages (possibly with different version numbers and patch levels) are NOT already installed, you will need to install them from RPMs provided for your system (or get your systems administrator to install them for you). There may be a few more openssh packages installed as well depending on the needs of your site. The server package installs the ssh daemon (which listens on a port for incoming connections and services them). The ssh program itself is used to connect to remote hosts and is in the clients package along with other userspace ssh-based programs and utilities. Both the client and the server packages require the basic openssh package to function.

To install the required RPMs if they are not already installed, you will need to become root and enter something like:

#rpm -Uvh openssh-3.5p1-11.i386.rpm

and repeat for the other missing RPMs in dependency order on each node of the cluster.

Once the daemon is installed, you may still need to configure it to start at every reboot and/or start it up for the current session. To find out, use chkconfig:

$chkconfig --list sshd
sshd   0:off  1:off  2:on  3:on  4:on  5:on  6:off

As you can see, this daemon is configured to start up at run levels 2-5. Read man chkconfig to see how to set up or alter this (sensible) base configuration. To get it running right after installing it you can either reboot or start it by hand:

#/etc/init.d/sshd start
Starting sshd:             [  OK  ]
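
If chkconfig had instead shown sshd turned off at the runlevels you use, the usual way to enable it for future boots is (as root):

#chkconfig sshd on     # enable sshd at its default multiuser runlevels on subsequent boots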

Now, let's arrange it so that we can log in to a remote host (also running sshd) without a password. Let's start by seeing if we can log into the remote host at all, with a password:

$ssh lilith
The authenticity of host 'lilith (192.168.1.131)' can't be established.
RSA key fingerprint is 8d:55:10:15:8b:6c:64:65:17:00:a7:84:a3:35:9f:f6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'lilith,192.168.1.131' (RSA) to the list of
known hosts.
rgb@lilith's password: 

So far, so good. Note that the FIRST time we log into a remote host, ssh asks you to verify that the host you are connecting to is really that host. When you answer yes it saves the host's key fingerprint and uses it thereafter to automatically verify that the host is who you think it is. This step is one small part of the ssh security benefit. However, we had to enter a password to log in. This is no big deal for a single host, but is a BIG deal if you have to do it 1024 times on a big cluster just to get PVM started up!

To avoid this, we use the ssh-keygen command to generate a public/private ssh key pair of our very own:

$ssh-keygen -t rsa 
Generating public/private rsa key pair.  
Enter file in which to save the key (/home/rgb/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):  
Enter same passphrase again:  
Your identification has been saved in /home/rgb/.ssh/id_rsa.
Your public key has been saved in /home/rgb/.ssh/id_rsa.pub.  
The key fingerprint is:  c3:aa:6b:ba:35:57:95:aa:7b:45:48:94:c3:83:81:11 

This generates a default 1024 bit RSA key; alternatively we could have made a DSA key or increased or decreased the number of bits in the key (decreasing being a Bad Idea). Note that we used a blank passphrase; this will keep ssh from prompting us for a passphrase when we connect.
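
For the curious, those alternatives look like the following -- purely illustrative, and not something you need for this column:

$ssh-keygen -t rsa -b 2048   # an RSA key with 2048 bits instead of the default 1024
$ssh-keygen -t dsa           # a DSA key instead of an RSA key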

The last step is to create an authorized_keys file in your ~/.ssh directory. If your home directory is NFS exported to all the nodes, then you are done; otherwise you'll also need to copy the entire .ssh directory to all the hosts that don't already have it mounted. The following illustrates the steps and a test.


$cp .ssh/id_rsa.pub .ssh/authorized_keys
$scp -r .ssh lilith:
rgb@lilith's password: 
  progress bars
$ssh lilith

Note that with the last ssh we logged into lilith with no password!
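
If you want to be certain that no prompt of any kind remains (which is what PVM needs when it fires up remote daemons non-interactively), a stricter test is to forbid ssh from asking questions at all; the command below should print lilith's hostname rather than stopping to ask for a password:

$ssh -o BatchMode=yes lilith hostname   # fails instead of prompting if passwordless login is not fully working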

ssh is really pretty easy to set up this way; if you read the man page(s) you can learn how to generate and add additional authorized keys and do fancier things with it, but many users will need no more than what we've done so far. A warning: it is a good idea to log into each host in your cluster one time after setting it up before proceeding further, to build up the known_hosts file so that you aren't prompted for each host the first time PVM starts up a virtual machine (one quick way to do this is sketched below). Go do that, and then next time we'll get PVM going.
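
Assuming you keep your node names in a file, one per line, such as the hostfile sketched earlier, a simple shell loop does the job; answer yes once per host and each host's key is added to ~/.ssh/known_hosts:

$for host in $(cat hostfile); do ssh $host hostname; done   # log into each node once to collect its host key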

Sidebar One: Projects Mentioned in Column
PVM Home Page

PVM: A User's Guide and Tutorial for Networked Parallel Computing, by Geist, Beguelin, Dongarra, Jiang, Manchek, and Sunderam (MIT Press)

PVM User's Guide Online

SSH without Passwords by Arun Vasan

OpenSSH

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.

Robert Brown, Ph.D., has written extensively about Linux clusters. You can find his work and much more on his home page.