MPI: The Spawn of MPI

There once was a man from Nantucket
Whose PVM code kicked the bucket
He ranted and raved
"Oh what can I save?"
Then he re-wrote his application in MPI and used MPI_COMM_SPAWN and his life became fundamentally better.

The Story So Far

One of the big additions in MPI-2 is the concept of dynamic processes. However, early uses of it were rather mundane and, truth be told, unnecessary. Indeed, dynamic processes were added to MPI, at least in part, as a political necessity since one of the more important parallel run-time systems prior to MPI included the ability to spawn new processes at run-time. Let's take a trip back in history to examine one of MPI's predecessors, the Parallel Virtual Machine (PVM)...

PVM

PVM was a project out of the Oak Ridge National Laboratory (ORNL) in the early 90's and was one of the first truly portable parallel run-time systems. PVM allowed scientists and engineers to develop parallel codes on their workstations and then run them on "big iron" production machines. PVM became enormously popular and enjoyed a large, fervent user base.

Although PVM had the capability to launch multiple processes simultaneously and bind them together into a job, most users tended to prefer a different model for launching their parallel applications. They would simply launch a single, serial process (e.g., ./a.out) and use PVM's spawning capabilities to launch the rest of the processes required for the parallel application. Indeed, this became the de facto method of launching parallel applications in PVM.

MPI's design was strongly influenced by PVM (among others). However, for a variety of technical reasons, the ability to spawn new processes was left out of the initial MPI specification (MPI-1). Although the MPI-1 standard does not specify how to start a parallel job, most implementations launch a set of processes together using an implementation-dependent mechanism, frequently a command named mpirun. This set of processes comprises the MPI_COMM_WORLD communicator. The size and composition of MPI_COMM_WORLD is fixed upon initiation: no processes can be added to or removed from MPI_COMM_WORLD.

Sidebar: MPI-2

In 1994, the MPI Forum re-convened to add on to the MPI-1 standard. Several large topics were proposed for inclusion: parallel I/O, new language bindings, one-sided operations, and dynamic processes (to include spawning).

Although strong technical cases were not initially presented as to why dynamic processes needed to be included in the MPI-2 standard, it was seen as a political necessity to address the PVM community's concerns. In typical MPI fashion, the MPI-2 standard includes not only spawning, but a total of three different models for dynamic process management (three is better than one, right?).

Initial implementations of the MPI-2 dynamic process control models started appearing around 1997. The first uses were pretty much a direct port of the PVM start-one-process-that-starts-all-the-rest model. They came mainly from PVM users porting their applications to MPI in order to take advantage of low-latency networks or vendor-tuned MPI implementations. It was only within the last few years that more interesting uses have become possible through mature, thread-safe implementations of the MPI dynamic process models.

The PVM community scoffed at this aspect of MPI-1 - why should a parallel application be limited in the number of processes that it could have?

Even though the vast majority of PVM applications only used spawning capabilities to launch their initial job, and even though MPI implementations could support parallel applications as large as PVM (if not larger), this misconception on the part of many PVM users slowed the initial adoption of MPI. Ironically, the startup mechanism in MPI is simpler than PVM's launch-one-process-that-launches-all-the-rest model. Specifically, the typical PVM model requires that the user write the spawning code. MPI implementations' built-in mpirun commands (or equivalent) handled most of the same functionality.

These facts were lost in the Great MPI/PVM Religious Debates of the early- and mid-90's.

Admittedly, I'm presenting the MPI view of most of the arguments. But the fact remains that MPI was built upon the shoulders of PVM and used many of its good ideas (indeed, the PVM developers were on the MPI Forum). Spawning simply was [initially] not one of them. Three different dynamic process models were later added in the MPI-2 standard.

Spawning New Processes

The first model is, unsurprisingly, spawning new processes. Keep in mind, however, that MPI-2 was intended as a set of extensions to MPI-1 - not changes. So if the static model of a fixed MPI_COMM_WORLD remains, what does spawning new processes mean in MPI?

In short, it means launching another MPI_COMM_WORLD. Spawning is a collective action, meaning the processes in a communicator must unanimously decide to launch a new set of processes. That is, they all invoke the function MPI_COMM_SPAWN (or MPI_COMM_SPAWN_MULTIPLE) and instruct MPI to launch a new MPI job that has its own MPI_COMM_WORLD.

Listing 1: Sample spawn
1 int rank, err[4];
2 MPI_Comm children;
3 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
4 MPI_Comm_spawn("child", MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &children, err);
5 if (0 == rank) {
6     MPI_Send(&rank, 1, MPI_INT, 0, 0, children);
7 }

The code snippet in Listing 1 launches four copies of an executable named "child" collectively across the processes in the spawning job's MPI_COMM_WORLD. This action creates a new MPI job with its own MPI_COMM_WORLD, containing four processes. That is, at the end of MPI_COMM_SPAWN, there will be two MPI_COMM_WORLD instances - one per job. Each will have ranks 0 through (number of processes - 1).

Communication is established between the two jobs through an intercommunicator - the children argument to MPI_COMM_SPAWN, above. An intercommunicator is similar to an intracommunicator (i.e., a "normal" communicator, such as MPI_COMM_WORLD) except that it contains two "groups." In this case, one group contains the spawning processes; the other group contains the spawned processes. When using intercommunicators, the peer argument of all communication calls is always expressed in terms of the other group. Hence, line 6 in Listing 1 is sending to rank 0 of the children's group.

Listing 2: Sample spawned child
 1 #include "mpi.h"
 2 int main(int argc, char *argv[]) {
 3     int rank, msg;
 4     MPI_Comm parent;
 5     MPI_Init(&argc, &argv);
 6     MPI_Comm_get_parent(&parent);
 7     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 8     if (0 == rank) {
 9         MPI_Recv(&msg, 1, MPI_INT, 0, 0, parent, MPI_STATUS_IGNORE);
10     }
11     MPI_Finalize();
12     return 0;
13 }

The newly-spawned application can call MPI_COMM_GET_PARENT to obtain this communicator (see Listing 2). Again, since communication with intercommunicators is expressed in terms of the other group, the peer argument given to MPI_RECV on line 9 in Listing 2 is rank 0 of the parent's group. Hence, it is receiving the message sent from the MPI_SEND on line 6 in Listing 1.

Some applications are flexible in that they may be run directly (e.g., via mpirun) or they may be spawned. If an application was spawned, a valid communicator will be returned from MPI_COMM_GET_PARENT. If it was not, MPI_COMM_NULL will be returned (i.e., there is no parent because it was not spawned).

The MPI_COMM_SPAWN_MULTIPLE function behaves the same as MPI_COMM_SPAWN, except that it allows launching an array of different executables and command line arguments in a single MPI_COMM_WORLD - a multiple program, multiple data (MPMD) style of launching.
