MPI: Groups and Communicators

Article Index

Intercommunicators

Intercommunicators refers to communication between two groups in a single communicator - a local group and a remote group. When using an intercommunicator, the initiating process is defined to be in the local group and the target process is defined to be in the remote group. For example, a process invoking MPI_SEND on an intercommunicator is in the comm's local group, but the dest rank argument is relative to the remote group. Similarly, a process invoking MPI_RECV on an intercommunicator is in the comm's local group, but the src rank argument is relative to the remote group.

Intercommunicators are somewhat of an advanced topic; I'll return to them a few months from now.

Topologies

MPI also contains a full set of topology-based communicators utilizing on N-dimensional Cartesian shapes as well as arbitrary graphs. The rationale is that if the communicator (and therefore the MPI implementation) is aware of the underlying network topology, the user application can effectively "Send this message to the peer on my right," and the MPI application can both figure out who the peer "on my right" is as well as potentially optimize the message routing.

My own opinion is that topologies are the best kept secret in MPI. They are rarely utilized by real user applications for two reasons:

  • The setup function calls are somewhat bulky and inconvenient
  • Since no users use them, few MPI implementations have optimized them, and there is little performance gain realized

It's a vicious circle - no one uses them because they aren't optimized, and most MPI implementors don't optimize them because no one uses them.

Operations on Communicators

Now that you know what communicators are, what can you do with them?

Perhaps the most common two operations invoked on communicators are functions that have been mentioned in previous editions of this column: MPI_COMM_RANK and MPI_COMM_SIZE. The former returns the rank of the calling process in the communicator; the latter returns the total size of the local group in the communicator. These two functions are critical for identity-based algorithms.

MPI_COMM_DUP (mentioned in the "Communicators - what's the point?" sidebar) takes a single communicator as input and creates a new communicator as output. The new communicator has exactly the same process membership and order (i.e., the groups in the two communicators are MPI_IDENT), but they have different communication contexts. MPI_COMM_DUP is a collective call - all processes in the input communicator must call MPI_COMM_DUP before it will return.

Another common communicator operation is MPI_COMM_SPLIT (also a collective call). This operation takes an input communicator and splits it into sub-communicators:

int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm);

Each calling process will provide the same input comm argument. Processes that use the same color will end up in a new communicator together (newcomm). The special color MPI_UNDEFINED can also be used, in which case the calling process will receive MPI_COMM_NULL in newcomm (i.e., it won't be part of any new communicators). The key argument is used for relative ordering in each process' respective newcomm.

MPI_COMM_SPLIT is useful to partition groups of processes into sub-communicators for specific purposes. Listing 1 shows a simple example:

Listing 1: Simple use of MPI_COMM_SPLIT
 1 int rank;
 2 MPI_Comm row, col;
 3 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 4 MPI_Comm_split(MPI_COMM_WORLD, rank / 4, 0, &row);
 5 MPI_Comm_split(MPI_COMM_WORLD, rank % 4, 0, &col);

Assuming that this program was invoked with 16 processes (i.e., a two-dimensional 4x4 grid), the calling process will end up being in two new communicators: row, which contains all processes in the same row as this process, and col, which contains all the processes in the same column as this process.

Where To Go From Here?

As with MPI collectives, MPI communicators are your friends. Use MPI_DUP to create safe communication contexts in different parts of your application, and use the subsetting capabilities of MPI_COMM_SPLIT instead of creating arbitrarily complicated grouping schemes with additional arrays, pointers, or lookup tables.

Next month: datatypes! We've talked about sending messages and shown simple examples of involving a single integer or character - real application need to send much more complex data.

{mosgoogle right}

Resources
MPI Forum (MPI-1 and MPI-2 specifications documents) http://www.mpi-forum.org
MPI - The Complete Reference: Volume 1, The MPI Core (2nd ed) (The MIT Press) By Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra. ISBN 0-262-69215-5
MPI - The Complete Reference: Volume 2, The MPI Extensions (The MIT Press) By William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir. ISBN 0-262-57123-4.
NCSA MPI tutorial http://webct.ncsa.uiuc.edu:8900/public/MPI/

This article was originally published in ClusterWorld Magazine. It has been updated and formated for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.

Jeff Squyres is the Assistant Director for High Performance Comptuing for the Open Systems Laboratory at Indiana University and is the one of the lead technical architects of the Open MPI project.

    Search

    Login And Newsletter

    Create an account to access exclusive content, comment on articles, and receive our newsletters.

    Feedburner

    Share The Bananas


    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.