Here is an example showing how communicators can help to prevent message-passing mixups. Suppose we have written a program with three processes communicating with each other using MPI calls. Suppose that within the program, each process calls a library routine, and this library is also parallelized with MPI. Ideally, we would not want any intermingling of the messages; however, we haven't created two different message "spaces": one for our messages and one for the library's messages.

The program is intended to proceed according to the sequence below (and pictured in the diagram). In this case, we're fortunate that everything works as intended. The label blue refers to actions directly initiated by the user's code and the label green refers to actions initiated by a parallel library called by the user's code.

  1. There are three processes, labeled 0, 1, and 2. Initially, Process 0 and Process 1 each post a blue receive that is open to messages from any source.
  2. Process 2 sends a blue message to Process 1, satisfying the blocking receive on Process 1.
  3. Process 2 sends a blue message to Process 0, satisfying the blocking receive on Process 0.
  4. Process 1 sends a green message to Process 0. This blocks because Process 0 hasn't called receive yet.
  5. Process 0 posts a green receive for messages from Process 1 so the green send from Process 1 at step 4 completes. Process 1 posts a green receive for messages from Process 2. Process 2 posts a green receive for messages from Process 0.
  6. Process 0 posts a green send to Process 2. Process 2 has been waiting for the message since step 5, so the message passing completes.
  7. Process 2 posts a green send to Process 1, which has been waiting since time 5. The message passing completes.
Illustration of desired behavior as described in text.
Desired Behavior (Based on an image by David Walker, Oak Ridge Nat. Lab).

The three columns in the first diagram represent the three parallel processes happening simultaneously. In the illustration, time progresses from top to bottom. The blue unshaded boxes represent the user's code executed in time, and the green shaded boxes represent execution within the parallel library being called by the code. The gold-edged boxes represent MPI communication calls, which are described using an abbreviated notation. The usual MPI parameters are reduced to just a single number, which is the rank of the destination or source as appropriate. For example, send(1) means send a message to process 1, and recv(any) means receive a message from any process. Finally, the black arrows show the movement of messages from sender to receiver.

In the example above, matching sends and receives were always either both initiated by user code or both initiated by library code. However, there is no guarantee that things will occur in the order above if there is any variation in execution time. For example, changing the inputs to a single process could cause different behavior within the code, or even the relative scheduling of processes by the OS might vary slightly from run to run. Suppose we alter the timeline by adding some computation at the beginning of the third process. The sequence of events might then play out as follows:

Again, the label blue refers to actions directly initiated by the user's code and the term green refers to actions initiated by a parallel library called by the user's code.

  1. There are three processes, labeled 0, 1, and 2. Initially, Process 0 and Process 1 each post a blue receive that is open to messages from any source.
  2. Process 2 Sends a blue message to Process 1, satisfying the blocking receive on Process 1.
  3. Process 2 is busy computing, so when Process 1 send a green message to Process 0, it is matched with the blue receive.
  4. Process 2 posts a blue send to Process 0. The intended blue receive on Process 0 was already paired with a different message, so Process 2 is blocked. Meanwhile, Process 0 posts a green receive for a message from Process 1. The intended green message from Process 1 was already paired with a blue receive at step 3, so Process 0 is blocked, waiting for a message that is already consumed. Process 1 posts a green receive for a message from Process 2, so it is blocked.
  5. All the processes are blocked, waiting on messages that will not arrive.
Broken example as described in text.
Unintended Behavior (Based on an image by David Walker, Oak Ridge Nat. Lab).

In this case, the communications definitely do not take place as intended! The first "receive" in process 0 now receives the "send" from the library routine in process 1, not the intended (and now delayed) "send" from process 2. As a result, all three processes hang.

This problem is best avoided by the library developer requesting a new and unique communicator, and specifying this communicator in all send and receive calls made by the library. This creates a library (callee) message space separate from the user's (caller) message space.

 
©   Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Inclusivity Statement