So far, when discussing collective and point-to-point communications, all communication has taken place within a single communicator. A communicator used in this way is known as an intra-communicator. Intra-communicators alone are not well suited to modular or multi-scale applications, where it can be useful to model a system in a more fluid fashion, merging and splitting groups of processes in a scalable way. Therefore, there are times when we want to link two intra-communicators by forming an inter-communicator.

int MPI_Intercomm_create(MPI_Comm local_comm, int local_leader,
     MPI_Comm peer_comm, int remote_leader, int tag,
     MPI_Comm *newintercomm)

The function requires some explanation. First, MPI_Intercomm_create is collective over the union of the two intra-communicators that it joins. Each intra-communicator has a leader process; within the inter-communicator, the leaders can be thought of as network gateways, and MPI requires that point-to-point communication be possible between them. The local_leader argument is the rank of the leader in the local communicator, whereas remote_leader is the rank of the other group's leader in the peer communicator, a communicator in which both leaders hold membership.

Note that the two intra-communicators must have disjoint groups of processes; if they overlap, a deadlock is likely to occur during communication. Also note that process topologies do not work with inter-communicators. If that functionality is desired, it may be time to merge the inter-communicator into an intra-communicator; merging does not destroy the inter-communicator, but simply creates a new intra-communicator whose group is the union of the processes in the two intra-communicators that compose the inter-communicator. The tag argument can be used to distinguish between multiple calls to MPI_Intercomm_create and will not interfere with tags passed to other functions.
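To make the arguments concrete, here is a minimal sketch, assuming at least two processes in MPI_COMM_WORLD: the ranks are split into a lower half and an upper half with MPI_Comm_split, and the two halves are then linked by an inter-communicator. The splitting scheme, the choice of each group's rank 0 as its leader, and the tag value 99 are illustrative choices, not requirements.

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm local_comm, intercomm;
    int world_rank, world_size, color, remote_leader;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* color 0 = lower half of the ranks, color 1 = upper half */
    color = (world_rank < world_size / 2) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &local_comm);

    /* The leader of each half is its local rank 0.  remote_leader is
       expressed as a rank in the peer communicator, MPI_COMM_WORLD:
       world rank 0 leads the lower half, and world rank world_size/2
       leads the upper half. */
    remote_leader = (color == 0) ? world_size / 2 : 0;

    MPI_Intercomm_create(local_comm, 0 /* local_leader */,
                         MPI_COMM_WORLD, remote_leader,
                         99 /* tag */, &intercomm);

    /* ... inter-communication takes place here ... */

    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}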

int MPI_Intercomm_merge(MPI_Comm intercomm, int high,
     MPI_Comm *newintracomm)

While the first and last arguments are self-explanatory, high requires some explanation: if it is true (non-zero; "high") in all the processes of one group and false (zero; "low") in all the processes of the other group, then the "low" group's processes receive ranks numerically less than those of the "high" group in newintracomm. If all processes in both groups provide the same value of high, then the order of the union is arbitrary. Other combinations of high and low are undefined by the standard.
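Continuing the sketch above (and reusing its intercomm and color variables), the following lines would merge the two halves back into a single intra-communicator, ordering the lower half's ranks first:

/* Passing high = 0 in the lower half and high = 1 in the upper half
   orders the lower half's ranks before the upper half's. */
MPI_Comm merged;
int merged_rank, merged_size;

MPI_Intercomm_merge(intercomm, color /* 0 = low, 1 = high */, &merged);
MPI_Comm_rank(merged, &merged_rank);
MPI_Comm_size(merged, &merged_size);
/* merged_rank now runs over the union of the two groups, with the
   "low" group's processes numbered first. */
MPI_Comm_free(&merged);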

Both point-to-point and collective communications can be applied to inter-communicators. Recall that there are several process relationships in collective communications: all-to-one, one-to-all, all-to-all, and other (MPI_Scan being the notable member of "other"). Of course, point-to-point qualifies as one-to-one. In each case, when the "one" process belongs to one of the two member intra-communicators in the inter-communicator, the "all" corresponds to all the processes in the other member intra-communicator. In one-to-one communication, the two processes belong to the two separate intra-communicators (otherwise, it would be intra-communication). Perhaps unintuitively, MPI_Barrier is included as a one-to-all operation, where the calling process in one sub-group waits for all the processes in the other sub-group to enter the barrier call. MPI_Scan and its relatives currently do not support inter-communication. There are some differences to be aware of when making collective calls with inter-communicators:

  1. In a one-to-all (or all-to-one) communication, the "one" process should specify MPI_ROOT as the rank argument.
  2. All other processes in the root's intra-communicator should specify MPI_PROC_NULL.
  3. All processes in the other group (the "all" group) should specify the rank of the root process (the "one" process) relative to the root's own intra-communicator.

These caveats will be illustrated in the exercise, and a brief sketch follows below. Point-to-point communications are fairly straightforward once you know that the rank specified in the calls must be a remote-group rank.
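As a preview, here is a hedged sketch of these conventions, again reusing local_comm, intercomm, and color from the sketches above; the broadcast value and the tag are arbitrary. Local rank 0 of the color-0 group broadcasts an integer to every process in the color-1 group, and then the two groups' local rank 0 processes exchange a message point-to-point.

int value = 0, local_rank;
MPI_Comm_rank(local_comm, &local_rank);

if (color == 0) {
    if (local_rank == 0) {
        value = 42;
        /* The "one" process passes MPI_ROOT as the root argument. */
        MPI_Bcast(&value, 1, MPI_INT, MPI_ROOT, intercomm);
    } else {
        /* Other processes in the root's group pass MPI_PROC_NULL. */
        MPI_Bcast(&value, 1, MPI_INT, MPI_PROC_NULL, intercomm);
    }
} else {
    /* Receivers pass the root's rank within its own (remote) group. */
    MPI_Bcast(&value, 1, MPI_INT, 0, intercomm);
}

/* Point-to-point ranks are remote-group ranks: local rank 0 of each
   group addresses local rank 0 of the other group. */
if (local_rank == 0) {
    int out = color, in;
    MPI_Sendrecv(&out, 1, MPI_INT, 0 /* dest */, 7 /* tag */,
                 &in, 1, MPI_INT, 0 /* source */, 7 /* tag */,
                 intercomm, MPI_STATUS_IGNORE);
}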
