Inter-communicators
So far, when talking about collective and point-to-point communications, all communication has taken place within a single communicator. A communicator used in this way is known as an intra-communicator. This model is not well suited to modular or multi-scale applications, where it is useful to treat the system more fluidly, merging and splitting communicators in a scalable way. Therefore, there are times when we want to link two intra-communicators by forming an inter-communicator.
int MPI_Intercomm_create(MPI_Comm local_comm, int local_leader,
MPI_Comm peer_comm, int remote_leader, int tag,
MPI_Comm *newintercomm)
The function requires some explanation. First, MPI_Intercomm_create is collective over the union of the two intra-communicators that it joins. Each intra-communicator has a leader process within the inter-communicator; these can be thought of as network gateways. In MPI, point-to-point communication must be possible between the leader processes. The local_leader is the rank of the leader in the local communicator, whereas the remote_leader is the rank of the other group's leader in the peer communicator, a communicator in which both leaders have membership.
Note that the two intra-communicators should have disjoint groups of processes; if not, a deadlock is likely to occur during communication. Also note that topologies do not work with inter-communicators. If this functionality is desired, then it may be time to merge the inter-communicator into an intra-communicator; this does not destroy the inter-communicator, but simply creates a new intra-communicator over the union of the processes belonging to the two intra-communicators that compose the inter-communicator. The tag argument can be used to distinguish between multiple calls to MPI_Intercomm_create and will not interfere with calls to other functions using tags.
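As a sketch of how these arguments fit together, the following hypothetical program (assuming an even number of processes) splits MPI_COMM_WORLD into two halves and joins them with an inter-communicator, using MPI_COMM_WORLD itself as the peer communicator; the leaders are the rank-0 process of each half, and the tag value 99 is arbitrary:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, world_size;
    MPI_Comm local_comm, intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Color 0 = lower half of MPI_COMM_WORLD, color 1 = upper half. */
    int color = (world_rank < world_size / 2) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &local_comm);

    /* Each half's leader is its local rank 0.  Expressed in the peer
     * communicator (MPI_COMM_WORLD), the remote leader is world rank
     * world_size/2 for the lower half and world rank 0 for the upper. */
    int remote_leader = (color == 0) ? world_size / 2 : 0;
    MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD, remote_leader,
                         99 /* tag */, &intercomm);

    int local_size, remote_size;
    MPI_Comm_size(intercomm, &local_size);        /* size of own group */
    MPI_Comm_remote_size(intercomm, &remote_size); /* size of other group */
    printf("world rank %d: local group has %d procs, remote group has %d\n",
           world_rank, local_size, remote_size);

    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}
```

Run under an even process count, e.g. mpirun -np 4; each process reports a local and remote group of two.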
int MPI_Intercomm_merge(MPI_Comm intercomm, int high,
MPI_Comm *newintracomm)
While the first and last arguments are self-explanatory, high requires some explanation: if it is true (non-zero; high) in all the processes of one group and false (zero; low) in all the processes of the other, then the "low" group has its ranks ordered numerically below those of the "high" group in newintracomm. If all processes in both groups pass the same value of high, then the order of the union is arbitrary. Other combinations of high and low are undefined in the standard.
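A minimal sketch of the ordering rule: assuming intercomm is a valid inter-communicator and am_upper (a hypothetical flag, not part of MPI) is 1 in every process of one group and 0 in every process of the other, the merge places the am_upper group at the numerically higher ranks:

```c
#include <mpi.h>

/* Merge an inter-communicator into a single intra-communicator.
 * The group passing high = 1 receives the higher ranks in the result;
 * the original inter-communicator remains valid and must still be
 * freed separately. */
MPI_Comm merge_groups(MPI_Comm intercomm, int am_upper)
{
    MPI_Comm merged;
    MPI_Intercomm_merge(intercomm, am_upper, &merged);
    return merged;
}
```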
Both point-to-point and collective communications can be applied to inter-communicators. Recall that there are several process relationships in collective communications: all-to-one, one-to-all, all-to-all, and other (MPI_Scan being the notable member of "other"). Point-to-point, of course, qualifies as one-to-one. In each case, when the "one" process belongs to one of the two member intra-communicators in the inter-communicator, the "all" corresponds to all the processes in the other member intra-communicator. In one-to-one communication, the two processes belong to the two separate intra-communicators (otherwise, of course, it would be intra-communication). Perhaps unintuitively, MPI_Barrier is included as a one-to-all operation, where the one calling process in a sub-group waits for all other processes to enter the barrier call in the other sub-group. MPI_Scan and its relatives currently do not support inter-communication. There are some differences to be aware of when making collective calls with inter-communicators:
- MPI_ROOT should be specified as the rank argument by the "one" process in a one-to-all communication.
- All other processes in MPI_ROOT's intra-communicator should specify MPI_PROC_NULL.
- All processes in the other group (the "all" group) should specify the rank of the MPI_ROOT process (the "one" process) relative to its intra-communicator.
These caveats will be illustrated in the exercise. Point-to-point communications are fairly straightforward once you know that the rank specified in the calls must be a remote-group rank.
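As a sketch of these conventions, the hypothetical function below (the names intercomm_bcast and in_group_a are assumptions, not MPI identifiers) broadcasts a value from rank 0 of group A to every process of group B over an existing inter-communicator:

```c
#include <mpi.h>

/* Assumes `intercomm` is a valid inter-communicator and `in_group_a`
 * is 1 for processes in the broadcasting group, 0 otherwise. */
void intercomm_bcast(MPI_Comm intercomm, int in_group_a)
{
    int value = 0;
    int local_rank;
    MPI_Comm_rank(intercomm, &local_rank); /* rank within own group */

    if (in_group_a) {
        if (local_rank == 0) {
            value = 42;
            /* The "one" process passes MPI_ROOT as the root argument. */
            MPI_Bcast(&value, 1, MPI_INT, MPI_ROOT, intercomm);
        } else {
            /* Everyone else in the root's group passes MPI_PROC_NULL. */
            MPI_Bcast(&value, 1, MPI_INT, MPI_PROC_NULL, intercomm);
        }
    } else {
        /* The "all" group passes the root's rank within the remote
         * group -- here, 0.  After the call, value == 42 in group B. */
        MPI_Bcast(&value, 1, MPI_INT, 0, intercomm);
    }
}
```

The same remote-group addressing applies to point-to-point calls: an MPI_Send over intercomm with dest = 0 targets rank 0 of the other group, not of the sender's own group.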