The process with rank 0 reads the initial data and distributes it among all four processes (keeping its own portion), according to the domain decomposition below:

[Figure: Possible target distribution of data points across four processes.]
Problem

What if the number of points can't be divided evenly?

Approach

Use MPI_Scatterv with arrays as follows…
(Assume the number of points, NPTS, is not evenly divisible by the number of processes, NPROC.)

! Fortran skeleton code
  NMIN = NPTS/NPROC          ! minimum number of points per process
  NEXTRA = MOD(NPTS,NPROC)   ! leftover points to be spread around
  K = 0                      ! running displacement into the send buffer
  DO I = 0, NPROC-1
     IF (I .LT. NEXTRA) THEN
        SENDCOUNTS(I) = NMIN + 1   ! first NEXTRA ranks get one extra point
     ELSE
        SENDCOUNTS(I) = NMIN
     END IF
     DISPLS(I) = K
     K = K + SENDCOUNTS(I)
  END DO
  ! need to set recvcount also ...
  CALL MPI_SCATTERV( &
       SENDBUF, SENDCOUNTS, DISPLS, ...

The code here divides the points among the processes as evenly as possible. First, it computes how many leftover points there are using the MOD intrinsic. Then it loops over the processes, and via the SENDCOUNTS array it assigns one extra point to each process in turn until all NEXTRA leftover points have been accounted for. As it goes, it uses the DISPLS array to record the starting offset of each chunk within the send buffer. Finally, MPI_SCATTERV is called with the computed SENDCOUNTS and DISPLS arrays to perform the desired data distribution.

Exercise

Start from one of the nearly complete template codes: either copy from the templates on the next page, or download the template for your preferred language (scatterv.c or scatterv.f90). Then fill in the missing lines in the two places marked by three dots in the code fragments above.

There are a couple of things to notice about the template codes:

  1. Only the root process needs to allocate the memory to hold the initial data (SENDBUF) and execute the code that defines the data.
  2. All the processes execute in parallel the code defining the other arguments that are input to MPI_Scatterv. (Why?)
Run

On Stampede2 or Frontera, copy and paste the sample code into a file using a command-line editor, then compile and run it in an interactive session. The Stampede2 and Frontera CVW topics explain these steps in more detail.

  • Compile using a command like those shown below:
    % mpif90 scatterv.f90 -o scatterv_f
    % mpicc scatterv.c -o scatterv_c
  • Start an interactive session using:
    % idev -N 1 -n 4
  • Run the code using the ibrun MPI launcher wrapper. These programs were written to use exactly 4 processes; to run on a different number of processes, you would need to redefine the constants NPROC, NPTS, and NRBUF.
    % ibrun -np 4 scatterv_c
    % ibrun -np 4 scatterv_f
Output

Correct output will display 4 lines from the 4 tasks, each showing the MPI rank and the values of the 9 or 10 data points assigned to that rank.

Solution

Here are the missing lines:

Missing lines in scatterv.c
Missing lines in scatterv.f90
©   Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Inclusivity Statement