Collective I/O

A critical optimization in parallel I/O is to use collective operations to read from and write to the parallel file system. Why? Collective I/O offers a number of advantages:

  • Allows file system to have "big picture" of overall data movement
  • Framework for two-phase I/O, in which a communication phase is paired with the I/O phase (communication precedes I/O on writes and follows it on reads)
  • Preliminary communication can use MPI machinery to aggregate data
  • Basic idea: build large blocks so reads/writes will be more efficient
Collective I/O bundles small individual requests into larger operations.

Collective routines typically have names like MPI_File_read_all, MPI_File_read_at_all, etc. The _all suffix indicates that all ranks will be calling this function together, based on the communicator that was passed to MPI_File_open (i.e., all processes in the corresponding group). Each rank needs to provide nothing beyond its own access information; therefore, the argument list is the same as for the non-collective functions.

Collective I/O operations work with shared pointers, too, but one encounters a wrinkle in the nomenclature. The general rule is to replace _shared with _ordered in the name of the routine. Thus, the collective equivalent of MPI_File_read_shared is MPI_File_read_ordered. For both the _shared and _ordered routines, the implicitly maintained shared file pointer determines the offsets within the file; the _ordered routines additionally guarantee that the accesses occur in rank order.

© Cornell University | Center for Advanced Computing