MPI-IO Advantages

Two common alternatives to parallel MPI-IO are:

  1. Rank 0 alone accesses the file; it gathers data from the other ranks before writing, or scatters data to them after reading.
  2. Each rank opens a separate file on local disk and does I/O to it independently.

These alternative I/O schemes are simple enough to code, but they respectively have

  1. Poor scalability (e.g., the single task is a bottleneck), and
  2. Challenges with file management (e.g., the files must be collected from local disk over multiple nodes).

MPI-IO is a convenient interface for enabling true parallel I/O on systems that support it. It provides

  • mechanisms for performing synchronization,
  • syntax for data movement, and
  • means for defining noncontiguous data layout in a file (MPI datatypes).

One big advantage of MPI-IO over Unix I/O is the ability to describe noncontiguous accesses, both in the file and in the memory buffer, in a single call. This is a common need in parallel applications where, for example, a distributed array is stored in a single file, but each rank's portion is interleaved with the portions of other ranks. A sensible approach is therefore to

  • read or write such a file by using a derived datatype in an MPI-IO call, and
  • let the MPI implementation optimize the access.

Collective I/O combined with noncontiguous accesses generally yields the highest performance in MPI-IO.

 