MPI-IO Advantages
Two common alternatives to parallel MPI-IO are:
- Rank 0 alone accesses the file, gathering data from (or scattering data to) the other ranks.
- Each rank opens a separate file on local disk and does I/O to it independently.
These alternative I/O schemes are simple enough to code, but they respectively have
- Poor scalability (e.g., the single task becomes a bottleneck), and
- Challenges with file management (e.g., the files must be collected from the local disks of multiple nodes).
MPI-IO is a convenient interface for enabling true parallel I/O on systems that support it. It provides
- mechanisms for performing synchronization,
- syntax for data movement, and
- means for defining noncontiguous data layout in a file (MPI datatypes).
One big advantage of MPI-IO over Unix I/O is that MPI-IO can describe noncontiguous accesses, both in the file and in the corresponding memory buffers, within a single call. This is a common need in parallel applications. For example, a distributed array may be stored in a single file, but in a rearranged order or layout. A sensible approach is therefore to
- read or write such a file by using a derived datatype in an MPI-IO call, and
- let the MPI implementation optimize the access.
Collective I/O combined with noncontiguous accesses generally yields the highest performance in MPI-IO.