Parallel I/O with MPI-IO

Parallel applications commonly need to write distributed arrays to disk. Given a parallel file system such as Lustre, it should somehow be possible to perform a parallel write of a distributed array, yet have everything end up in a single file. Clearly, this cannot be done using standard file streams; a more sophisticated interface is needed. This is where MPI-IO comes in.

Why is parallel I/O part of MPI?

  • I/O was absent from the MPI-1 specification
  • MPI-IO was first developed as a separate effort to meet this need, then incorporated into MPI-2

What is parallel I/O for MPI? It occurs when

  • multiple MPI tasks can read or write simultaneously,
  • from or to a single file,
  • in a parallel file system,
  • through the MPI-IO interface (a minimal write sketch follows this list).
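To make this concrete, here is a minimal sketch of such a write in C. The file name darray.out, the block size LOCAL_N, and the rank-dependent fill values are illustrative assumptions, not part of any particular application; the essential point is that every rank opens the same file and writes its own block at its own offset.

/* Minimal sketch: each rank writes its block of a distributed 1-D array
 * to one shared file with MPI-IO. File name and sizes are illustrative. */
#include <mpi.h>

#define LOCAL_N 100   /* elements owned by each rank (assumed for illustration) */

int main(int argc, char *argv[])
{
    int rank, i;
    double local[LOCAL_N];
    MPI_File fh;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Fill the local block with rank-dependent data */
    for (i = 0; i < LOCAL_N; i++)
        local[i] = rank * LOCAL_N + i;

    /* All ranks open the same file collectively */
    MPI_File_open(MPI_COMM_WORLD, "darray.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Each rank writes its block at its own byte offset in the shared file */
    offset = (MPI_Offset)rank * LOCAL_N * sizeof(double);
    MPI_File_write_at_all(fh, offset, local, LOCAL_N, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc and run with several ranks, this produces one file containing all the blocks in rank order; because MPI_File_write_at_all is a collective call, the MPI library is free to coordinate and optimize the writes across ranks.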

A parallel file system works by

  • appearing as a normal Unix file system, while
  • (usually) employing multiple I/O servers to deliver high sustained throughput.

HPC parallel I/O requires some extra work (one example, passing striping hints, is sketched after this list), but it

  • potentially provides high throughput and
  • offers a single (unified) file for visualization and pre/post-processing.
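One common piece of that extra work is tuning how a file is laid out across the I/O servers. The MPI standard reserves the hint keys striping_factor and striping_unit for this purpose, though an implementation or file system is free to ignore them. The fragment below is a sketch showing how the MPI_File_open call in the earlier example could pass such hints in place of MPI_INFO_NULL; the particular values are illustrative, not recommendations.

MPI_Info info;

MPI_Info_create(&info);
MPI_Info_set(info, "striping_factor", "8");      /* hint: stripe across 8 I/O servers (e.g., Lustre OSTs) */
MPI_Info_set(info, "striping_unit", "4194304");  /* hint: 4 MiB stripe size, purely illustrative */

MPI_File_open(MPI_COMM_WORLD, "darray.out",
              MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
MPI_Info_free(&info);

Because hints are only advisory, the same code runs unchanged on a file system that does not support striping.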

It should be acknowledged that MPI-IO is a fairly low-level interface for doing parallel I/O. At the application level, it may be preferable to make use of a more abstract library that is built on top of MPI-IO, even though this constrains the file format that the application works with. A good example would be parallel HDF5. The interested reader is encouraged to look at (and try!) the HDF5 examples in the Parallel I/O Libraries roadmap.
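For comparison, the short sketch below shows how a parallel HDF5 program hands its MPI communicator to the library, which then performs the actual I/O through MPI-IO underneath. It assumes an MPI-enabled HDF5 build; the file name fields.h5 is made up for illustration, and the dataset creation and writes are elided.

/* Sketch: open an HDF5 file for parallel access on top of MPI-IO. */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char *argv[])
{
    hid_t fapl, file;

    MPI_Init(&argc, &argv);

    /* File access property list telling HDF5 to use the MPI-IO driver */
    fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* All ranks create/open the same file collectively */
    file = H5Fcreate("fields.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* ... dataset creation and collective H5Dwrite calls would go here ... */

    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}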

 