Parallel I/O with MPI-IO
Parallel applications commonly need to write distributed arrays to disk. Given a parallel file system such as Lustre, it should be possible for all tasks to write their parts of a distributed array in parallel and still have everything end up in a single file. Clearly, this cannot be done with standard file streams; a more capable interface is needed, and this is where MPI-IO comes in.
Why is parallel I/O part of MPI?
- I/O was lacking from the MPI-1 specification
- Out of necessity, it was first defined as a separate effort (MPI-IO) and was later incorporated into the MPI-2 standard
What is parallel I/O for MPI? It occurs when
- multiple MPI tasks can read or write simultaneously,
- from or to a single file,
- in a parallel file system,
- through the MPI-IO interface (a minimal code sketch follows below).
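To make this concrete, here is a minimal sketch (the file name output.dat and the local block size NLOCAL are assumptions made for illustration) in which every rank writes its own contiguous block of a distributed integer array into one shared file through MPI-IO:

```c
#include <mpi.h>

#define NLOCAL 100                 /* local block size per rank (illustrative) */

int main(int argc, char **argv)
{
    int rank, data[NLOCAL];
    MPI_File fh;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < NLOCAL; i++)
        data[i] = rank;            /* fill the local block with the rank id */

    /* All ranks open the same file collectively */
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Each rank writes its block at a rank-dependent byte offset,
       so the blocks line up in rank order in the single output file */
    offset = (MPI_Offset)rank * NLOCAL * sizeof(int);
    MPI_File_write_at_all(fh, offset, data, NLOCAL, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```

The _at_all variant is a collective call: all ranks in the communicator participate, which allows the MPI library to coordinate and combine their requests.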
A parallel file system works by
- appearing as a normal Unix file system, while
- usually employing multiple I/O servers to achieve high sustained throughput (how a file is spread over them can often be influenced through hints; see the sketch below).
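As a sketch, assuming a Lustre-like file system and an MPI-IO implementation that honours the reserved striping hints (unrecognized hints are silently ignored, so passing them is safe to try):

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "8");      /* number of I/O servers to stripe over */
    MPI_Info_set(info, "striping_unit", "4194304");  /* stripe size in bytes (4 MiB) */

    /* Hints are passed when the file is opened */
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    MPI_Info_free(&info);

    /* ... write data as in the previous sketch ... */

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```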
HPC parallel I/O requires some extra work, but it
- potentially provides high throughput and
- offers a single (unified) file for visualization and pre/post-processing (see the collective-write sketch below).
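Much of the potential throughput comes from collective I/O with file views: when all ranks call the write routine together, the library can aggregate their requests into large, well-aligned operations, and the data lands in the file in global order. The following sketch (file name and array sizes are again illustrative, and it assumes the row count divides evenly over the ranks) writes a row-slab-decomposed 2D array into one file in global row-major order:

```c
#include <mpi.h>

#define NX 8                       /* global number of rows (illustrative)    */
#define NY 8                       /* global number of columns (illustrative) */

int main(int argc, char **argv)
{
    int rank, nprocs;
    int buf[NX * NY];              /* large enough for any local slab */
    MPI_File fh;
    MPI_Datatype filetype;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Row-slab decomposition; assume NX is divisible by nprocs */
    int nxlocal = NX / nprocs;
    for (int i = 0; i < nxlocal * NY; i++)
        buf[i] = rank;             /* fill the local slab with the rank id */

    /* Describe where this rank's slab sits inside the global array */
    int sizes[2]    = {NX, NY};
    int subsizes[2] = {nxlocal, NY};
    int starts[2]   = {rank * nxlocal, 0};
    MPI_Type_create_subarray(2, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_INT, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(MPI_COMM_WORLD, "array.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* File view: each rank sees only its own slab of the global array */
    MPI_File_set_view(fh, 0, MPI_INT, filetype, "native", MPI_INFO_NULL);

    /* Collective write: the slabs end up in global order in a single file */
    MPI_File_write_all(fh, buf, nxlocal * NY, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    MPI_Finalize();
    return 0;
}
```

The resulting file is a plain binary dump of the whole global array, which a serial visualization or post-processing tool can read without knowing how the data was distributed.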
It should be acknowledged that MPI-IO is a fairly low-level interface for doing parallel I/O. At the application level, it may be preferable to make use of a more abstract library that is built on top of MPI-IO, even though this constrains the file format that the application works with. A good example would be parallel HDF5. The interested reader is encouraged to look at (and try!) the HDF5 examples in the Parallel I/O Libraries roadmap.
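For orientation, opening an HDF5 file for parallel access looks roughly like this (a minimal sketch; the file name is arbitrary, and the dataset creation and per-rank hyperslab selection that a real program would perform are omitted):

```c
#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* File access property list telling HDF5 to use MPI-IO underneath */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* All ranks create/open the same file collectively */
    hid_t file = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* ... create dataspaces and datasets, select per-rank hyperslabs,
       and write with H5Dwrite (optionally with collective transfer) ... */

    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```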