MPI-IO
Steve Lantz
Cornell Center for Advanced Computing
Revisions: 10/2022, 5/2017, 6/2015, 2/2014, 7/2012 (original)
Acknowledgments: MPI-IO materials are based on a presentation by Bill Barth at TACC
Parallel applications often distribute in-memory data across nodes to share a workload across a set of workers. Similarly, parallel file systems, like Lustre, distribute on-disk data across multiple disks to improve performance. What if you want to write distributed data to a single distributed file in parallel? Standard file streams would need to collect the data back to a single node before re-distributing the data across disks, creating a bottleneck. Fortunately, MPI-IO provides an interface for writing distributed data to a distributed file in parallel.
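To make the idea concrete, here is a minimal sketch (not taken from the original text) of how distributed data can be written to one shared file with MPI-IO. Each rank writes its own block of integers at a rank-dependent byte offset; the file name "out.dat" and the block size N are illustrative choices, not part of any particular application.

```c
#include <mpi.h>
#include <stdio.h>

#define N 100  /* integers per rank, chosen for illustration */

int main(int argc, char *argv[]) {
    int rank, buf[N];
    MPI_File fh;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < N; i++) buf[i] = rank;  /* fill with sample data */

    /* All ranks open the same file; each computes its own byte offset */
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    offset = (MPI_Offset)rank * N * sizeof(int);

    /* Collective write: every rank deposits its block in parallel,
       with no gather to a single node first */
    MPI_File_write_at_all(fh, offset, buf, N, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```

The later pages of this topic cover the pieces used above in detail: offsets and file pointers, file views, and collective data access routines such as MPI_File_write_at_all.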
Objectives
After you complete this topic, you should be able to:
- Explain parallel I/O in the context of MPI
- List two common alternatives to parallel MPI-IO
- Describe advantages of MPI-IO over other strategies
- Explain how file pointers and offsets allow multiple writers to work on one file
- Explain how file views enable more complicated data access patterns
- List the advantages of collective operations in parallel I/O
- Explain how asynchronous operations help the system to optimize I/O
- List the three orthogonal aspects of MPI-IO data access
Prerequisites
The Parallel I/O roadmap assumes that the reader has basic knowledge of Linux shell commands, parallel programming, and MPI. Coverage of these prerequisites can be found in the Shells topic, the Parallel Programming Concepts and High-Performance Computing roadmap, and the MPI Basics roadmap.
Programming experience in C or Fortran is also recommended. Introductory roadmaps on C and Fortran are available, though the reader will need to look elsewhere for a full tutorial on these languages.
In sequence, the current roadmap logically follows the MPI Advanced Topics roadmap, but the latter is not a prerequisite.