Structured Data
Manu Shantharam, SDSC (Original), Steve Lantz (Updates)
Cornell Center for Advanced Computing
Revisions: 11/2022, 12/2020, 7/2017 (original)
Many scientific applications work with structured data. HDF5 and netCDF are two well-known file formats for storing numerical data in a self-describing fashion. Libraries for reading and writing files in these specialized formats already existed when scientific applications began to incorporate parallel computation, which created a need for high-level I/O libraries that offer parallel I/O capabilities and can still interact with the established formats. This topic describes these formats and shows how high-level libraries fit into the parallel I/O software stack.
Objectives
After you complete this topic, you should be able to:
- Summarize the motivation for using I/O libraries like netCDF and HDF5, as well as their parallel counterparts, PnetCDF and PHDF5.
- Illustrate a typical parallel I/O software stack.
Prerequisites
To complete this topic, you will need basic familiarity with parallel I/O concepts.