Cornell Virtual Workshop > Parallel I/O Libraries > PHDF5

Dataset operations

Dataset operations are identified by the prefix H5D. A dataspace is a persistent property of a dataset; HDF5 dataspace operations are denoted by the H5S prefix.

Create / Open

hid_t H5Dcreate( hid_t f_id, const char *fname, hid_t d_id,
                 hid_t s_id, hid_t lcpl_id,
                 hid_t dcpl_id, hid_t dapl_id )

hid_t H5Dopen( hid_t f_id, const char *fname, hid_t dapl_id )

Read and Write

herr_t H5Dread( hid_t d_id, hid_t memtype_id,
                hid_t memspace_id, hid_t filespace_id,
                hid_t xplist_id, void *buf )

herr_t H5Dwrite( hid_t d_id, hid_t memtype_id,
                 hid_t memspace_id,  hid_t filespace_id,
                 hid_t xplist_id, const void *buf )

Hyperslabs

herr_t H5Sselect_hyperslab( hid_t s_id, H5S_seloper_t op,
                            const hsize_t *start,
                            const hsize_t *stride,
                            const hsize_t *count,
                            const hsize_t *block )

H5Sselect_hyperslab() is used to select a subset of the dataspace (hyperslab) to add to the currently selected region for the dataspace identified by s_id. The op parameter is used to perform a logical operation on the current selection such as H5S_SELECT_SET, H5S_SELECT_AND, and so on. The parameters start, stride, count, and block indicate the offset to the start of hyperslab, hyperslab stride, number of blocks, and block size respectively, in each dimension of the hyperslab. Essentially these parameters define the hyperslab selection. Note that start, stride, count, and block are all arrays, and they must have the same size as the dimension of the dataspace. If we consider a 3D dataspace, then the parameters should be 1D arrays of size 3, where each array element represents a value in the corresponding dimension.

A typical usage of a hyperslab in parallel is as follows:

Each process calls H5Sselect_hyperslab() to define memory and file hyperslabs.
Each process executes a read/write call using the hyperslabs defined. The type of MPI-IO call (independent or collective) is defined by the data transfer property list set prior to the read/write calls.
The hyperslab parameters define the portion of the dataset to be written.

The following figure illustrates how hyperslab-based I/O works using contiguous and chunked writes (or reads) in parallel. Note that hyperslabs can also be defined for writing into (or reading from) a file in parallel using different patterns, such as regularly spaced data and checkerboard patterns.

Each of n processes has a 3 by 6 unit chunk of data and writes to a designated portion of the file. The data in the file has n-times-3 rows and six columns. — How hyperslab-based I/O works using contiguous and chunked writes in parallel

In this example, n processes are writing contiguous hyperslabs into a file in parallel. The dataset dimension is dimx by dimy, or here, 12 by 6 (assuming n = 4). As the dataset is two-dimensional, each of the parameters start, stride, count, and block is an array of 2 elements. The following are the values of these parameters:

count[0] = dimx/4; count[1] = dimy;
start[0] = rank*count[0]; start[1] = 0;
stride[0] = 1; stride[1] = 1;  //It can be passed NULL (default)
block[0] = 1; block[1] = 1;   //It can be passed NULL (default)

The second figure illustrates a non-contiguous mapping of the dataspace to four processes.

Four processes write chunked hyperslabs each of dimension 3x3. The data in the file is arranged in a 6 by 6 grid, and each file writes to one quadrant of the grid. — Four processes write *chunked* hyperslabs each of dimension 3x3.

In this example, four processes write chunked hyperslabs, each of dimension 3x3. The following are the values of the parameters:

count[0] = 1; count[1] = 1;
switch(P) {
    case 0 : start[0] = 0; start[1] = 0; break;
    case 1 : start[0] = 0; start[1] = 3; break;
    case 2 : start[0] = 3; start[1] = 0; break;
    case 3 : start[0] = 3; start[1] = 3; break;
}
stride[0] = 1; stride[1] = 1;  // It can be passed NULL (default)
block[0] = 3; block[1] = 3;

Back