Both TensorFlow/Keras and PyTorch can use the HDF5 library to store neural network models to disk. In Python, an HDF5 API is provided by the h5py module, which makes calls down to the compiled HDF5 libraries. TACC systems, however, make use of Parallel HDF5 (phdf5), a configuration of the HDF5 libraries that supports opening files across multiple parallel processes. While the Python h5py package will generally be installed as part of a pip-based install of one of the deep learning packages, the underlying phdf5 library is managed through the TACC module system. Therefore, when using h5py-based tools on TACC systems, you will want to load the appropriate phdf5 module. The module load command on Frontera is module load phdf5, which loads the default version 1.10.4.
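For example, saving and reloading a Keras model in the HDF5 format exercises h5py and the underlying HDF5 library. The following is a minimal sketch; it assumes TensorFlow is installed in your Python environment, and the model and file name are arbitrary:

    import tensorflow as tf

    # A tiny example model; any Keras model would do
    model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])

    # The .h5 extension selects the HDF5 on-disk format
    model.save("model.h5")
    restored = tf.keras.models.load_model("model.h5")

If the phdf5 module is not loaded correctly, an import or save step like this is where the missing libhdf5 shared library typically surfaces.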

As always, you can run module avail to see what module versions are currently available. To test whether you have loaded a compatible module, run the following command from the shell (after loading a python3 module) to check whether the h5py library can be imported:
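    python3 -c "import h5py"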

If this command completes without errors, then h5py and phdf5 are properly configured. If instead you see an error about being unable to open a libhdf5 shared object file, you will need to load the correct phdf5 module.

 