Parallel I/O

High-Level Libraries

The last approach to parallel I/O I’m going to mention is high-level libraries that you can use to store data (both reading and writing). These libraries take care of the parallel I/O “under the covers,” so to speak. The two options I’m going to mention are HDF5 and Parallel NetCDF.

Parallel HDF5

HDF5 is a file format that can be used to store large amounts of data in an organized fashion. HDF stands for “Hierarchical Data Format” and the “5” indicates the version of the file format. HDF5 files are portable, so you can write an HDF5 file on one system and read it on another. Moreover, a number of languages have HDF5 APIs, including C, C++, Fortran, Python, Perl, Julia, Go, Matlab, R, Scilab, Octave, and Lua, to name a few.

Parallel HDF5 uses MPI-IO to handle I/O to the storage system. To get the best performance reading and writing to HDF5 files, you can tune various aspects of MPI-IO, as well as HDF5 parameters for the underlying filesystem.
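
As a minimal sketch of what this can look like from Python, assuming mpi4py is installed and h5py was built against an MPI-enabled HDF5, each TP can write its own slice of one shared dataset. The file name parallel_test.h5, the dataset name values, and the helper block_range are illustrative choices of mine, not part of any library.

```python
def block_range(rank, nprocs, n_items):
    """Return the half-open [start, stop) slice of n_items owned by rank.

    Items are split as evenly as possible; the first (n_items % nprocs)
    ranks each get one extra item.
    """
    base, extra = divmod(n_items, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def write_shared_dataset(path="parallel_test.h5", n_items=1000):
    # Imported here so block_range stays usable without MPI installed.
    # Assumption: h5py was built with parallel HDF5 support.
    from mpi4py import MPI
    import h5py
    import numpy as np

    comm = MPI.COMM_WORLD
    start, stop = block_range(comm.rank, comm.size, n_items)

    # driver="mpio" asks HDF5 to perform file access through MPI-IO.
    with h5py.File(path, "w", driver="mpio", comm=comm) as f:
        # Dataset creation is collective: every rank makes the same call.
        dset = f.create_dataset("values", (n_items,), dtype="f8")
        # Each rank writes only the slice it owns.
        dset[start:stop] = np.full(stop - start, comm.rank, dtype="f8")
```

To try it, call write_shared_dataset() at the bottom of the script and launch it under MPI, for example with mpiexec -n 4 python followed by the script name. Any tuning of MPI-IO or HDF5 parameters would come on top of this.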

Parallel NetCDF

Another portable file format is NetCDF. The current version, NetCDF 4, can use HDF5 as its underlying file format. APIs for NetCDF include C, C++, Fortran, Python, Java, Perl, Matlab, Octave, and more.

As with HDF5, NetCDF has a parallel version, Parallel-NetCDF (PnetCDF), which also uses MPI-IO. This version is based on NetCDF 3 and was developed at Northwestern University and Argonne National Laboratory. To implement parallel I/O with NetCDF 4, you need to use its HDF5 capability and make sure HDF5 was built with MPI-IO support.


If you have an application that handles I/O in a serial fashion and the I/O is a significant portion of your run time, you could benefit by modifying the application to perform parallel I/O. The fun part is deciding how you should do it.

I recommend you start very simply and with a small-ish number of cores. I would use the file-per-process approach in which each TP performs I/O to its own file. This solution is really only suitable for small numbers of TPs, but it is fairly simple to code; be sure to have unique file names for each TP. This approach places more burden on the pre-processing and post-processing tools, but the application itself will see better I/O performance.
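
The file-per-process idea can be sketched in a few lines of Python. This is only an illustration: rank_filename and write_my_file are hypothetical helpers, and I'm assuming the rank and the total number of TPs come from whatever runtime you use (for example, an MPI communicator).

```python
import os


def rank_filename(prefix, rank, nprocs, ext=".dat"):
    """Build a unique, sortable file name for one TP, e.g. out.0003.dat."""
    # Zero-pad so that file names sort in rank order in directory listings.
    width = max(4, len(str(nprocs - 1)))
    return f"{prefix}.{rank:0{width}d}{ext}"


def write_my_file(directory, prefix, rank, nprocs, values):
    """Write this TP's values, one per line, to its own file."""
    path = os.path.join(directory, rank_filename(prefix, rank, nprocs))
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v}\n")
    return path
```

Putting the rank in the file name is what keeps the names unique, and the zero padding makes life a little easier for the post-processing tools that have to stitch the files back together.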

The second approach I would take is to use a high-level library such as Parallel HDF5. You can use MPI-IO underneath the library to get improved I/O performance, but it might require some tuning. The benefit of using a high-level library is that you get a common, portable format across platforms with some possible I/O performance improvement.

Beyond high-level libraries, your remaining choices are to use MPI-IO directly or to confine I/O to one TP. Writing applications for MPI-IO can be difficult, but it can also yield the biggest I/O performance boost. Having one TP perform all of the I/O can be a little complicated as well, but it is a very common I/O pattern for parallel applications.
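
For the one-TP pattern, a rough sketch with mpi4py might look like the following. It assumes mpi4py is installed and that each TP's data is small enough to gather onto a single TP; write_gathered and gather_and_write are names I made up for illustration.

```python
def write_gathered(path, chunks):
    """Write per-TP chunks to one file in rank order (runs on the I/O TP)."""
    with open(path, "w") as f:
        for rank, chunk in enumerate(chunks):
            for v in chunk:
                # Prefix each value with the rank it came from.
                f.write(f"{rank} {v}\n")


def gather_and_write(path, local_values, root=0):
    # Imported here so write_gathered can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    # gather returns a list indexed by rank on root and None elsewhere.
    chunks = comm.gather(local_values, root=root)
    if comm.rank == root:
        write_gathered(path, chunks)
```

The catch with this pattern is that all of the data has to pass through one TP's memory and one TP's I/O path, so it is simple to reason about but will not scale the way MPI-IO can.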

Don’t be afraid of jumping into parallel I/O with both feet, because you can get some really wonderful performance improvements.