Parallel I/O for HPC

Summary

HPC is all about high performance, and I/O is often where that performance is won or lost, so it deserves early attention. In this article, I explored using HDF5 to write data from multiple MPI processes to a common file. In the first simple example, each MPI process wrote its own data to its own dataset. Although fairly easy to accomplish, this approach has limitations because the data is broken into a fixed number of datasets, one per process, which ties the file layout to the number of MPI processes that wrote it.

The next examples illustrated how to have each MPI process write to a common dataset. The HDF Group has created four examples for Fortran and C that use hyperslabs. With these approaches, each MPI process writes to the appropriate hyperslab in the dataset. Behind the scenes, HDF5 uses MPI-IO for the parallel I/O. Performing I/O in this fashion allows you to keep a single dataset across all MPI processes without locking yourself into a specific number of MPI processes.
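To make the hyperslab idea concrete, here is a minimal sketch of the index arithmetic only, with no HDF5 or MPI calls. It assumes a simple row-wise block decomposition, which is one common choice and not necessarily the layout used in the HDF Group examples. The function name hyperslab_for_rank and its parameters are hypothetical. In a real program, each process would pass its start and count to the HDF5 hyperslab selection routine (H5Sselect_hyperslab in the C API) before writing.

```python
def hyperslab_for_rank(total_rows, n_ranks, rank):
    """Return (start, count) for the contiguous block of rows owned by `rank`.

    Hypothetical helper: assumes a 1D block decomposition along the first
    dimension of the dataset. Leftover rows go to the lowest-numbered ranks.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(total_rows, n_ranks)
    # The first `extra` ranks each take one additional row.
    count = base + (1 if rank < extra else 0)
    # Every rank below this one contributed `base` rows, plus one more
    # for each of them that also took an extra row.
    start = rank * base + min(rank, extra)
    return start, count


if __name__ == "__main__":
    total_rows = 8
    # The same 8-row dataset is covered whether 4 or 3 processes write it.
    for n_ranks in (4, 3):
        slabs = [hyperslab_for_rank(total_rows, n_ranks, r) for r in range(n_ranks)]
        print(n_ranks, "processes:", slabs)
```

Because the dataset dimensions stay the same and only the per-process start and count change, the file layout does not depend on how many processes wrote it. That is the property that distinguishes this approach from the one-dataset-per-process layout.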

HDF5 has a large number of wonderful features in addition to parallel performance, so it is definitely worth taking the time to experiment and understand what it can do to improve application scalability and performance.