Improved Performance with Parallel I/O



Over time, MPI (Message Passing Interface) [2] became popular, and researchers began looking for better ways to handle parallel I/O in MPI applications. The MPI-2 standard adopted MPI-IO, a set of functions that abstract I/O for MPI applications on distributed systems. It lets an application perform I/O in parallel in much the same way that MPI sends and receives messages.

Typically each process in the MPI communicator participates in the I/O, but it's not required. How each process writes to a file is up to the developer. Although a full discussion of MPI-IO is far beyond the scope of this article, a number of tutorials [3], documents [4], and even books [5] can help you get started.
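To make the idea of each process writing its own part of a shared file more concrete, the following is a minimal sketch of the offset pattern only. It is not MPI-IO: it uses threads from the Python standard library as stand-ins for MPI ranks and os.pwrite (Unix only) for the positioned write. In a real MPI program, the rank would come from MPI_Comm_rank and the write would typically go through a call such as MPI_File_write_at. The record layout, function names, and file name are assumptions made for illustration.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record layout: each "rank" owns four little-endian doubles.
RECORD = struct.Struct("<4d")


def write_block(path, rank):
    """Write one rank's record at that rank's own byte offset."""
    values = [rank + 0.25 * i for i in range(4)]
    # Rank r owns bytes [r * RECORD.size, (r + 1) * RECORD.size),
    # so no two writers ever touch the same region of the file.
    offset = rank * RECORD.size
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, RECORD.pack(*values), offset)
    finally:
        os.close(fd)


def parallel_write(path, nprocs):
    """Preallocate the shared file, then let every rank write concurrently."""
    with open(path, "wb") as f:
        f.truncate(nprocs * RECORD.size)
    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(lambda r: write_block(path, r), range(nprocs)))


def read_all(path, nprocs):
    """Read the file back and return one tuple of values per rank."""
    with open(path, "rb") as f:
        data = f.read()
    return [RECORD.unpack_from(data, r * RECORD.size) for r in range(nprocs)]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "shared.bin")
    parallel_write(demo_path, 4)
    for rank, row in enumerate(read_all(demo_path, 4)):
        print(rank, row)
```

Because every writer computes a disjoint offset from its rank, no locking is needed and the order in which the writes finish does not matter. That property is what allows a parallel filesystem to service the writes concurrently.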

High-Level Libraries

The last approach to parallel I/O I'm going to mention is high-level libraries that you can use for reading and writing data. These libraries take care of the parallel I/O "under the covers," so to speak. Two options are worth mentioning: HDF5 and Parallel NetCDF.

Parallel HDF5

HDF5 [6] is a file format that can be used to store large amounts of data in an organized fashion. HDF stands for "Hierarchical Data Format," and the "5" indicates the version of the file format. HDF5 files are portable, so you can write an HDF5 file on one system and read it on another. Moreover, a number of languages have HDF5 APIs, including C, C++, Fortran, Python, Perl, Julia, Go, MATLAB, R, Scilab, Octave, and Lua, to name a few.

Parallel HDF5 [7] uses MPI-IO to handle I/O to the storage system. To get the best performance reading and writing to HDF5 files, you can tune various aspects of MPI-IO, as well as HDF5 parameters for the underlying filesystem.
