Parallel I/O

Better Parallel I/O

A better approach for handling parallel I/O is shown in Figure 5. In this approach, every TP writes to the same file, but to a different “section” of it. Because each TP’s section is a distinct, non-overlapping region of the file, one TP has no chance of overwriting the data from a neighboring TP. For this approach to work, you need a common shared filesystem that all TPs can access (NFS anyone?).

Figure 5: Multiple threads/processes performing I/O to a single file.
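The idea in Figure 5 can be sketched in a few lines of Python, assuming fixed-size sections so that section k simply starts at byte k × `SECTION_SIZE`. Threads stand in for TPs here. A real application would run separate processes, possibly on different nodes, against a shared filesystem. `SECTION_SIZE`, `NUM_TPS`, `write_section`, and `parallel_write` are hypothetical names chosen for illustration. `os.pwrite` is Unix-only.

```python
import os
import tempfile
import threading

# Hypothetical parameters chosen for illustration.
SECTION_SIZE = 16  # bytes reserved for each TP's section
NUM_TPS = 4        # number of threads writing to the shared file


def write_section(fd: int, rank: int, payload: bytes) -> None:
    """Write one TP's payload into that TP's own section of the shared file."""
    if len(payload) > SECTION_SIZE:
        raise ValueError("payload would spill into a neighboring section")
    offset = rank * SECTION_SIZE  # section k starts at byte k * SECTION_SIZE
    # os.pwrite writes at an explicit offset without moving the file position,
    # so the threads never race on a shared seek()/write() pair.
    os.pwrite(fd, payload, offset)


def parallel_write(path: str) -> None:
    """Have NUM_TPS threads write their sections of one file concurrently."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        threads = []
        for rank in range(NUM_TPS):
            payload = f"TP{rank}".encode().ljust(SECTION_SIZE, b".")
            t = threading.Thread(target=write_section, args=(fd, rank, payload))
            threads.append(t)
            t.start()
        for t in threads:
            t.join()
    finally:
        os.close(fd)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "shared.dat")
    parallel_write(path)
    with open(path, "rb") as f:
        data = f.read()
    for rank in range(NUM_TPS):
        print(data[rank * SECTION_SIZE:(rank + 1) * SECTION_SIZE])
```

Each printed slice holds exactly one TP’s data. The threads can finish in any order without affecting the result, because no two offsets fall in the same section.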

One of the challenges of this approach is that the data from each TP has to have its own “section” of the file. One TP cannot cross over into the section of another TP (don’t cross the streams), or you might end up with data corruption. The moral is: be sure you know what you are doing, or you will corrupt the data file.
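When TPs write different amounts of data, each TP first needs to know where its section starts. The following is a minimal sketch, assuming every TP knows how many bytes every other TP will write. Each offset is the exclusive prefix sum of the sizes of the lower-ranked TPs, which, in an MPI code, is the kind of value `MPI_Exscan` could compute across ranks. The helper names `section_layout` and `check_write` are hypothetical. The bounds check shows one way to catch a TP that is about to “cross the streams.”

```python
from itertools import accumulate
from typing import List, Tuple


def section_layout(sizes: List[int]) -> List[Tuple[int, int]]:
    """Return an (offset, length) pair for each TP, given the bytes each TP writes.

    The offsets are the exclusive prefix sum of the sizes: TP k starts exactly
    where TP k-1 ends, so the sections tile the file without overlapping.
    """
    offsets = list(accumulate(sizes, initial=0))[:-1]
    return list(zip(offsets, sizes))


def check_write(layout: List[Tuple[int, int]], rank: int, offset: int, length: int) -> None:
    """Raise ValueError if a write by `rank` would leave that TP's own section."""
    start, size = layout[rank]
    if offset < start or offset + length > start + size:
        raise ValueError(
            f"TP {rank}: write [{offset}, {offset + length}) crosses "
            f"its section [{start}, {start + size})"
        )


if __name__ == "__main__":
    layout = section_layout([100, 250, 50])
    print(layout)                     # [(0, 100), (100, 250), (350, 50)]
    check_write(layout, 1, 100, 250)  # fits exactly inside TP 1's section
    try:
        check_write(layout, 1, 300, 100)  # would run into TP 2's section
    except ValueError as err:
        print(err)
```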

Also note that, most likely, if you write data using N TPs, you will have to keep using N TPs in any applications later in the workflow that read the file. For some problems, this setup might not be convenient or even possible.

Developers of several applications have taken a different approach: using a small, fixed number of TPs to handle the I/O. Each of these I/O TPs writes a certain part of the output file. Typically the number of I/O TPs stays constant, which helps any pre-processing or post-processing applications in the workflow.
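The fixed-number-of-I/O-TPs idea can be sketched as a two-step pattern. First, the compute TPs hand their data to a designated I/O TP. Then only the I/O TPs touch the file, each writing one contiguous part. In this sketch, the hand-off is an in-memory gather standing in for the messages a real application would send. The grouping rule `rank * num_io // num_compute` is an assumption made for illustration. All function names are hypothetical, and `os.pwrite` is Unix-only.

```python
import os
import tempfile
import threading
from typing import List


def assign_io_tp(num_compute: int, num_io: int) -> List[int]:
    """Map each compute TP rank to an I/O TP; ranks form num_io contiguous groups."""
    return [rank * num_io // num_compute for rank in range(num_compute)]


def aggregate(chunks: List[bytes], num_io: int) -> List[bytes]:
    """Gather compute-TP chunks into one buffer per I/O TP, preserving rank order."""
    owner = assign_io_tp(len(chunks), num_io)
    parts: List[List[bytes]] = [[] for _ in range(num_io)]
    for rank, chunk in enumerate(chunks):
        parts[owner[rank]].append(chunk)
    return [b"".join(p) for p in parts]


def write_with_io_tps(path: str, chunks: List[bytes], num_io: int) -> None:
    """Only the num_io I/O TPs touch the file, each writing one contiguous part."""
    parts = aggregate(chunks, num_io)
    # Part k starts where parts 0..k-1 end.
    offsets = [sum(len(p) for p in parts[:k]) for k in range(num_io)]
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        writers = [
            threading.Thread(target=os.pwrite, args=(fd, parts[k], offsets[k]))
            for k in range(num_io)
        ]
        for w in writers:
            w.start()
        for w in writers:
            w.join()
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Six compute TPs but only two I/O TPs (hypothetical sizes).
    chunks = [f"<{rank}>".encode() for rank in range(6)]
    path = os.path.join(tempfile.mkdtemp(), "aggregated.dat")
    write_with_io_tps(path, chunks, num_io=2)
    with open(path, "rb") as f:
        print(f.read())  # b'<0><1><2><3><4><5>'
```

Changing the number of compute TPs changes only how chunks are grouped. The file is still written by `num_io` writers, which is the property that makes this arrangement friendlier to the rest of the workflow.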

One problem with the single-I/O-TP and fixed-number-of-I/O-TPs approaches is that reading or writing data from a specific section of the file is often not easy to accomplish. Consequently, developers have sought a solution that allows each TP to read or write data anywhere in the file, hopefully without the TPs stepping on each other’s toes.

MPI-IO

Over time, MPI (Message Passing Interface) became popular, and researchers began thinking about how to better handle parallel I/O for MPI applications. With MPI-2, an interface called MPI-IO was adopted. MPI-IO is a set of functions that abstracts I/O for MPI applications on distributed systems. It allows an application to perform I/O in parallel in much the same way that MPI sends and receives messages.

Typically, each process in the MPI communicator participates in the I/O, but that is not required. How each process writes to the file is up to the developer. Although a full discussion of MPI-IO is far beyond the scope of this article, a number of tutorials, documents, and even books available online can help you get started.