Previously we talked about using iostat to monitor local storage on your server or compute nodes, but what if your compute nodes use NFS-mounted storage to run jobs? The nfsiostat tool can help you understand the kinds of load that applications running on an NFS client put on the NFS storage server.

Monitoring Client NFS Storage with nfsiostat

In my last article, Monitoring Storage Devices with iostat, I wrote about using iostat to monitor the local storage devices in servers or compute nodes. The iostat tool is part of the sysstat family of tools that comes with virtually every Linux distribution, as well as other *nix-based operating systems (e.g., Solaris and AIX). Iostat can only tell you what is happening with your storage devices; it can't give you information about how the filesystem or virtual filesystem (VFS) is affecting application I/O; however, it does give you some insight into how your devices are behaving.

Many times a centralized storage solution is used for running applications, so using iostat to monitor I/O devices on the compute nodes doesn't tell me anything. What I need is a way to monitor NFS filesystem usage on the compute nodes. Another tool in the sysstat family, nfsiostat, allows me to monitor what is happening with NFS filesystems on my compute nodes (NFS clients). In this article, I want to go over what nfsiostat does and how I use it.

As you know, all good HPC admins use the command line, but I haven't found a good way to use it to display time histories of several parameters. Like iostat, nfsiostat is a command-line tool, so as part of the discussion, I will develop a simple Python tool that parses the output and creates an HTML document with plots of the results, thanks to matplotlib. The Python tool, nfsiostat_plotter, is very similar to the iostat_plotter tool I developed for the previous article. To illustrate how I use nfsiostat, I'll run IOzone on an NFS client node.

nfsiostat

The output from nfsiostat can vary depending on the options you choose, although it doesn’t have as many options as iostat. The command I typically use is:

[laytonj@home8 ~]$ nfsiostat -h -m -t 1 4 > gg1.out

The output (wrapped) from this command is shown in Listing 1.

Listing 1: Sample nfsiostat Output

Linux 2.6.18-308.16.1.el5.centos.plus (home8)  02/10/2013  _i686_ (1 CPU)

02/10/2013 03:38:48 PM
Filesystem:               rMB_nor/s    wMB_nor/s    rMB_dir/s    wMB_dir/s    rMB_svr/s    wMB_svr/s     ops/s    rops/s    wops/s
192.168.1.250:/home
                         1230649.19   1843536.81         0.00         0.00   1229407.77   1843781.39 4661000.00 1238000.00 2342900.000

2/10/2013 03:38:49 PM
Filesystem:               rMB_nor/s    wMB_nor/s    rMB_dir/s    wMB_dir/s    rMB_svr/s    wMB_svr/s     ops/s    rops/s    wops/s
192.168.1.250:/home
                               0.00         0.27         0.00         0.00         0.00         0.14   2800.00      0.00   2800.000

2/10/2013 03:38:50 PM
Filesystem:               rMB_nor/s    wMB_nor/s    rMB_dir/s    wMB_dir/s    rMB_svr/s    wMB_svr/s     ops/s    rops/s    wops/s
192.168.1.250:/home
                               0.00         0.27         0.00         0.00         0.00         0.14   2800.00      0.00   2800.000

2/10/2013 03:38:51 PM
Filesystem:               rMB_nor/s    wMB_nor/s    rMB_dir/s    wMB_dir/s    rMB_svr/s    wMB_svr/s     ops/s    rops/s    wops/s
192.168.1.250:/home   
                               0.00         0.27         0.00         0.00         0.00         0.14   2800.00      0.00   2800.000

The command options are pretty straightforward: -h makes the output a little easier to read, -m produces output in megabytes per second (MB/s), and -t tells nfsiostat to print a timestamp with each report. The two numbers, 1 and 4, tell nfsiostat to print a report every second, four times, and the output is redirected to the file gg1.out.
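
If you prefer to drive the data collection from a script instead of typing the redirection by hand, a small Python wrapper is one option. This is just a sketch under my own assumptions (the helper name and output file are mine, not part of nfsiostat or nfsiostat_plotter):

import subprocess

# Run nfsiostat with the options discussed above and capture the raw
# text output so it can be parsed later. Helper name is hypothetical.
def capture_nfsiostat(outfile, interval=1, count=4):
    cmd = ["nfsiostat", "-h", "-m", "-t", str(interval), str(count)]
    with open(outfile, "w") as f:
        subprocess.call(cmd, stdout=f)

capture_nfsiostat("gg1.out")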

Some of the nfsiostat output is a little cryptic, but it’s actually pretty easy to follow and is very similar to iostat. After listing some system information on the first line, nfsiostat prints the four requested reports. Each report begins with a timestamp (the -t option), followed by a column-header line and a line of data for every NFS-mounted filesystem; the columns are defined in Table 1.

Table 1: nfsiostat Report Output

Column: Definition
Filesystem: Name of the NFS filesystem mounted. Typically this is the NFS server name followed by the mount point on the NFS client.
rBlk_nor/s (rkB_nor/s, rMB_nor/s): The number of blocks (kilobytes, megabytes) per second read by applications using the NFS-mounted filesystem with the read(2) system call. A block is 512 bytes. In the example, I chose to use MB/s with the -m option; the same applies to the other five throughput columns below.
wBlk_nor/s (wkB_nor/s, wMB_nor/s): The number of blocks (kilobytes, megabytes) per second written by applications using the NFS-mounted filesystem with the write(2) system call.
rBlk_dir/s (rkB_dir/s, rMB_dir/s): The number of blocks (kilobytes, megabytes) per second read from files opened with the O_DIRECT flag.
wBlk_dir/s (wkB_dir/s, wMB_dir/s): The number of blocks (kilobytes, megabytes) per second written to files opened with the O_DIRECT flag.
rBlk_svr/s (rkB_svr/s, rMB_svr/s): The number of blocks (kilobytes, megabytes) per second read from the NFS server by the NFS client via NFS READ requests.
wBlk_svr/s (wkB_svr/s, wMB_svr/s): The number of blocks (kilobytes, megabytes) per second written to the NFS server by the NFS client via NFS WRITE requests.
ops/s: The number of operations per second issued to the filesystem.
rops/s: The number of read operations per second issued to the filesystem.
wops/s: The number of write operations per second issued to the filesystem.

As with iostat, the first report generated by nfsiostat provides statistics since the system was booted. All subsequent reports use the time interval you specify. Basically, you ignore the first line of data and watch the subsequent lines of data.
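
To make the structure concrete, here is a minimal Python sketch of how output like Listing 1 can be parsed. This is not the actual nfsiostat_plotter code, just an illustration of the general approach, and it assumes the wrapped layout shown above with a single NFS-mounted filesystem:

def parse_nfsiostat(filename):
    # Each sample is four lines once blank lines are removed:
    # timestamp, column-header line, filesystem name, data values.
    samples = []                       # (timestamp, filesystem, values)
    with open(filename) as f:
        lines = [line.strip() for line in f if line.strip()]
    i = 1                              # skip the "Linux ..." system line
    while i + 3 < len(lines):
        timestamp = lines[i]
        fs = lines[i + 2]              # lines[i + 1] is the header line
        values = [float(v) for v in lines[i + 3].split()]
        samples.append((timestamp, fs, values))
        i += 4
    return samples[1:]                 # drop the since-boot first report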

Although this first example shows you what the output looks like, it is uninteresting because I wasn't doing anything on the NFS-mounted filesystem. Before continuing, I want to quickly present the plotting tool for creating an nfsiostat HTML report.

Reporting nfsiostat Output with Plots and HTML

I like to examine the output from nfsiostat in the same manner I do iostat – visually. If I have applications running over an extended period, I would like to visualize the output trends. However, this is virtually impossible to do at the command line; consequently, I decided to write a simple Python script that parses the nfsiostat output and creates the plots I want to see. Taking things a little further, the script creates HTML so I can see the plots all in one document (and convert it to either PDF or Word).

The script, creatively called nfsiostat_plotter, uses a few Python modules: shlex, time, matplotlib, and os. As with iostat_plotter, I tested it with Matplotlib 0.91 and a few other versions to make sure it worked correctly within my ability to test a range of packages. I tested the script with a couple of versions of Python as well: 2.6.x and 2.7.x, but not 3.x. The script is offered up as is with no real guarantees.
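
To give a feel for what the script does internally, here is a stripped-down sketch of the plot-and-report step using matplotlib. Again, this is an illustration, not the real nfsiostat_plotter; the helper name, the output file names, and the use of the parse_nfsiostat() sketch from above are my own assumptions:

import os
import matplotlib
matplotlib.use("Agg")                  # render to image files, no display
import matplotlib.pyplot as plt

def make_report(samples, outdir="HTML_REPORT"):
    # samples: list of (timestamp, filesystem, values) tuples
    if not os.path.exists(outdir):
        os.makedirs(outdir)
    t = range(len(samples))            # 1-second samples, so index ~= seconds
    read_mb = [s[2][0] for s in samples]    # rMB_nor/s column
    write_mb = [s[2][1] for s in samples]   # wMB_nor/s column

    plt.figure()
    plt.plot(t, read_mb, label="read MB/s")
    plt.plot(t, write_mb, label="write MB/s")
    plt.xlabel("time (s)")
    plt.ylabel("throughput (MB/s)")
    plt.legend()
    plt.savefig(os.path.join(outdir, "throughput.png"))

    with open(os.path.join(outdir, "report.html"), "w") as f:
        f.write("<html><body><h1>nfsiostat report</h1>\n")
        f.write('<img src="throughput.png">\n')
        f.write("</body></html>\n")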

I love feedback, including feedback that says the script isn’t Pythonic enough or efficient enough. I don't pretend to be an expert Python coder – I'm too busy solving problems and developing tools for myself to worry about that; hence, any well-intended comments are truly welcome because it gives me an opportunity to learn and improve.

The nfsiostat command I use is:

[laytonj@home8 NFSIOSTAT]$ nfsiostat -h -m -t [n m] > file.out

As with iostat_plotter, I did hard code some aspects into the script to look for specific output (e.g., the megabyte option [-m]), but that is easy enough to change if you so desire. The sampling rate (n) and number of samples (m) can be anything. Because nfsiostat will grab statistics from all NFS-mounted filesystems, you don't need to specify one as you do with iostat_plotter.

In the command above, the output file is file.out, but you can name it anything you like. I also recommend appending & after file.out to put the command in the background.

Once nfsiostat has gathered data, running the nfsiostat_plotter script is simple:

[laytonj@home8 NFSIOSTAT]$ ./nfsiostat_plotter.py file.out

The script follows the same general guidelines as iostat_plotter and prints basic information about what it is doing as it runs. Listing 2 is an example.

Listing 2: Output from nfsiostat_plotter While It Is Running

[laytonj@home8 NFSIOSTAT]$ ./nfsiostat_plotter.py iozone_r_nfsiostat.out
nfsiostat plotting script

input file name:

iozone_r_nfsiostat.out
reading nfsiostat output file ...
Finished reading  660  data points for  1  fs.
Creating plots and HTML report
   Finished Plot  1  of  4
   Finished Plot  2  of  4
   Finished Plot  3  of  4
   Finished Plot  4  of  4
Finished. Please open the document HTML/report.html in a browser.

Notice that the script is capable of plotting data for multiple NFS mountpoints; here it reports that it read "660 data points for 1 fs," where fs means filesystem.

The script creates a subdirectory, HTML_REPORT, in the current working directory and puts all of the plots, along with a file named report.html, there. Once the script is done, just open report.html in a browser or a word processor. The HTML is simple, so any browser should work (if it does barf on the HTML, then you have a seriously broken browser).

Note that all plots use time as the x-axis data. The time is normalized by the start time of the nfsiostat run; therefore, the x-axis for all plots starts at 0. If you like, you can change this in the script. Additionally, because the first set of values is nonsensical (it covers the time since boot), except for the time, the plots don't use the first data point.
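
The normalization itself is straightforward with the time module that the script already imports; a small sketch (assuming the timestamp format shown in Listing 1) looks like this:

import time

def normalize_times(timestamps):
    # Convert "02/10/2013 03:38:48 PM" strings to seconds since the epoch
    # and shift them so the first sample sits at t = 0.
    secs = [time.mktime(time.strptime(ts, "%m/%d/%Y %I:%M:%S %p"))
            for ts in timestamps]
    return [s - secs[0] for s in secs]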

Example

To illustrate what nfsiostat can do, I ran it while I was doing a read/reread test with IOzone. The set of commands I used was:

date
nfsiostat -h -m -t 1 660 > iozone_r_nfsiostat.out &
sleep 10
./iozone -i 1 -r 64k -s 3g -f iozone.tmp > io.out
sleep 10
date

I put the two sleep commands in the script so I could get some “quiet” data before and after the run. (Note: Sometimes it’s good to look at what the device is doing for some time before and after the critical time period to better understand the state of the device(s) prior to the time of interest.) The file iozone.tmp that this test reads was created beforehand with an IOzone “write/rewrite” test, because a read/reread test needs an existing file. The output from nfsiostat is sent to the file iozone_r_nfsiostat.out.

The hardware you use can affect the observations or conclusions you draw. In this case, my test hardware is a bit on the older side. The NFS server is a quad-core AMD system with the following specifications:

  • Scientific Linux 6.2 (SL 6.2)
  • 2.6.32-220.4.1.el6.x86_64 kernel
  • GigaByte MAA78GM-US2H motherboard
  • AMD Phenom II X4 920 CPU (four cores)
  • 8GB of memory (DDR2-800)
  • The OS and boot drive are on an IBM DTLA-307020 (20GB drive at Ultra ATA/100)
  • /home is on a Seagate ST1360827AS
  • ext4 filesystem with default options
  • eth0 is connected to the LAN and has an IP address of 192.168.1.250 (Fast Ethernet)

The NFS server exports /home using NFSv3 to the compute node through a simple eight-port Fast Ethernet switch. The NFS client node has the following specifications:

  • CentOS 5.8
  • 2.6.18-308.16.1.el5.centos.plus kernel
  • AMD Athlon processor (one core) at 1.7GHz
  • 1GB of memory
  • Fast Ethernet NIC
  • eth0 is connected to the LAN
  • NFS mount options: vers=3,defaults

Once the IOzone test was finished, I ran nfsiostat_plotter.py against the nfsiostat output file. The generated HTML report is listed below.

*********** Begin Report ***********

Introduction

This report plots the nfsiostat output contained in file: iozone_r_nfsiostat.out. The filesystems analyzed are:

  • 192.168.1.250:/home

For each filesystem there is a series of plots of the output from nfsiostat that was captured. The report is contained in a subdirectory HTML_REPORT. In that directory you will find a file named report.html. Just open that file in a browser and you will see the plots. Please note that all plots are referenced to the beginning time of the nfsiostat run.

NFSiostat outputs a number of basic system parameters when it creates the output. These parameters are listed below.

  • System Name: home8
  • OS: Linux
  • Kernel: 2.6.18-308.16.1.el5.centos.plus
  • Number of Cores: 1
  • Core Type: _i686_

The nfsiostat run was started on 02/10/2013 at 15:54:25 PM.

Below are hyperlinks to various plots within the report for each filesystem. 


192.168.1.250:/home:

 1. Application Read and Write Throughput

 2. Application Read and Write Throughput with O_DIRECT

 3. Application Read and Write using NFS_READ and NFS_WRITE

 4. Application Operations/s, Read ops/s, and Write Ops/s

1. Application Read and Write Throughput. Filesystem: 192.168.1.250:/home

This figure plots the read and write throughput from applications using the read(2) and write(2) system call interfaces. The throughput is plotted as a function of time.

Figure 1 - App Read and Write Throughput for FileSystem: 192.168.1.250:/home

2. Application Read and Write Throughput with O_DIRECT. Filesystem: 192.168.1.250:/home

This figure plots the read and write throughput in MB by the applications using the O_DIRECT flag.

Figure 2 - App Read and Write Throughput with O_DIRECT for FileSystem: 192.168.1.250:/home

3. Application Read and Write using NFS_READ and NFS_WRITE. Filesystem: 192.168.1.250:/home

This figure plots the amount of read and write in MB by the applications using NFS_READ and NFS_WRITE.

Figure 3 - App Read and Write throughput using NFS READ and NFS WRITE for FileSystem: 192.168.1.250:/home

4. Application Operations/s, Read ops/s, and Write Ops/s. Filesystem: 192.168.1.250:/home

This figure plots the overall ops/s, read ops/s, and write ops/s.

Figure 4 - App Operations/s, Read ops/s, and Write ops/s for FileSystem: 192.168.1.250:/home

*********** End Report ***********

To understand what nfsiostat is communicating, I’ll walk you through the report.

1. Application Read and Write Throughput

Figure 1 plots the read and write throughput (MB/s) versus time for the run (starting at 0). The read and write operations are accomplished through the standard read(2) and write(2) functions. You can see the details with the command:

man 2 write

or

man 2 read

Notice in the top panel that read throughput peaks at about 8,000MBps. Normally, I would jump up and down because this is about 8GBps, but a single disk is not capable of pushing 8GBps over a Fast Ethernet network to a client using NFS. What you are seeing are buffer effects, so performance does not reflect a direct read from the disk and then a send to the NFS client. Rather, the data is likely to be buffered on the NFS server and the NFS client. Also, you can see that most of the reads are around 1,000MBps (1GBps), which is also a sign of buffering because a single SATA 7,200rpm disk is only capable of approximately 100MBps (plus or minus).

The bottom panel of Figure 1 is a plot of write throughput. Notice that it is very small relative to the read throughput, indicating that the application is very heavily read oriented, which makes sense for a read/reread IOzone test.

One way to check whether I/O is buffered is by switching all read and write I/O operations to a file that has been opened with O_DIRECT (see below). Combining O_DIRECT with O_SYNC forces the data to get at least to the drive buffers. Note that this is likely to be possible only if you have access to the application source code.

2. Application Read and Write Throughput with O_DIRECT

Figure 2 is a plot of the read and write I/O rates to the NFS-mounted filesystems for I/O that uses the O_DIRECT flag. This flag is used with the open() function to tell the OS to try to minimize the cache effects of the I/O to and from the file that is opened. When this option is used, all file I/O is done directly to and from userspace buffers (i.e., no OS buffers, no file system buffers, etc.). The I/O is synchronous, so when the read() or write() function returns, the data is guaranteed to be transferred (at least to the drive’s cache). As you can see in Figure 2, no I/O in this job used O_DIRECT.
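
If you have the application source, the check mentioned above does not take much code. Here is a minimal Python sketch of opening a file with O_DIRECT and O_SYNC on Linux; the file path is made up, and the page-aligned mmap buffer is there because O_DIRECT requires aligned buffers and I/O sizes:

import os
import mmap

BLOCK = 4096                           # aligned I/O size (assumed)

buf = mmap.mmap(-1, BLOCK)             # anonymous mmap is page-aligned
buf.write(b"x" * BLOCK)

# Hypothetical path on an NFS-mounted filesystem.
path = "/mnt/nfs/direct_test.tmp"
fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_DIRECT | os.O_SYNC, 0o644)
try:
    os.write(fd, buf)                  # bypasses the page cache; synchronous
finally:
    os.close(fd)
    os.unlink(path)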

3. Application Read and Write Using NFS_READ and NFS_WRITE

Figure 3 plots all NFS read and write operations – not just those using the read(2) or write(2) functions. Because no I/O occurs via O_DIRECT – or other sources I’m aware of – Figures 3 and 1 should be the same, and, in fact, they are. I personally like to compare Figures 3 and 1 to determine whether the application(s) are doing any I/O of which I’m not aware.

4. Application Operations/s, Read Ops/s, and Write Ops/s

Figure 4 plots NFS filesystem I/O operations per second (IOPS). The top panel in the figure plots the overall I/O operations per second issued to the NFS-mounted filesystem. The middle panel plots the read I/O operations per second (Read IOPS). The bottom panel plots the write I/O operations per second (Write IOPS). The interesting thing about this plot for this example is the number of Write IOPS in the bottom panel.

The IOzone run was a read/reread test, so write operations should be minimal. However, some spikes in Write IOPS reach about 1,500 during the run and about 2,000 just after the run is finished. If you combine this information with the write throughput in Figure 3, you can see that the individual write operations must be extremely small, because the write throughput is very low (i.e., high Write IOPS combined with low write throughput means very small write operations).
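
A quick back-of-the-envelope calculation makes the point; the numbers below are illustrative round values, not figures read off the plots:

# Average request size is roughly throughput divided by IOPS.
write_mbps = 0.5                       # MB/s (illustrative, very low)
write_iops = 1500.0                    # write operations per second
avg_write_bytes = write_mbps * 1024 * 1024 / write_iops
print("average write size: %.0f bytes" % avg_write_bytes)   # ~350 bytes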

On the other hand, the middle panel in Figure 4 shows that Read IOPS has considerably larger values than the Write IOPS, shown in the bottom panel. The Read IOPS peak at about 8,000, with steady-state values of around 1,000, which is much larger than the Write IOPS steady state of less than 100, for the most part.

The top panel is basically the sum of the read and write IOPS, along with other I/O operations. You can see that the overall IOPS (or total IOPS, if you will), peak close to 9,500 or so at around 30 seconds into the run.

Interestingly, the IOPS exceed what a single disk can usually accomplish. Again, you’re seeing the effect of buffers on IOPS performance. This isn't necessarily a bad thing, but you just have to keep in mind the hardware you are running and what estimates you might have for performance, so you can compare these estimates to reality and look for, and understand, any differences between the two.

Observations and Comments

The popular nfsiostat tool is part of the sysstat tool set that comes with virtually all Linux distributions, as well as other *nix operating systems. Although not perfect, it gives you an idea of how NFS is performing on the NFS clients; however, remember that this tool measures the effect of all applications using the NFS filesystem, so it could be difficult to weed out the effect of an individual application.

It is very difficult, if not impossible, to examine nfsiostat command-line output and determine the various trends for different measurements. As with iostat, I decided to write a simple Python script, nfsiostat_plotter, that parses the output from nfsiostat and creates a simple set of plots presented in an HTML report. The example I used here was a read/reread IOzone test on an NFS client.

As I said in the iostat article, with no uber-tool at your disposal, you should understand the limitations and goals of each tool that is available. Nfsiostat can give you some idea of how applications on an NFS client are using the NFS filesystem, but I personally want a deeper insight into what is happening. I would like to know how the network is affecting performance. I would like to know how the devices on the NFS server are behaving. I would like to know more about how the filesystem on the server is affecting behavior (i.e., how the I/O scheduler affects performance and how the buffers affect performance).