Stat-like command-line tools for admins

mpstat

One more tool I want to mention is not exactly along the lines of vmstat or dstat, but it has proved useful when I'm helping a user with an application, particularly one that uses more than one core. The tool is called mpstat [12] and is part of the sysstat package [11]. Mpstat is more like iostat or nfsiostat, and it gives more detailed information about the CPUs than vmstat or dstat. Because virtually all systems today have more than one core, mpstat can be very useful for tracking CPU usage. I use it when I'm writing OpenMP [12] code and want to see how much of the core capability I'm using. Figure 6 shows the output from

mpstat 1 10

which outputs CPU statistics for 10 seconds at one-second intervals. The output first displays a line identifying the system and the number of CPUs (eight, in this case), then starts printing the stats for all of the CPUs combined. The output is fairly similar to that of vmstat and dstat (Table 3).

Figure 6: Mpstat output for all processors combined while running Python code.
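For scripted collection, the same command can be wrapped in a few lines of Python. This is only a minimal sketch: the function names mpstat_command and run_mpstat are hypothetical helpers (not part of sysstat), and the code assumes that mpstat is installed and on the PATH.

```python
import shutil
import subprocess


def mpstat_command(interval=1, count=10):
    """Build the argument list for 'mpstat <interval> <count>'."""
    return ["mpstat", str(interval), str(count)]


def run_mpstat(interval=1, count=10):
    """Run mpstat and return its text output, or None if mpstat is missing.

    The call blocks for roughly interval * count seconds.
    """
    if shutil.which("mpstat") is None:
        return None
    result = subprocess.run(
        mpstat_command(interval, count),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a nonzero exit
    )
    return result.stdout
```

Calling run_mpstat(1, 10) should return the same text you would see in the terminal, which you can then save or parse.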

Table 3: Mpstat Output

Column    Description
CPU       Processor number to which the output refers. In Figure 6, it refers to all processors combined.
%usr      Percent CPU utilization by user applications.
%nice     Percent CPU utilization by user applications running with a "nice" priority.
%sys      Percent CPU utilization at the system level (kernel). This does not include the time for servicing hardware or software interrupts.
%iowait   Percent of time the CPU or CPUs were idle while the system had an outstanding I/O request.
%irq      Percent CPU utilization spent servicing hardware interrupts.
%soft     Percent CPU utilization spent servicing software interrupts.
%steal    Percent of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
%guest    Percent of time spent by the CPU or CPUs running a virtual processor.
%gnice    Percent of time spent by the CPU or CPUs running a guest with a "nice" priority.
%idle     Percent of time the CPU or CPUs were idle while the system did not have an outstanding I/O request.

In Figure 6, the user applications are using a little over 13% of the total CPU capability, and the system is using a little less than 0.25%. The remaining 86% or so is idle (close to seven of the eight cores). To see the details for each processor, use:
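The conversion from an all-CPU percentage to a number of cores is simple arithmetic. The sketch below uses the rounded values quoted above (13% user, 86% idle, eight CPUs); the helper name busy_core_equivalents is hypothetical.

```python
def busy_core_equivalents(percent, num_cpus):
    """Convert an all-CPU percentage into the equivalent number of whole cores."""
    return percent / 100.0 * num_cpus


num_cpus = 8  # from the header line of the mpstat output
user_cores = busy_core_equivalents(13.0, num_cpus)  # rounded %usr value
idle_cores = busy_core_equivalents(86.0, num_cpus)  # rounded %idle value

print(f"user: {user_cores:.2f} cores, idle: {idle_cores:.2f} cores")
# prints: user: 1.04 cores, idle: 6.88 cores
```

In other words, the workload keeps roughly one core busy, which is what you would expect from a single-threaded program.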

mpstat -P ALL

to show the same stats as in Figure 6, first for all of the CPUs combined and then for each core (Figure 7). I use this option to check whether my OpenMP programs are keeping the cores busy; ideally, every core the program runs on is close to full utilization.

Figure 7: Mpstat output for each processor while running Python code.
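If you want to check per-core utilization in a script rather than by eye, the text output of mpstat -P ALL is easy to parse. The following is a rough sketch under stated assumptions: the sample text is made up for illustration (a four-CPU node with two busy cores, not the data in Figure 7), the function names parse_mpstat and busy_cpus are hypothetical, and the parser assumes the column layout described in Table 3 with a period as the decimal separator.

```python
# Illustrative sample only; these numbers are invented, not measured.
SAMPLE = """\
Linux 5.15.0 (node001)  01/15/2024  _x86_64_  (4 CPU)

10:15:01 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
10:15:02 AM  all   50.13    0.00    0.25    0.00    0.00    0.00    0.00    0.00    0.00   49.62
10:15:02 AM    0   99.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
10:15:02 AM    1   99.50    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.50
10:15:02 AM    2    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
10:15:02 AM    3    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
"""


def parse_mpstat(text):
    """Return {cpu_label: {column: value}}, keeping the last report per CPU."""
    columns = None
    stats = {}
    for line in text.splitlines():
        tokens = line.split()
        if "CPU" in tokens and "%idle" in tokens:
            # Header line: keep the column names from "CPU" onward.
            columns = tokens[tokens.index("CPU"):]
            continue
        if columns is None or len(tokens) < len(columns):
            continue  # system banner, blank line, or anything before the header
        # Align from the right so timestamps (with or without AM/PM)
        # and "Average:" prefixes do not shift the columns.
        fields = tokens[-len(columns):]
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue  # not a data row
        stats[fields[0]] = dict(zip(columns[1:], values))
    return stats


def busy_cpus(stats, threshold=90.0):
    """List CPU numbers whose non-idle share is at least threshold percent."""
    return sorted(
        int(cpu) for cpu, row in stats.items()
        if cpu != "all" and 100.0 - row["%idle"] >= threshold
    )


stats = parse_mpstat(SAMPLE)
print(busy_cpus(stats))  # prints: [0, 1]
```

For an OpenMP run, comparing the length of this list with the number of threads you requested is a quick check on whether all of the threads are actually doing work.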

Summary

Sometimes you only have simple shell or crash cart access to a wayward node, so you can't run X. That means you have to rely on simple ASCII tools to help debug the problems.

I have found vmstat invaluable for diagnosing misbehaving nodes and for checking or profiling user applications. For example, when a user's application starts running slower, the cause is often simply that the application is swapping on the compute nodes. A quick vmstat 1 10 lets me see the problem and address it quickly. However, sometimes diagnosing a node requires information that vmstat can't provide. Fortunately, other people have run into the same problem, and dstat was created. I tend to use dstat to get more information than I can get with vmstat.

Another tool I use, but not really to debug nodes, is mpstat, which helps me understand what my code is doing on a node. For example, if I'm writing OpenMP code, I can use mpstat to see if the cores are being used by user applications and, if so, how much. Moreover, you can use mpstat to diagnose performance problems and help answer the question, "Why isn't my code running faster?"