Remora – Resource Monitoring for Users

Example 1

Remora is simple to use: Just add remora before your usual command. For example, the command line for the application ./myapp.exe would become:

$ remora ./myapp.exe

In the case of MPI code, the command

$ mpirun [...] ./mpiapp.exe

would become:

$ remora mpirun [...] ./mpiapp.exe

Notice that both commands are run as a regular user. Elevated or root privileges are not required, which reflects Remora’s design focus on providing users with useful information.

The code in this article is for a simple serial Poisson solver for a rectangular grid. Although I used it in the past, the link to it no longer works. If you really want something equivalent, then use the OpenMP version of the code and just set OMP_NUM_THREADS to 1. You won't get the exact same timings as the old serial code, but it is probably a reasonable substitute.
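As a minimal sketch of that substitution, the following lines limit the OpenMP runtime to a single thread before launching the solver under Remora. The executable name ./poisson_openmp is hypothetical; use whatever name you gave the compiled OpenMP version. The guard only keeps the snippet harmless on a system where Remora or the binary is missing.

```shell
# ./poisson_openmp is a hypothetical name for the compiled OpenMP solver.
APP=./poisson_openmp

# Restrict the OpenMP runtime to one thread to approximate the serial code.
export OMP_NUM_THREADS=1

# Launch under Remora only if both remora and the binary are present;
# otherwise just print the command that would have been run.
if command -v remora >/dev/null 2>&1 && [ -x "$APP" ]; then
    remora "$APP"
else
    echo "would run: OMP_NUM_THREADS=$OMP_NUM_THREADS remora $APP"
fi
```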

To get a fairly long run time, I adjusted a few of the application parameters:

nx = 8000
ny = 8000
t_max = 10000
tolerance = 0.00004D+00

and compiled the code with GFortran on a CentOS 7.8 system.

Remora first creates a subdirectory that contains the system information as a function of time. For this particular test, that subdirectory is remora_1605743629. It in turn contains several subdirectories with the raw data. You can parse the raw data yourself, or you can use the web page (HTML) that Remora creates to plot the data for you, which is the easiest way to get a quick glimpse of what happened during application execution. Just open the web page in your favorite browser (Figure 1).
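If you have run Remora several times in the same directory, a small helper can locate the most recent output. The sketch below assumes the output directories follow the remora_<number> pattern seen above and that a larger number means a later run; the function name latest_remora_dir is made up for illustration. It does not assume a particular name for the HTML page and simply lists any HTML files it finds.

```shell
# Print the remora_* subdirectory with the largest numeric suffix
# under the given directory (default: current directory).
latest_remora_dir() {
    base="${1:-.}"
    for d in "$base"/remora_*/; do
        [ -d "$d" ] || continue            # skip the unexpanded glob if nothing matches
        name=$(basename "$d")
        printf '%s %s\n' "${name#remora_}" "$name"
    done | sort -n | tail -n 1 | cut -d' ' -f2
}

dir=$(latest_remora_dir .)
if [ -n "$dir" ]; then
    echo "Latest Remora output: $dir"
    # List any HTML pages in that directory so you can open one in a browser.
    find "$dir" -name '*.html'
fi
```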

Figure 1: The web page showing the Remora output for the Example 1 code.

The HTML summary page lists the system metrics that Remora is capable of monitoring. A link below the metric means the corresponding data is available. Notice that for this simple case, only some of the metrics have been monitored.

The plots are created by Google Charts; the figures in this article are screen captures of them. Figure 2 is a plot of CPU usage versus time for Example 1, which is a serial application, so only one core was used. Notice how the kernel moves the application from one core to another. Remora itself uses little CPU time.

Figure 2: Example 1 CPU utilization plot.

The second plot, which shows memory usage during the application run, is presented in Figure 3. These memory metrics are gathered from /proc/<pid>/status and /dev/shm.

Figure 3: Example 1 memory utilization plot.

The memory stats include:

TMEM (Max): Total free memory. Considers the memory not being used by the application, the libraries needed by the application, and the OS.
SHMEM: Shared memory (/dev/shm). Applications have access to shared memory by means of /dev/shm. Any file put there counts toward the memory used by the application.
RMEM: Resident memory. Physical memory used by the application.
RMEM (Max): Maximum resident memory.
VMEM: Virtual memory. This information is important for watching to see if the OOM killer kicks in.
VMEM (Max): Maximum virtual memory.
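To see where numbers like these come from, you can look at the same sources by hand. The sketch below pulls the virtual and resident memory lines out of a /proc/<pid>/status file and reports the space used under /dev/shm. The mapping of the kernel's field names to Remora's labels (VmSize to VMEM, VmPeak to VMEM (Max), VmRSS to RMEM, VmHWM to RMEM (Max)) is my assumption, not something taken from the Remora source, and the function name mem_fields is made up.

```shell
# Print the virtual and resident memory lines from a /proc/<pid>/status-style file.
# Assumed mapping to Remora's labels (not confirmed against Remora's code):
#   VmSize -> VMEM, VmPeak -> VMEM (Max), VmRSS -> RMEM, VmHWM -> RMEM (Max)
mem_fields() {
    awk '/^(VmPeak|VmSize|VmHWM|VmRSS):/ { print $1, $2, $3 }' "$1"
}

# Show the values for the current shell if /proc is available (Linux only).
if [ -r "/proc/$$/status" ]; then
    mem_fields "/proc/$$/status"
fi

# Space used by files in shared memory, in kilobytes.
if [ -d /dev/shm ]; then
    du -sk /dev/shm 2>/dev/null || true
fi
```

To inspect a running application instead of the shell, pass /proc/<pid>/status with that application's process ID.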

Finally, Figure 4 is a plot of Ethernet usage during the run.

Figure 4: Example 1 Ethernet utilization plot.

Because the application could be an MPI code running on several nodes, Remora uses SSH to gather stats, which accounts for some of this traffic; the rest is just regular network traffic.

A few environment variables can be used with Remora to control its behavior:

REMORA_PERIOD: How often statistics are collected. The default is 10 seconds. Integer values are acceptable.
REMORA_VERBOSE: If set to 1, this variable tells Remora to send all information to a file. The default is 0 (off).
REMORA_MODE: Which stats are collected. Possible values include:
- FULL (default): CPU, memory, network, Lustre.
- BASIC: CPU, memory.
REMORA_PLOT_RESULTS: Controls whether the results are plotted:
- 1 (default): generates HTML files.
- 0: generates plots only if the postprocessing tool (remora_post) is invoked.
REMORA_CUDA: If set to 0, turns off GPU memory collection when a GPU module is available on the system.

You can set these variables as you like or need. Setting REMORA_PERIOD to 1 second could produce a fair amount of data if the application runs over a long period. A period that is too short could also affect application performance.
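As a rough illustration of the trade-off, the sketch below sets a few of the variables listed above and estimates how many samples a run would produce for a given collection period. The values are only examples, the helper estimate_samples is made up, and the estimate ignores any startup or shutdown samples Remora might take.

```shell
# Example settings; the variable names come from the list above,
# the values are arbitrary choices for illustration.
export REMORA_PERIOD=5        # sample every 5 seconds instead of the default 10
export REMORA_MODE=BASIC      # CPU and memory only
export REMORA_VERBOSE=0       # leave verbose output off

# Rough number of samples for a run of a given length (integer division).
estimate_samples() {
    runtime_s=$1
    period_s=$2
    echo $(( runtime_s / period_s ))
}

# A one-hour run at the period set above.
estimate_samples 3600 "$REMORA_PERIOD"    # prints 720

# Launch only if remora and the article's placeholder binary are present.
if command -v remora >/dev/null 2>&1 && [ -x ./myapp.exe ]; then
    remora ./myapp.exe
fi
```

At a 1-second period, the same one-hour run would produce 3,600 samples per metric, which is where the data volume starts to add up.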

REMORA_VERBOSE is a good flag to set if you want to learn more about the code and understand how it measures resource usage. Its more obvious use is for troubleshooting when you are having issues with Remora.

I never use REMORA_MODE because I can control what I want measured by changing the module config file, as previously mentioned. The same is true for REMORA_PLOT_RESULTS, because I always want to see the plots.

Example 2

Example 2 is basically the same as Example 1, but it uses the OpenMP version of the Poisson solver. In this case, all of the cores in the system are used.

Notice in Figure 5 that the application execution time with OpenMP is much shorter when using four cores than when using one core. Figure 6 shows CPU usage versus time for Example 2. Remember, this is an OpenMP application running on all cores, which is reflected by 96%-100% core utilization during the entire run.