Log Everything

Remote Logging with rsyslog

Another popular option is rsyslog, an open source tool for forwarding log messages to a central server over an IP network. It is very configurable: the /etc/rsyslog.conf file and the files in the /etc/rsyslog.d/ directory define the various configuration options. Because the tool is so configurable and flexible, be sure to read the man pages very carefully.

You can get started fairly easily with rsyslog by using the defaults. On the remote host that collects the logs, you begin by editing the /etc/rsyslog.conf file, uncommenting the following lines:

$ModLoad imtcp
$InputTCPServerRun 514

These lines load the TCP input module (imtcp) and tell rsyslog to listen for log traffic on TCP port 514. After the change, you should restart the rsyslog service.
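The server-side steps can be sketched as follows (assuming a systemd-based distribution; the sed commands simply uncomment the two lines above):

```shell
# Uncomment the TCP listener lines in /etc/rsyslog.conf:
sudo sed -i \
    -e 's/^#\$ModLoad imtcp/$ModLoad imtcp/' \
    -e 's/^#\$InputTCPServerRun 514/$InputTCPServerRun 514/' \
    /etc/rsyslog.conf

# Restart the service and confirm it is listening on TCP port 514:
sudo systemctl restart rsyslog
ss -tlnp | grep ':514'
```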

On every node that is to send its logs to the logging node, you need to make some changes. First, in the file /etc/rsyslog.d/loghost.conf, make sure you have a line such as

*.* @@<loghost>:514 

where <loghost> is the logging host (use either the IP address or a resolvable hostname). The *.* selector matches all logging facilities and priorities, the @@ prefix tells rsyslog to use TCP for log transfer (a single @ would tell it to use UDP), and 514 is the TCP port. After this change is made, restart the rsyslog service on every node that is to send its logs to the logging server.
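Putting the client-side steps together (a sketch; loghost.example.com is a placeholder for your logging server's name):

```shell
# Forward all facilities and priorities to the logging server over TCP:
echo '*.* @@loghost.example.com:514' | \
    sudo tee /etc/rsyslog.d/loghost.conf
sudo systemctl restart rsyslog

# End-to-end check: this message should appear in the logging
# server's logfiles, tagged with this node's hostname.
logger "rsyslog forwarding test from $(hostname)"
```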

In the logfiles on the logging server, the hostname of the node will appear, so you can differentiate logs on the basis of hostnames.

You can use either of these approaches, or one that you create, to store all of the system logs in a central location (a logging server). Linux comes with standard logs that can be very useful; alternatively, you might want to think about creating your own logs. In either case, you can log whatever information you feel is needed. The next few sections present some options you might want to consider.

CPU Logs

Many tools are available for measuring CPU usage, most of which draw their data from the /proc filesystem. They range from uptime to top to the sysstat utilities, or you can read files such as /proc/uptime directly. Which you use is really up to you; however, you should pick one method and stick with it.

To start, you might consider running uptime and sending the result to the system log with logger. Uptime gives you the load averages for the entire node over the past 1, 5, and 15 minutes. Note that the load average is not normalized by core count, so on a fully loaded node with 16 cores, you could see a load average of 16.0. Keep this in mind as you process the logs.

If you have a heterogeneous cluster with different types of nodes, your number of cores could differ. You have to take this into account when you process the logs.

A simple way to use uptime for CPU statistics is to create a cron job that runs the utility at some time interval and records the output with logger. You could write it to the default system log (syslog), to a different system log, or to a log that you create. The important thing is to pick an approach and stay with it (i.e., don’t mix solutions).
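A minimal sketch of that cron job follows (the cpu-load tag, the schedule, and the /etc/cron.d/cpu-load path are arbitrary example choices):

```shell
# Run from cron, e.g., a line in /etc/cron.d/cpu-load (example path):
#   */5 * * * * root /usr/bin/uptime | /usr/bin/logger -t cpu-load

# What the cron job runs: capture the load averages, then hand them
# to logger (which requires a running syslog daemon on the node).
load=$(uptime)
echo "$load"
echo "$load" | logger -t cpu-load || true
```

Tagging the entries (-t cpu-load) makes them easy to pull back out of the logs later with grep.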

If you want or need more granular data than uptime provides for CPU monitoring, I would suggest using mpstat, which is part of the sysstat package included in many distributions.

The mpstat command writes CPU stats to standard output (stdout) for each available processor in the node, starting with CPU 0. It reports a boatload of statistics, including:

  • CPU: Processor number for the output
  • %usr: Percentage of CPU utilization by user applications
  • %nice: Percentage of CPU utilization by user applications using the nice priority
  • %sys: Percentage of CPU utilization at the system (kernel) level, not including the time for servicing hardware or software interrupts
  • %iowait: Percentage of time the CPU was idle while the system had an outstanding I/O request
  • %irq: Percentage of CPU utilization spent servicing hardware interrupts
  • %soft: Percentage of CPU utilization servicing software interrupts
  • %steal: Percentage of CPU utilization spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor
  • %guest: Percentage of CPU utilization spent by the CPU or CPUs to run a virtual processor
  • %gnice: Percentage of CPU utilization spent by the CPU or CPUs to run a guest with a nice priority
  • %idle: Percentage of time the CPU was idle while the system did not have an outstanding I/O request

You can run mpstat with a specified interval so that it keeps reporting for as long as the node is powered on, and you can run it as a regular user or as root. However, if you want to write the output to a system logfile, you might have to create a cron job that runs the command once at a specified interval and writes the appropriate part of the output to the system logs with logger.
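One way to sketch that cron job (the script path and the cpu-mpstat tag are hypothetical; mpstat -P ALL 1 1 takes a single one-second sample for every CPU):

```shell
# /usr/local/sbin/log-mpstat.sh (hypothetical location), run from
# cron once a minute:
#   * * * * * root /usr/local/sbin/log-mpstat.sh

# Take one 1-second sample per CPU, strip the header and blank
# lines, and send the remaining data rows to the system log:
mpstat -P ALL 1 1 | grep -Ev '^(Linux|$)' | logger -t cpu-mpstat
```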

Depending on what you want, you can get statistics for all of the processors combined or for every processor in the node. If you have enough logging space, I recommend getting statistics for each processor, which allows you to track a number of things, including hung processes, processes “jumping” around processors, or processor hogs (a very technical term, by the way).

One last comment about CPU statistics: Be very careful in choosing an interval for gathering those statistics. For example, do you really want to gather CPU stats every second for every compute node? Unless you do some special configuration, you will be gathering statistics for nodes that aren’t running jobs. Although it could be interesting, at the same time, it could just create a massive amount of data that indicates the node wasn’t doing anything.

If you gather statistics on each core on a node with 40 total cores every second, in one minute you have gathered 2,400 lines of stats (one line per core per second). If you have 100 nodes, in one minute you have gathered 240,000 lines of stats for the cluster. In one day, this is 345,600,000 lines of stats for the 100 nodes.
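As a quick sanity check on those numbers (the 40-core and 100-node figures come from the example above):

```shell
# Sampling every core once per second on a 40-core node:
per_node_per_min=$((40 * 60))                       # lines per node per minute
per_cluster_per_min=$((per_node_per_min * 100))     # 100 nodes
per_cluster_per_day=$((per_cluster_per_min * 60 * 24))
echo "$per_node_per_min $per_cluster_per_min $per_cluster_per_day"
# → 2400 240000 345600000
```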

You could increase the interval in gathering the statistics to reach a target for the amount of stats gathered, but another option is a little more clever: On each node you could have a cron job that gathers the CPU stats every minute or few minutes (call this the “long-term” CPU stats metric) that are then written to a specific log. Then, in the prologue and epilogue scripts for the job scheduler, you could create or start a cron job that gathers CPU stats more frequently (call this the “short-term” CPU stats metric). When a job is started on the node by the job scheduler, the CPU stats are then written to a different log than the long-term CPU stats, which allows you to grab more refined statistics for jobs. Moreover, you can correlate the CPU stats with the specific job.
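The short-term half of that scheme might be sketched as follows, assuming a Slurm-style prologue and epilogue (the tag, the PID file path, the 5-second interval, and the $SLURM_JOB_ID variable are all examples; adapt them to your scheduler):

```shell
# prologue (runs as root on each allocated node): start a
# high-frequency sampler that logs per-CPU stats every 5 seconds,
# tagged with the job ID so the stats can be correlated with the job.
nohup sh -c "while :; do
    mpstat -P ALL 1 1 | logger -t \"cpu-job-$SLURM_JOB_ID\"
    sleep 5
done" >/dev/null 2>&1 &
echo $! > /var/run/cpu-sampler.pid

# epilogue: stop the sampler when the job finishes.
kill "$(cat /var/run/cpu-sampler.pid)"
rm -f /var/run/cpu-sampler.pid
```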