System logging for data-based answers

Log Everything

Storage

Recording storage logs is very similar to recording network logs. Capturing every data packet flowing between the storage system and the drives produces a huge amount of information, most of which is useless to you. Instead, consider running simple I/O tests in a job's prologue and epilogue scripts and recording that data. The results will vary with the I/O load on the system, of course, but it's worth knowing the I/O performance a job sees just before it runs.
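As a rough illustration, a prologue script could time a sequential write to scratch space and log the result. This is only a minimal sketch; the file size, the scratch path, and the logged fields are all assumptions, and a real prologue test would also measure reads and append to a central log:

```python
import json
import os
import socket
import tempfile
import time


def simple_write_test(path, size_mb=64, block_kb=1024):
    """Time a sequential write of size_mb to `path` and return MB/s.

    fsync() is called so the result reflects data reaching storage,
    not just the page cache. The file is removed afterward.
    """
    block = b"\0" * (block_kb * 1024)
    nblocks = size_mb * 1024 // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(nblocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    os.remove(path)
    return size_mb / elapsed


# Record a timestamped, node-stamped entry -- the kind of line a
# prologue script would append to a system log.
testfile = os.path.join(tempfile.gettempdir(), "prologue_io_test.bin")
result = {
    "node": socket.gethostname(),
    "time": time.strftime("%Y-%m-%d %H:%M:%S"),
    "write_MBps": round(simple_write_test(testfile), 2),
}
print(json.dumps(result))
```

Note the time stamp and node name in each entry, in keeping with the logging rules summarized at the end of this article.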

In addition to capturing performance information, you can gather I/O performance statistics from the servers and clients. NFS is a simple example: the nfsiostat tool captures statistics about NFS client and server activity. For clients, you can gather information such as:

  • Number of blocks read or written
  • Number of reads and writes (ops/sec)

With this information, you can get a histogram of the NFS performance of both clients and servers.
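Once you have a series of sampled values (say, read ops/sec collected by running nfsiostat at a regular interval), building the histogram is a few lines of scripting. The sample values below are hypothetical, purely for illustration:

```python
from collections import Counter


def histogram(samples, bucket_width):
    """Bucket numeric samples into fixed-width bins.

    Returns a dict mapping each bucket's lower bound to the number
    of samples that fell into that bucket, in ascending order.
    """
    counts = Counter(int(s // bucket_width) * bucket_width for s in samples)
    return dict(sorted(counts.items()))


# Hypothetical NFS read ops/sec samples collected over time:
samples = [120, 135, 98, 410, 87, 150, 402, 133]
hist = histogram(samples, bucket_width=100)
print(hist)  # {0: 2, 100: 4, 400: 2}
```

A histogram like this quickly shows whether a client's NFS activity is steady or spiky, which a simple average would hide.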

In addition to nfsiostat [8], you can use iostat, which collects a wealth of metrics on the storage server, such as CPU time, throughput, and I/O request times. You can also use iostat [9] to monitor I/O on client nodes.
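A small script can parse iostat's extended device statistics into a structure you can log or graph. The sample output below is abbreviated, and the exact columns vary by sysstat version, so the parser maps fields by header name rather than by position:

```python
# Parse the device table of `iostat -dx`-style output into a dict of
# per-device metrics. The sample text here stands in for the output of
# subprocess.run(["iostat", "-dx"], ...) on a real system; column names
# differ across sysstat versions, which is why we key off the header.
sample = """Device            r/s     w/s     rkB/s     wkB/s   %util
sda              1.20    3.40     48.00    136.00    0.75
sdb              0.10    0.05      4.00      2.00    0.02
"""

lines = sample.strip().splitlines()
header = lines[0].split()  # e.g., ['Device', 'r/s', 'w/s', ...]
stats = {}
for line in lines[1:]:
    fields = line.split()
    # Map each metric name to its value for this device.
    stats[fields[0]] = dict(zip(header[1:], map(float, fields[1:])))

print(stats["sda"]["%util"])  # 0.75
```

Run periodically from cron or a monitoring agent, the same parsing loop turns iostat output into time series you can store alongside your other system logs.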

You are probably already using filesystem tools, so it's easy to scan the filesystem logs for errors and collect them; script this task. These logs are specific to each filesystem, so be sure to read the documentation on what is being recorded.
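Such a script can be as simple as matching a few error patterns against the log. Both the patterns and the log excerpt below are assumptions for illustration; check what your particular filesystem actually writes and adjust accordingly:

```python
import re

# Patterns that commonly indicate filesystem trouble. These are
# illustrative -- tailor them to the messages your filesystem emits.
ERROR_RX = re.compile(r"error|fail|read-only", re.IGNORECASE)


def find_errors(lines):
    """Return (line_number, line) pairs matching an error pattern."""
    return [(n, l) for n, l in enumerate(lines, 1) if ERROR_RX.search(l)]


# Hypothetical log excerpt; a real script would read the actual logfile.
log = [
    "kjournald starting. Commit interval 5 seconds",
    "EXT4-fs error (device sda1): ext4_find_entry: reading directory",
    "Remounting filesystem read-only",
]

for lineno, line in find_errors(log):
    print(f"{lineno}: {line}")
```

Collecting these matches centrally, with a time stamp and node name on each entry, gives you an early warning before users start reporting failed jobs.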

Summary

Many system administrators are reluctant to log much more than the minimum necessary for compliance. However, I'm a big believer that having too much information is better than not having enough. More logs mean more storage used and probably more network traffic, but in the end, you have a set of system logs you can use to your advantage.

To review, here are four highlights:

  • Log everything (within reason).
  • Put a time stamp on it.
  • Put a node name on every entry.
  • Be a lumberjack, and you'll be OK.
