Monitoring with Nmon

HPC administrators sometimes assume that if all nodes are functioning, the system is fine. However, the most common issue users have is poor or unexpected application performance. In this case, you need a simple tool to help you understand what’s happening on the nodes.

Stat-like command-line tools for admins

ASCII tools can be lifesavers when they provide the only access you have to a misbehaving server. However, once you're on the node, what do you do? In this article, we look at stat-like tools: vmstat, dstat, and mpstat.

The Tops

Admins solve problems ranging from slow servers to failing applications. The first tool I reach for when I need to check on a server with shell access is top.

How the Spanning Tree Protocol organizes an Ethernet network

Ethernet is popular because it simply works and is inexpensive. However, the administration side is a bit more complicated: For the network to run smoothly, the admin might need to make important decisions about the Spanning Tree Protocol.

ioprof, blktrace, and blkparse

Understanding how applications perform I/O is important not only because of the volume of data being written and read, but also because the performance of some applications depends on how I/O is conducted. In this article, we profile I/O at the block layer to help you make the best storage decisions.

Tool Your HPC Systems for Data Analytics

As data analytics workloads become more common, HPC administrators need to assess their hardware, software, and processes.

Graphite

Graphite converts confusing columns of time series data into handy diagrams, showing trends in operating system and application metrics at a glance.

A Better Builder

Developers fed up with cryptic Makefiles should take a look at the new Meson build system, which is simple to operate, offers scripting capabilities, integrates external test tools, and supports Linux, Windows, and Mac OS X.

Iron Ore

Google Compute Engine removes the technical and financial headaches of maintaining servers, networking, and storage.

Parallel Shells

The most fundamental tool needed to administer an HPC system is a parallel shell, which allows you to run the same command on a series of nodes. In this article, we look at pdsh.