When existing admin tools just aren’t cutting it, you can write your own. We look at the excellent Bash scripting and Python programming languages.

Write Your Own Admin Tools

If you’re just starting out as an admin, you tend to stick with existing Linux tools. After a while, you will find the need to modify or combine the tools to create something more useful. Admins write their own tools for specific tasks with languages such as Bash and Python. Beyond producing useful output, writing your own scripts gives you control over your tools, allowing you to extend them easily.

Your scripts don’t have to be massive works of art; rather, they should be tools that solve a problem. You don’t even have to write an entire script: You can use Linux commands in combination to provide information you need or want.

To begin, create simple scripts (which I also call tools) that return a single response. Think about how you work and the information you use; then you can determine whether anything is missing. Perhaps you want to see the output from two different tools combined into a single, cohesive output.

This article is not a guide to writing your own tools. Rather, it is a simple starting point if you want to write your own. I’m going to start by discussing how you can use Bash as your programming “language” for writing tools.

Bash

Bash is probably the most popular scripting language because (1) it is in virtually every distribution of Linux; (2) you can take advantage of all the Linux commands directly in Bash, which is possible in other languages but tends to be easier in Bash; and (3) you can find a huge number of examples online that can be adapted for HPC.

A good place to start is to learn or re-learn Bash coding. You can find comprehensive resources at DataCamp and The Linux Documentation Project. My favorite example of a beautiful tool written in Bash is bashtop, a 100% Bash tool that displays top-like information and even allows you to choose themes. Another interesting Bash script is a graphical sci-fi game named ASCIIDENT.

If you poke around online, you will find many Bash scripts, especially tools for Linux admins. The best place to start is with something simple. Think of some commands you often run in sequence. Perhaps you want to list files in a specified directory and then get the total size of a directory in a different location that is related to the specified directory:

$ ls -sh .
$ du -sh /mnt/test/data1

Rather than run these commands in sequence, you can write a Bash script that runs them for you. Put the tool somewhere central and add the path of the script location to your $PATH variable.

The next level for this script would be for it to take a command-line argument so you can give the path on the command line (e.g., instead of always entering /mnt/test/data1). After that, you could use Bash to capture the output from the two commands, manipulate it, and then echo the result to the shell. The processing could include a statistical analysis of the data before the script writes anything to stdout (standard output).

One of my favorite scripts that someone else wrote wraps Slurm job submittals. Although the Slurm command line itself isn’t difficult, and you should know the common commands for HPC, sometimes you need to look up some details when you're running a specific type of job. The script I use embeds those specific default options in the script but allows me to override any options before creating and running the Slurm command. It also allows for command-line overrides with temporary environment variables. I call this wrapper script from my own script so I can insert my command-line arguments. Even better, the wrapper script cleans up by removing any temporary data and moving data to a specific location. My script, which I typically name runit, becomes a simple one-line command specific to my application.

This example might seem boring, but having to remember the Slurm options I need for a specific type of application can be difficult. The script is simple, it works, and I can modify the script if needed.

Bash Add-Ons or Extensions

Once you are comfortable with Bash scripting, you can start getting a bit fancier and use Bash add-ons or extensions. In a previous article, I mentioned extensions for pop-up notifications augmented with sound that you can use on your desktop to indicate certain states (e.g., Slurm jobs finishing or a Jupyter notebook terminal being ready). In the article I also covered the following topics:

  • Text (TUI) and graphical user interface (GUI) tools
  • Text-based plotting (another one of my favorite topics)
  • System logging

I also mentioned the use of ncurses in Bash with some extension libraries that allow you to create TUIs, but I didn’t go into detail.

Bash and Storage Management

Bash is a great tool for creating scripts for common storage management tasks, including how to find directories and files that use the most disk space and how to find the most recently modified files. Another article develops a simple sample script for monitoring disk IOPS (input/output operations per second), although you can modify it to monitor almost anything (Listing 1).

Listing 1: Bash Disk I/O

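#!/bin/bash
# Print the I/O operations per second for sda, once per second.

while true; do
    # Columns 4 and 8 of /proc/diskstats: reads and writes completed
    prev=$(cat /proc/diskstats | grep 'sda ' | awk '{print $4+$8}')
    sleep 1
    curr=$(cat /proc/diskstats | grep 'sda ' | awk '{print $4+$8}')
    iops=$((curr-prev))
    echo "IOPS: $iops"
done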

The number of admin tools you can code in Bash is virtually limitless. However, remember that although Bash is almost always installed on the compute nodes in HPC systems, your scripts and any extension libraries they use might not be, so be sure to install them in a shared location that all compute nodes can access.

Finally, although I’m not a Bash wiz, I recommend you start simple with common tasks you need. Then, if the scripts are heavily used and need customization, you can add command-line arguments and even TUI/GUI capability. However, don’t shoot for the moon from the beginning (you will be disappointed).

Python

You don’t have to write your scripts in Bash. Plenty of other scripting languages can be used. The current favorite is probably the widely used Python, which has many add-on libraries and is easily installed. Most of the time, a basic Python build is installed with a base version of a distribution. Be sure you are using Python 3 and not Python 2 (look for the command python3; if it is a symlink, check what it points to). Remember, the installed version is just basic Python 3 without all the extra libraries you might need.

A key aspect of Python you need to keep in mind for HPC is that it will need to be installed on every node. Even though it is popular, it’s not as ubiquitous as Bash. Moreover, if you use any extension library, you need to install it either on all of the nodes or in a shared space that all nodes mount. Although this step is not difficult, you need to plan for it.

You can use Python as you would Bash. From within the script, you can call command-line tools and capture the output, then process and output it any way you want. You can start with a simple script and then get more complicated if needed.
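As a minimal sketch of this pattern, the following code runs ls and du through the subprocess module, captures their output, and combines the two into one short report. The function names (run, parse_du, report) are hypothetical, and the sketch assumes that ls and du are available on the node and that Python 3.7 or later is installed.

```python
import subprocess


def run(cmd):
    """Run a command given as a list of arguments and return its stdout as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def parse_du(output):
    """Parse one line of `du -sk` output into (size_in_KiB, path)."""
    size, path = output.strip().split("\t", 1)
    return int(size), path


def report(list_dir, size_dir):
    """Combine a directory listing and a du total into one block of text."""
    entries = run(["ls", "-1", list_dir]).splitlines()
    kib, path = parse_du(run(["du", "-sk", size_dir]))
    lines = [
        f"{len(entries)} entries in {list_dir}",
        f"{path} uses {kib / 1024:.1f} MiB",
    ]
    return "\n".join(lines)
```

Calling print(report(".", "/mnt/test/data1")) would mirror the two-command example from the Bash section. Passing the command as a list avoids invoking a shell, which sidesteps quoting problems when a path contains spaces.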

Python by itself has several modules you can use to create tools, including, among others, math, argument parsing, gzip interface, logging, multiprocessing, operating system interface, statistical, and date and time modules. You can take advantage of these modules rather than resorting to Linux commands external to Python.
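To illustrate, here is a small sketch that relies only on standard modules (os, statistics, and argparse) to summarize file sizes under a directory, with no external Linux commands involved. The function names and the choice of statistics are assumptions made for the example, not part of any existing tool.

```python
import argparse
import os
import statistics


def file_sizes(top):
    """Return the size in bytes of every regular file under `top`."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and special files
            try:
                sizes.append(os.path.getsize(full))
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return sizes


def summarize(sizes):
    """Summarize a list of file sizes; an empty list gives zeros."""
    if not sizes:
        return {"count": 0, "total": 0, "mean": 0, "median": 0}
    return {
        "count": len(sizes),
        "total": sum(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
    }


def parse_args(argv=None):
    """Parse the command line; the directory defaults to the current one."""
    parser = argparse.ArgumentParser(description="Summarize file sizes")
    parser.add_argument("path", nargs="?", default=".", help="directory to scan")
    return parser.parse_args(argv)
```

A script built on these pieces could finish with print(summarize(file_sizes(parse_args().path))) to report on whatever directory is given on the command line.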

As an example of Python scripting, I’ll take the previous Bash script and rewrite it in Python (Listing 2). The script uses the subprocess and time modules and creates a subprocess to gather stats from the /proc filesystem.

Listing 2: Python Disk I/O

import time
import subprocess

# Shell pipeline that sums reads ($4) and writes ($8) completed for sda
CMD = "cat /proc/diskstats | grep 'sda ' | awk '{print $4+$8}'"

def calculate_iops():
    while True:
        prev = int(subprocess.check_output(CMD, shell=True))
        time.sleep(1)
        curr = int(subprocess.check_output(CMD, shell=True))
        iops = curr - prev
        print(f"IOPS: {iops}")

calculate_iops()
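
As a variation on this listing, the same numbers can be read without a shell pipeline, because Python can open and parse /proc/diskstats itself. The sketch below is one possible way to do so; the function names are hypothetical, and the column positions simply mirror the $4 and $8 fields used in the awk command.

```python
import time


def completed_ios(diskstats_text, device="sda"):
    """Return reads + writes completed for `device` from /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields[2] is the device name; fields[3] and fields[7] are the
        # columns that awk calls $4 and $8
        if len(fields) >= 8 and fields[2] == device:
            return int(fields[3]) + int(fields[7])
    raise ValueError(f"device {device!r} not found")


def read_diskstats(path="/proc/diskstats"):
    """Return the current contents of the kernel's disk statistics file."""
    with open(path) as f:
        return f.read()


def sample_iops(device="sda", interval=1.0):
    """Measure I/O operations per second for `device` over one interval."""
    prev = completed_ios(read_diskstats(), device)
    time.sleep(interval)
    curr = completed_ios(read_diskstats(), device)
    return (curr - prev) / interval
```

Matching the device name exactly, rather than searching for a substring, keeps partitions such as sda1 from being confused with the whole disk. Calling sample_iops() in a loop would reproduce the behavior of the listing.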

Python Add-Ons and Extensions

Python is one of the hottest languages right now. It is very useful for a wide range of tasks and is used in a huge number of disciplines, including artificial intelligence (AI). A massive number of extensions have been written for Python, including numerical libraries and web and visualization tools. Combined, all these aspects make Python one of the most used languages by HPC admins.

What sets Python apart from many other languages is its massive list of extensions and libraries. I won’t even begin to try to list them, much less illustrate the range. The number is truly huge. However, I want to share one library I particularly like, because it allows me to create tools for monitoring the system.

psutil

The popular psutil (process and system utilities) extension library is cross-platform (Linux, Windows, macOS, various BSDs, Solaris, and AIX) and gathers information on running processes and system metrics such as CPU, memory, disk, network, and sensors. You can write all sorts of tools that use psutil. The GitHub site has many script examples, including a top-like tool, an iotop tool, a process tree utility, and a script for gathering system temperatures.

The psutil library can do all kinds of things, and it comes in very handy for HPC administration. For example, you could use your script in conjunction with a parallel shell such as pdsh to gather metrics for the cluster.
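
A minimal sketch of such a script might collect a few metrics and print them on a single line, which is convenient when a parallel shell gathers the output from many nodes. The function names and the choice of metrics are assumptions for illustration. The psutil import sits inside collect_metrics() so that the formatting code still loads on a node where psutil is not installed.

```python
import socket


def collect_metrics(root="/"):
    """Gather a few node metrics with psutil (a third-party package)."""
    import psutil  # imported here so format_metrics() works without psutil

    return {
        "host": socket.gethostname(),
        "cpu": psutil.cpu_percent(interval=1),    # percent busy over 1 second
        "mem": psutil.virtual_memory().percent,   # percent of RAM in use
        "disk": psutil.disk_usage(root).percent,  # percent of `root` in use
    }


def format_metrics(m):
    """Format the metrics as one line so per-node output is easy to scan."""
    return (
        f"{m['host']} cpu={m['cpu']:.1f}% "
        f"mem={m['mem']:.1f}% disk={m['disk']:.1f}%"
    )
```

Adding print(format_metrics(collect_metrics())) at the bottom turns the sketch into a one-line report that a parallel shell could run on each node.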

Python and Storage Management

I will leave you with some tools and articles that discuss the use of Python in conjunction with storage management. A great first resource discusses the use of Python to analyze your filesystem and directory structures. The scripts in the article also illustrate how to do a statistical analysis of the gathered system information and create plots. Even if you’re not involved in storage management, this article is a good start to your Python tool journey.

Sometimes people write Python versions of existing tools so they can then modify them to add more capability. An example is pydf, a “clone” of the Linux df command that is pretty simple to use.

Because you have the pydf source, you can add capabilities such as statistical analysis, time history analysis, and plotting (by storing the results over time and plotting the resulting time history).
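
The time history idea can be sketched without touching the pydf source at all. The example below uses only standard modules (shutil, csv, and time) to append one usage sample per call to a CSV file and to estimate growth from the stored history. The function names and the CSV layout are hypothetical, and the sketch is not taken from pydf.

```python
import csv
import shutil
import time


def record_usage(path, logfile):
    """Append (timestamp, used bytes, total bytes) for `path` to a CSV file."""
    usage = shutil.disk_usage(path)
    with open(logfile, "a", newline="") as f:
        csv.writer(f).writerow([int(time.time()), usage.used, usage.total])


def load_history(logfile):
    """Read the CSV file back as a list of (timestamp, used, total) tuples."""
    with open(logfile, newline="") as f:
        return [(int(t), int(used), int(total)) for t, used, total in csv.reader(f)]


def growth_per_day(history):
    """Average growth in bytes per day between the first and last samples."""
    if len(history) < 2:
        return 0.0
    (t0, used0, _), (t1, used1, _) = history[0], history[-1]
    if t1 == t0:
        return 0.0
    return (used1 - used0) * 86400 / (t1 - t0)
```

Running record_usage() periodically (e.g., from cron) builds the history, and the list returned by load_history() can then feed a statistical analysis or a plotting library.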

Summary

New system administrators, particularly those in HPC, begin with a diet of Linux commands. Some wonderful commands and tools can help you manage, control, and monitor your systems. After a period of time, virtually all of the commands feel limited, which is when you start writing your own tools.

I hope this article provides some tidbits for writing your admin tools. Linux allows you to use pipes to connect commands to create something new or improved, but Bash and Python are probably the most popular languages for going beyond the commands to scratch your admin itch. I use both languages for administering systems, but I have to admit, I tend to reach for Python first. One of my New Year’s resolutions is to start writing more tools in Bash. Either way, you can create something that answers your needs, something that you control, and, one hopes, something you share with others.