Updates and Upgrades in HPC

Backups

I don’t want to get into a deep discussion of backups and node images; there is a whole philosophy around them and how to use them. I just want to mention a couple of ideas related to updates and upgrades.

I use “backup” to mean a copy of files from a node, such as those in /home or /opt. I can’t boot a node from these copies. That doesn’t mean backups aren’t useful, of course, but it does mean that if I need to re-install a node, I want a way to do that directly.
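A file-level backup in this sense can be as simple as an archive of selected directories. The Python sketch below uses only the standard `tarfile` module. The helper names `backup_dirs` and `restore`, and the `<dir>-<timestamp>.tar.gz` naming scheme, are hypothetical and used only for illustration. They are not part of any tool mentioned here. The sketch captures files but nothing you could boot a node from.

```python
import tarfile
import time
from pathlib import Path


def backup_dirs(dirs, dest_dir, stamp=None):
    """Archive each directory in `dirs` into its own .tar.gz under `dest_dir`.

    Hypothetical helper: a file-level backup (e.g., /home, /opt) only.
    There is no boot loader, partition table, or OS here, so this
    cannot re-install or boot a node. Assumes each entry is a named
    directory such as /home, not the root directory /.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archives = []
    for d in dirs:
        src = Path(d)
        # e.g., /home -> home-20240101-120000.tar.gz (naming scheme is an assumption)
        archive = dest / f"{src.name}-{stamp}.tar.gz"
        with tarfile.open(str(archive), "w:gz") as tar:
            # Store paths relative to the directory's parent ("home/...")
            # rather than absolute paths, so restores go where you choose.
            tar.add(str(src), arcname=src.name)
        archives.append(archive)
    return archives


def restore(archive, target_dir):
    """Extract one backup archive under `target_dir` (files only)."""
    # Use the safer "data" extraction filter when this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(target_dir), **kwargs)
```

For example, `backup_dirs(["/home", "/opt"], "/backups")` would write one timestamped tarball per directory. Restoring one puts files back on a node that is already installed and running. It does nothing for a node that has to be re-installed, which is the job of an image.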

I like to take images of nodes for re-installation. You can call these backups if you like, but I prefer to distinguish backups, which are just files, from images, which you can use to re-install a node and even boot it.

Linux has a plethora of imaging tools. I have experience with Mondo Rescue, which I used when I was an admin at Lockheed Martin. One thing I really liked was that it could create an ISO image that I could put on a CD or DVD (way back then) or a USB drive (today) to re-install the node.

A combination of backups and node images might provide the best way to restore a system quickly.

HPC and Updates or Upgrades

Every system is different, but the HPC world differs from the enterprise or cloud worlds, or any other IT world for that matter. The focus of HPC, and here I’m referring to its “center of mass,” is on high performance.

HPC doesn’t dismiss security patches, but its number one focus is application performance. What is perhaps not always noticed is that HPC focuses on the users’ scientific applications, not on a database or a web server. This leads to two aspects of HPC: (1) you tune the operating system for the best possible performance without spending too much time on it, and (2) once the operating system is tuned, you leave it alone and focus on the user applications.

The second aspect is very important. Once the distribution is reasonably tuned, you turn to the user applications. In my experience, once the distribution (operating system) is installed on an HPC system, you might see one or two minor release updates, but an upgrade is rare. To repeat myself: The focus is on achieving large performance improvements in user applications. That is why HPC systems typically have a variety of compilers, libraries, and tools that are added, updated, or upgraded throughout the life of the system.