Updates and Upgrades in HPC

Minor Version: Update or Upgrade?

After updating within a minor version, the next step up is updating the minor version itself, for example, from Rocky 8.6 to Rocky 8.7. I think of this as an update, although you could easily call it an upgrade because it involves a bit more effort. Moreover, it will almost always have regressions, so it might feel more like an upgrade than an update. Nevertheless, I still refer to it as an update and save the term “upgrade” for a change in major versions.
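To make the terminology concrete, here is a minimal Python sketch that labels a change between two release strings the way I use the terms in this article. The function name classify_change and the label strings are my own illustration, not part of any distribution tool, and the sketch assumes simple "major.minor" version strings.

```python
def classify_change(old, new):
    """Label a release change using this article's terminology."""
    def parse(version):
        # "8.6" -> (8, 6); a bare "9" is treated as (9, 0)
        parts = version.split(".")
        major = int(parts[0])
        minor = int(parts[1]) if len(parts) > 1 else 0
        return major, minor

    old_major, old_minor = parse(old)
    new_major, new_minor = parse(new)
    if new_major != old_major:
        return "upgrade"  # major version change
    if new_minor != old_minor:
        return "minor-version update"
    return "update within the minor version"


print(classify_change("8.6", "8.7"))  # minor version changed, major did not
print(classify_change("8.6", "9"))    # major version changed
```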

The same steps I discussed for updating within a minor version apply to updating the minor version itself. In my opinion, though, you need to pay more attention to the updated packages. Updating and testing them as individually as possible can be a bit tedious, but I still don’t recommend installing them all in one shot.
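One way to make this less tedious is to work through the pending packages in small groups rather than strictly one at a time. The following Python sketch only builds the commands and prints them; it does not run anything. The package list and the group size are hypothetical, and the helper names batches and update_command are my own. Note that dnf calls this operation "upgrade" regardless of how I use the word in this article.

```python
def batches(packages, size):
    """Yield successive groups of at most `size` package names."""
    for start in range(0, len(packages), size):
        yield packages[start:start + size]


def update_command(batch):
    """Build (but do not run) a dnf command that updates only `batch`."""
    return ["dnf", "-y", "upgrade", *batch]


# Hypothetical list; in practice it might come from `dnf check-update`.
pending = ["kernel", "glibc", "openssl", "openssh", "slurm"]

for group in batches(pending, 2):
    print(" ".join(update_command(group)))
    # Run the command (e.g., with subprocess.run) and then run your own
    # tests before moving on to the next group.
```

A small group size keeps the number of suspects low when a test fails after an update, at the cost of more update-and-test cycles.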

One addition I recommend to the previous documentation steps is to take screenshots periodically and put them in a document. I like doing this because, although I can step through the script file, seeing the whole terminal adds a bit more context.

Major Version Upgrade

An example of a major version upgrade is going from Rocky 8.6 to Rocky 9. This change is big and is what I consider an upgrade to the cluster. My advice is to plan a complete wipe of the system and start from scratch, for two reasons. First, I have seen too many automatic “upgrades” that, at best, create a “Franken System” that is never truly the upgraded version and always has bits left over from the previous version. Second, I have first-hand experience with automatic upgrades that left the system so damaged that I had to restore it from a backup or an image.

Before wiping the old distribution from the cluster, I always make a backup of the critical nodes (e.g., head node, login nodes, control nodes for things like Slurm, and possibly gateway or storage nodes). However, backups might not be what you want or need. Think about taking images of the critical nodes instead, so that reinstallation is much easier.
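As a rough illustration of the backup step, the following Python sketch archives a list of paths into a timestamped, compressed tar file using only the standard library. The function name backup_paths and its arguments are my own, and which paths count as critical is site specific. This is a file-level backup, not an image; imaging a node requires a dedicated tool.

```python
import tarfile
import time
from pathlib import Path


def backup_paths(paths, dest_dir, label):
    """Write a compressed tar archive of `paths` and return its location."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{label}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        for path in paths:
            path = Path(path)
            if path.exists():
                # Store paths relative to / so the archive can be
                # unpacked anywhere for inspection.
                tar.add(str(path), arcname=path.as_posix().lstrip("/"))
            else:
                print(f"skipping missing path: {path}")
    return archive
```

On a head node you might call something like backup_paths(["/etc", "/home"], "/backup", "headnode"), where both the paths and the destination are hypothetical. Run it with enough privileges to read everything you list, and keep the archive somewhere other than the node you are about to wipe.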