More Small Tools


I hope this and the previous article pointed out some useful commands for HPC administration. Although the commands tend to be very simple (e.g., watch), they can be very powerful; they are also useful for plain old Linux administration, not just HPC. Keep these commands close by on a Post-it note; when you're beginning to debug an issue, a glance at the list will remind you to start with simple tools. You can move on to the “fancy” solutions after you have bounded the problem. These tools have saved my bacon more than once, and I hope they help you.

Although I poke fun at system users a little in these articles, just remember that without the users, we would have no need for HPC administration. They are capable of doing some really cool things; most of all, they are focused on their science, engineering, and research. I started in HPC by being a user, and I'm sure the system administrators were annoyed with me on more than one occasion. If I didn't say it then, let me say it now: Thank you for all the help.

I can’t finish this article without adapting a phrase from my time in the military:

If you drop off a soldier with a small hammer and an anvil in the middle of the desert and come back in eight hours, the anvil will be broken.

The version for HPC administrators goes:

If you give a new user vi and a terminal, in eight hours the HPC system will be down.