HPC fundamentals

Quick on the Uptake

Summary

A tool that lets you run commands across a range of nodes is probably the most fundamental tool an HPC admin can use. Even for experienced admins, such an easy-to-use tool can help you quickly understand the state of your system. Arguably, the most popular parallel shell is pdsh. It is easy to use, flexible, and has very useful modules that extend its capabilities.

You can use pdsh on a cluster in a number of ways. An extremely common use is to run uptime on all of the nodes in the cluster to see which nodes are up and to report the load on each one. A myriad of other uses range from checking the versions of software installed on the nodes, to spot monitoring, to installing packages.
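As a rough illustration of the uptime check, the following Python sketch parses text in the shape that pdsh prints when running uptime (each line prefixed with the host name and a colon) and reports which targeted hosts did not answer. The function names, the sample host names, and the sample output are assumptions made for this example, and the pattern assumes the Linux "load average:" wording.

```python
import re

# One line of pdsh-prefixed uptime output: "<host>: ... load average: a, b, c"
LOAD_RE = re.compile(
    r"^(?P<host>[^:\s]+):.*load average:\s*"
    r"(?P<one>[\d.]+),\s*(?P<five>[\d.]+),\s*(?P<fifteen>[\d.]+)"
)


def parse_uptime(output):
    """Map each host to its (1, 5, 15)-minute load averages."""
    loads = {}
    for line in output.splitlines():
        match = LOAD_RE.match(line)
        if match:
            loads[match["host"]] = (
                float(match["one"]),
                float(match["five"]),
                float(match["fifteen"]),
            )
    return loads


def missing_hosts(expected, loads):
    """Return targeted hosts that produced no uptime line (possibly down)."""
    return sorted(set(expected) - set(loads))


# Hypothetical sample output; real text would come from running pdsh.
SAMPLE = """\
node01:  10:15:01 up 5 days,  2:03,  1 user,  load average: 0.12, 0.08, 0.05
node02:  10:15:01 up 5 days,  2:01,  0 users,  load average: 3.90, 4.10, 4.00
"""

if __name__ == "__main__":
    loads = parse_uptime(SAMPLE)
    for host, (one, five, fifteen) in sorted(loads.items()):
        print(f"{host}: 1-min load {one}")
    print("no answer:", missing_hosts(["node01", "node02", "node03"], loads))
```

A host that is missing from the result is only a hint that it might be down; a slow or unreachable login path would look the same.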

The pdsh command lets you define lists of target hosts to include or exclude, so you can treat a cluster as a set of subgroups when performing operations or group hosts on the basis of function. Using modules, you can also group target hosts by SLURM_JOBID, so you can query the nodes that are part of a single job.
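A minimal sketch of the include/exclude idea, assuming pdsh's -w option for target hosts and -x option for excluded hosts: the helper below only assembles the argument list and does not run anything. The function name pdsh_argv and the host names are hypothetical, and job-based selection through a module is left out.

```python
import shlex


def pdsh_argv(command, include, exclude=None):
    """Build an argument list for pdsh from include and exclude host lists."""
    argv = ["pdsh", "-w", ",".join(include)]  # -w: hosts to target
    if exclude:
        argv += ["-x", ",".join(exclude)]  # -x: hosts to leave out
    argv.append(command)  # remote command to run on each host
    return argv


if __name__ == "__main__":
    # Target a compute subgroup but skip one node (hypothetical names).
    argv = pdsh_argv("uptime", ["node[01-04]"], exclude=["node03"])
    print(shlex.join(argv))
    # To actually run it, pass argv to subprocess.run on a system with pdsh.
```

Keeping the command as a list rather than a single string avoids shell quoting problems with bracketed host ranges when the list is handed to subprocess.run.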

Finally, you can keep scripts on a shared workspace and use pdsh to run them on target hosts. A word of caution, however: If possible, avoid commands or scripts that produce multiline output, because lines from different nodes can arrive interleaved, and you would have to reassemble them into the proper order.
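If multiline output cannot be avoided, the reassembly can be sketched as grouping lines by their host prefix. The example below assumes each output line starts with the host name followed by a colon and a space, and that lines from any single host arrive in their original order; the function name and sample data are made up for illustration. The pdsh distribution also includes a dshbak utility intended for this kind of cleanup.

```python
from collections import defaultdict


def group_by_host(output):
    """Collect interleaved 'host: text' lines into per-host lists."""
    grouped = defaultdict(list)
    for line in output.splitlines():
        host, sep, text = line.partition(": ")
        if not sep:
            continue  # skip lines without a host prefix
        grouped[host].append(text)
    return dict(grouped)


# Hypothetical interleaved output from a two-line command on two nodes.
SAMPLE = """\
node02: MemTotal: 100 kB
node01: MemTotal: 200 kB
node02: MemFree: 10 kB
node01: MemFree: 20 kB
"""

if __name__ == "__main__":
    for host, lines in sorted(group_by_host(SAMPLE).items()):
        print(host)
        for text in lines:
            print("   ", text)
```

Only the first colon-space pair is treated as the separator, so output that itself contains colons stays intact.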

If you are starting out in the cluster world, or even if you are an experienced administrator, pdsh is a go-to tool for managing and monitoring systems.

The Author

Jeff Layton has been in the HPC business for almost 25 years (starting when he was 4 years old). He can be found lounging around at a nearby Fry's enjoying the coffee and waiting for sales.
