pdsh Parallel Shell

More Useful pdsh Options

The previous examples are fairly simple. To give a better sense of what pdsh can do, I will show some more involved commands. Listing 2 illustrates how to run a complex pdsh command. Before getting into the details, notice that each output line from pdsh is prefixed with the node that produced it, followed by the response from the command.

Listing 2: Complex pdsh Command 1

$ pdsh 'cat /proc/cpuinfo | grep bogomips'
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23

Notice that the entire command is in single quotes, so the local shell passes it to pdsh unchanged and the entire command is run on each node. This includes the first part, cat /proc/cpuinfo, whose output is piped to the second part of the command, grep bogomips. (Backquotes would not work here: they tell the local shell to run the command itself and substitute its output.) Quoting the command this way allows you to run complex commands on the target nodes.

For the particular command in Listing 2, the output differs for each node because the nodes are different: The first node reports eight cores (four cores and four Hyper-Threading cores), whereas the second node reports four cores, which is why the first node returns eight lines and the second returns four. The processors themselves are also different, so the bogomips value differs between the two nodes.

Listing 3 is a variation of the command in Listing 2. This time the quotes do not enclose the entire command; only the first part, cat /proc/cpuinfo, is in single quotes. When you run this pdsh command, the quoted part is run on all targeted nodes. Its output is then piped through the second part of the command, grep bogomips, which is executed on the node where pdsh was run. The result happens to be the same here, but the point of these command variations is that you need to be careful how you construct a command so you know where each part runs and understand the output.

Listing 3: Complex pdsh Command 2

$ pdsh 'cat /proc/cpuinfo' | grep bogomips
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.4: bogomips   : 6998.13
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23
192.168.1.250: bogomips : 5624.23
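
The difference between the two listings comes down to what the local shell hands to pdsh. The following is a minimal sketch that can be tried without a cluster: count_args is a hypothetical stand-in for pdsh that only reports the arguments it receives, and the trailing cat stands in for the local grep.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for pdsh: it only reports how many
# arguments the local shell hands over and what they are.
count_args() {
    printf '%d argument(s):' "$#"
    for arg in "$@"; do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

# Pipe inside the quotes: the whole pipeline is a single argument, so it
# would be passed along and run on each node.
remote_pipe=$(count_args 'cat /proc/cpuinfo | grep bogomips')

# Pipe outside the quotes: the local shell builds the pipeline itself, and
# the stand-in never sees the second command at all.
local_pipe=$(count_args 'cat /proc/cpuinfo' | cat)

echo "$remote_pipe"
echo "$local_pipe"
```

In the first case the pipe character is part of the argument; in the second it never reaches the command, which is why the filtering happens on the node where pdsh was started.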

A very important point is that pdsh does not guarantee the order in which output is returned. If 20 nodes are targeted in the list, the output from pdsh will not necessarily start with node 1 and increase incrementally to node 20. Listing 4 is an example of a vmstat command run on two nodes. With the arguments 1 2, vmstat reports twice on each node, one second apart.

Listing 4: Commingled Output

$ pdsh vmstat 1 2
192.168.1.4:  procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
192.168.1.4:   r  b swpd   free   buff  cache    si   so   bi    bo    in   cs us sy  id wa st
192.168.1.4:   1  0    0 30198704 286340 751652   0    0    2     3    48   66  1  0  98  0  0
192.168.1.250: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
192.168.1.250:  r  b swpd   free   buff  cache    si   so   bi    bo    in   cs us sy  id wa st
192.168.1.250:  0  0    0 7248836  25632  79268    0    0   14     2    22   21  0  0  99  0  0
192.168.1.4:    1  0    0 30198100 286340 751668   0    0    0     0   412  735  1  0  99  0  0
192.168.1.250:  0  0    0 7249076  25632  79284    0    0    0     0    90   39  0  0 100  0  0

At first, the output appears to come only from the first node, but then the output from the second node creeps in. Because the nodes respond concurrently, pdsh cannot guarantee how the lines of a multiline command are interleaved. If you really have to run a command with multiple lines of output across the target nodes, a practical choice is to put all of the output into a file and then rearrange the lines so each node's output is together and in the correct order.
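
Rearranging the lines does not have to be done by hand. As a rough sketch, a stable sort on the node prefix groups each node's lines together while keeping them in arrival order. The function name group_by_node and the sample data are made up for illustration, and the -s (stable) option is assumed to be available, as it is in GNU and BSD sort.

```shell
#!/bin/sh
# group_by_node reads "node: text" lines on stdin and groups them by node.
# -t: splits fields at ':', -k1,1 sorts on the node field only, and
# -s keeps lines with the same node in their original (arrival) order.
group_by_node() {
    LC_ALL=C sort -s -t: -k1,1
}

# Sample data standing in for commingled pdsh output.
sample='192.168.1.4: r b swpd
192.168.1.4: 2 0 0
192.168.1.250: r b swpd
192.168.1.250: 0 0 0
192.168.1.4: 1 0 0
192.168.1.250: 0 0 0'

grouped=$(printf '%s\n' "$sample" | group_by_node)
printf '%s\n' "$grouped"
```

On a real cluster the same function could sit at the end of the pipeline, for example pdsh vmstat 1 2 | group_by_node. Note that the nodes are ordered as text, not numerically, so 192.168.1.250 sorts before 192.168.1.4.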

A word of caution, though: If the command produces multiple output lines (e.g., three lines), the lines from a single node might arrive out of order; for example, line 3 could arrive before line 2. Ideally, each line of output would carry a tag, such as a line number, so the output can be reassembled much more easily.
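
One possible way to add such a tag is to number the lines on each node before they are sent back and then sort on the node and the number. The sketch below simulates the tagged output rather than calling pdsh; the helper name restore_order and the sample lines are hypothetical, and the suggestion to tag with nl -ba on the remote side is an untested assumption.

```shell
#!/bin/sh
# restore_order sorts tagged "node: N<TAB>text" lines by node, then
# numerically by the tag N that was added on the remote side.
restore_order() {
    LC_ALL=C sort -t: -k1,1 -k2,2n
}

# Simulated output in which line 3 from one node arrived before line 2.
# On a real cluster the tags could come from something like
#   pdsh 'vmstat 1 2 | nl -ba'
# (an assumption; nl -ba numbers every line, including blank ones).
tab=$(printf '\t')
scrambled="192.168.1.4:      1${tab}header
192.168.1.250:      1${tab}header
192.168.1.4:      3${tab}second sample
192.168.1.4:      2${tab}first sample
192.168.1.250:      2${tab}first sample"

ordered=$(printf '%s\n' "$scrambled" | restore_order)
printf '%s\n' "$ordered"
```

The numeric key only works as long as the tag is the first thing after the node prefix, so the tagging step has to be the last command in the remote pipeline.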

Another technique for using pdsh is to run scripts on each node. For example, in previous articles on processor and memory metrics and process, network, and disk metrics, scripts were designed to collect the metrics. With some simple modifications, these scripts can return a single line of output, which avoids the ordering problem. Putting the scripts in a location that every node can reach under the same path allows pdsh to run the script on the target nodes.
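
As an illustration of a single-line script (not one of the scripts from the earlier articles), the following sketch condenses the load averages from /proc/loadavg into one line. The function name load_summary and the optional file argument are hypothetical; the argument exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# load_summary prints one line with the 1-, 5-, and 15-minute load averages.
# The optional argument (default /proc/loadavg) names the file to read.
load_summary() {
    loadavg_file=${1:-/proc/loadavg}
    # The file starts with three load averages; the remaining fields are ignored.
    read -r one five fifteen _rest < "$loadavg_file"
    printf 'load1=%s load5=%s load15=%s\n' "$one" "$five" "$fifteen"
}

# Print the summary when the script runs on a Linux node.
if [ -r /proc/loadavg ]; then
    load_summary
fi
```

If a script like this were saved at a path visible on all nodes (e.g., a hypothetical /shared/bin/load_summary.sh), running pdsh /shared/bin/load_summary.sh would return exactly one prefixed line per node, so the order in which nodes respond no longer matters.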

pdsh Modules

Earlier I mentioned that pdsh uses rcmd modules by default to access nodes. The developers have extended this idea to create modules for various specific situations. The pdsh modules page lists the modules that can be built as part of pdsh, which currently include:

  • rcmd/rsh
  • rcmd/ssh
  • rcmd/mrsh (uses munge authentication)
  • rcmd/xcpu
  • misc/genders (node selection using libgenders)
  • misc/nodeupdown (uses nodeupdown library)
  • misc/machines (provides an option for a flat-file list of hosts)
  • misc/slurm (list of targets built from SLURM_JOBID or -j jobid)
  • misc/dshgroup (list of targets built from dsh-style “group” files)
  • misc/netgroup (list of targets built from netgroups)

These modules allow pdsh to do specific things. For example, the Slurm module allows you to run the command only on nodes allocated to currently running Slurm jobs. When pdsh is run with the Slurm module, it determines the list of nodes from the job identified by the SLURM_JOBID environment variable. You can also run pdsh with the -j jobid option, and it will get the list of hosts from the specified job.
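
The two ways of selecting a Slurm job can be sketched as a small dry-run helper. The function build_pdsh_cmd is hypothetical and only prints the pdsh command line it would use, so it can be tried without a cluster; it assumes a pdsh build that includes the Slurm module and relies only on the -j option and SLURM_JOBID behavior described above.

```shell
#!/bin/sh
# build_pdsh_cmd JOBID COMMAND... prints (does not run) a pdsh command line.
# Pass an empty JOBID to fall back on the SLURM_JOBID environment variable.
build_pdsh_cmd() {
    jobid=${1:-}
    [ "$#" -gt 0 ] && shift
    if [ -n "$jobid" ]; then
        # Explicit job ID: hand it to pdsh with -j.
        printf 'pdsh -j %s %s\n' "$jobid" "$*"
    elif [ -n "${SLURM_JOBID:-}" ]; then
        # Inside a Slurm job: the module is expected to read SLURM_JOBID itself.
        printf 'pdsh %s\n' "$*"
    else
        echo 'no job ID given and SLURM_JOBID is not set' >&2
        return 1
    fi
}

# Example: show the command for a made-up job ID.
build_pdsh_cmd 12345 uname -r
```

Printing the command first is a cheap way to check which job, and therefore which nodes, a command would target before actually running it.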