Virtuous Benchmarks: Using Benchmarks to Your Advantage

Small Node Groups

After the single-node runs are done, I test small groups of nodes. You can either pick the number of nodes per group arbitrarily or group the nodes so that every node in a group is attached to the same switch. Generally, I use four nodes per group to keep things simple. For each group, I run tests with a single core per node and with all the cores per node, which stresses the nodes in different ways. The goal of small-node-group testing is to start bringing network performance into the picture as an overall parameter. For these runs, you have to use the MPI version of the NPB tests, and I would run the same tests used in the single-node runs.
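One way to form switch-local groups of four is sketched below in Python. It assumes you already have a mapping from hostname to switch. The `node_to_switch` inventory and the `small_groups` function are hypothetical, not part of NPB or any scheduler. Nodes that don't fill a complete group on their switch are set aside rather than mixed across switches.

```python
from collections import defaultdict


def small_groups(node_to_switch, group_size=4):
    """Split nodes into groups of `group_size` without mixing switches.

    node_to_switch maps a hostname to the switch it is cabled to
    (hypothetical inventory data; build it from your own cluster records).
    Nodes that cannot fill a whole group on their switch are returned
    separately as leftovers.
    """
    by_switch = defaultdict(list)
    for node, switch in sorted(node_to_switch.items()):
        by_switch[switch].append(node)

    groups, leftovers = [], []
    for switch in sorted(by_switch):
        nodes = by_switch[switch]
        full = len(nodes) - len(nodes) % group_size  # nodes in complete groups
        for i in range(0, full, group_size):
            groups.append(nodes[i:i + group_size])
        leftovers.extend(nodes[full:])
    return groups, leftovers
```

The leftover nodes can be tested as a smaller group or folded into a neighboring switch's group by hand.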

I recommend running two different problem classes on these small node groups. Start with class A or B: spreading a small problem across a number of processes makes communication a large share of the run time, which stresses the network. However, production systems are seldom run this way, because it is not an efficient use of the system. Therefore, I would also run the largest problem class that fits on the group to stress the memory, CPU, and network together.
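The run matrix for one group (two core layouts times two classes) can be enumerated with a short helper. This is a sketch only. The function `run_matrix`, the default classes `B` and `D`, and the benchmark names you pass in are assumptions. Pick the large class that actually fits in your nodes' memory. Some NPB benchmarks also accept only certain MPI process counts, so check the documentation for your NPB version before launching.

```python
from itertools import product


def run_matrix(nodes_per_group, cores_per_node, benchmarks,
               small_class="B", large_class="D"):
    """Enumerate the runs for one node group.

    Two layouts: one MPI rank per node, and one rank per core.
    large_class is an assumed default; use the biggest NPB class that
    fits in the group's memory.
    """
    layouts = {
        "1-per-node": nodes_per_group,
        "all-cores": nodes_per_group * cores_per_node,
    }
    runs = []
    for bench, cls, (layout, ranks) in product(
            benchmarks, (small_class, large_class), layouts.items()):
        runs.append({"bench": bench, "class": cls,
                     "layout": layout, "ranks": ranks})
    return runs
```

For a four-node group of 16-core nodes, each benchmark yields four runs, at 4 and 64 ranks, in each of the two classes.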

After running these tests, you again perform a statistical analysis on the results in exactly the same manner as described for the single-node runs: compute the average and standard deviation of the results, look for outliers in the data, run more tests on those groups, and triage the nodes if needed. I would also recommend comparing the nodes in these outlier groups with the outliers from the single-node tests to look for correlation.
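A minimal version of this analysis might look like the following. It assumes each group's result has been reduced to a single score where higher is better, such as the `Mop/s total` value NPB prints. The two-standard-deviation threshold and the function names are assumptions, not a prescribed method. With very few groups, a single outlier cannot sit far from the mean, so a smaller `k` may be needed.

```python
import statistics


def flag_outlier_groups(results, k=2.0):
    """Return IDs of groups scoring more than k standard deviations below the mean.

    results maps a group ID to a score where higher is better.
    Needs at least two groups, because statistics.stdev requires two data points.
    """
    scores = list(results.values())
    mean = statistics.mean(scores)
    stdev = statistics.stdev(scores)
    return sorted(g for g, s in results.items() if s < mean - k * stdev)


def correlate(outlier_groups, group_members, single_node_outliers):
    """Map each outlier group to its members that were also single-node outliers."""
    suspects = set(single_node_outliers)
    return {g: sorted(set(group_members[g]) & suspects) for g in outlier_groups}
```

A group whose correlated list is empty is worth a closer look at the network path, since none of its nodes stood out when tested alone.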

As with the single-node tests, be sure to store the results somewhere you can easily retrieve them, along with the source code, the binaries, and a record of how you built the code, including versions.
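Appending each run to a JSON Lines file is one simple way to keep results and build details together. The field names below (`mops_total`, `build`, and so on) are hypothetical. Record whatever identifies your compiler, MPI library, and NPB versions.

```python
import json
import time
from pathlib import Path


def store_result(path, group, bench, npb_class, ranks, mops, build):
    """Append one run to a JSON Lines file.

    `build` is a dict describing how the code was built (for example the
    compiler, MPI library, and NPB version); its keys are up to you.
    """
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "group": group, "bench": bench, "class": npb_class,
        "ranks": ranks, "mops_total": mops, "build": build,
    }
    with Path(path).open("a") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_results(path):
    """Read every stored run back as a list of dicts."""
    with Path(path).open() as f:
        return [json.loads(line) for line in f if line.strip()]
```

One line per run keeps the file append-only and easy to filter later with standard tools.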

Larger Node Groups

After running small groups of nodes, you can run larger groups by combining the smaller ones. How many nodes go into each larger group is up to you. Regardless, follow the same process used for the single-node and small-group tests. The most important thing to remember is to store the results once you are done.
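If the small groups are kept as lists of hostnames, building larger groups can be as simple as concatenating neighboring groups. In this hypothetical `combine_groups` helper, any remainder that cannot fill a complete larger group is dropped from that round. That is an assumption, and you may prefer to test the remainder separately.

```python
def combine_groups(groups, factor=2):
    """Merge every `factor` consecutive small groups into one larger group."""
    usable = len(groups) - len(groups) % factor  # ignore an incomplete remainder
    merged = []
    for i in range(0, usable, factor):
        merged.append([node for g in groups[i:i + factor] for node in g])
    return merged
```

Calling it repeatedly on its own output (4, then 8, then 16 nodes per group, and so on) works you up toward the full cluster.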

If you want, you can keep testing larger and larger node groups until you reach the entire cluster. This is sometimes useful if you are preparing a TOP500 run, because it lets you leave out slower nodes that would drag down the final result. More importantly, once you finish all of these tests, from the single-node benchmarks to the larger node count tests, you have some very important and extremely useful information: a fairly extensive database of standard benchmark results for all of the individual nodes and groups of nodes, along with a history of which nodes were outliers relative to the others. This kind of information can be extremely valuable to HPC system administrators.
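The outlier history can be as simple as a count of how many times each node was flagged across all rounds of testing. That count also gives a rough rule for choosing nodes to leave out of a TOP500-style run. The `max_flags` cutoff and both function names are assumptions for illustration.

```python
from collections import Counter


def outlier_history(flagged_runs):
    """Count how often each node was flagged across all test rounds.

    flagged_runs is a list of lists: the nodes flagged as outliers in each
    round (single-node, small-group, larger-group, ...). A node counts at
    most once per round.
    """
    return Counter(node for run in flagged_runs for node in set(run))


def nodes_for_full_run(all_nodes, history, max_flags=1):
    """Keep nodes flagged at most max_flags times (threshold is an assumption)."""
    return [n for n in all_nodes if history[n] <= max_flags]
```

Because a `Counter` returns 0 for missing keys, nodes that were never flagged pass the filter without special handling.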