Lead Image © Alexander Rivosh, 123RF.com

Monitoring HPC Systems

Nerve Center

Article from ADMIN 21/2014

By Jeff Layton

Ganglia is probably the most popular monitoring framework: HPC, Big Data, and even cloud systems use it. In this article, we show you how to install and configure Ganglia and get it running on a simple two-node system.

When you know better, you do better – Maya Angelou

Monitoring a cluster and understanding how it is performing are key to helping users run their applications better and to optimizing the use of cluster resources.

Such information is valuable for several reasons: It shows how the cluster is being used, how much of the processing capability is in use, how much memory user applications consume, and what the network is doing and whether applications are using it. This information can help you understand where to change the configuration of the current cluster to improve resource utilization. Moreover, it can help you plan for the next cluster.

In a past blog post [1], I looked at monitoring from the perspective of understanding what is happening in the system (metrics) and how important it can be to understand the frequency at which you monitor those metrics.

If you put several cluster admins in a room together (e.g., the BeoBash [2]), and you ask, "What is the best way to monitor a cluster?" you will have to duck and cover pretty quickly from the huge number of opinions and the great passion behind the answers. Having so many options and opinions is not a bad thing, but you need to sort through the ideas to find something that works for you and your situation.

In two further blog posts [3] [4], I wrote some simple scripts to measure metrics on a single server as a starting point for use in a cluster. This code measured the processes of interest by collecting data on an individual node basis.
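As a rough illustration of what a single-node metrics script might look like, here is a minimal Python sketch. It is not the code from the blog posts [3] [4]; the function names are hypothetical, and it assumes a Linux node whose /proc/meminfo reports the MemAvailable field (present in reasonably recent kernels).

```python
from pathlib import Path


def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return {"load1": one, "load5": five, "load15": fifteen}


def parse_meminfo(text):
    """Return /proc/meminfo fields as integers (kB where the kernel reports kB)."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not rest.split():
            continue  # skip lines that are not "Name: value [unit]"
        fields[key.strip()] = int(rest.split()[0])
    return fields


def memory_used_fraction(meminfo):
    """Fraction of memory in use, assuming MemTotal and MemAvailable are present."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return (total - available) / total


def sample_node(proc=Path("/proc")):
    """Take one sample of load and memory use on the local node."""
    load = parse_loadavg((proc / "loadavg").read_text())
    mem = parse_meminfo((proc / "meminfo").read_text())
    return {**load, "mem_used_fraction": memory_used_fraction(mem)}
```

Calling sample_node() on a Linux node returns a small dictionary for that node only. The parsing functions take plain text, so they can be checked against saved copies of the /proc files without touching a live system.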

Now it's time to look at monitoring frameworks where, I hope, the scripts will be useful for custom monitoring and

...
