Lead Image © Raoul Fesquet, fotolia.com

Grafana and time series databases

More than a Thousand Words

Article from ADMIN 40/2017
By
We look at database back ends for monitoring, alerting, and trending analysis in the Grafana visualization tool.

Admins are rumored to feel more at ease working with text-based terminals than with graphical tools, and the command line often is better suited to classic admin tasks than a GUI because it allows scripting and direct input. For other admin tasks, though, this assessment is typically inverted: When it comes to processing measurement data and statistics, visual tools are clearly superior. In particular, data for monitoring, alerting, and trending (MAT) that comes from several sources requires visualization for meaningful analysis.

Virtually every large environment, whether a container platform or a public cloud environment, is strongly dependent on MAT. Only MAT provides reliable clues about the health and usage of your systems and tells you when you need to add new hardware because your platform is fully utilized. Grafana [1] targets admins who need MAT analysis.

Match Winner

Grafana cooperates with various back ends and can pull the data you want to display from many sources. Its developers refer to this configuration as data-driven architecture. In place of classic event monitoring, Grafana uses a time-series-based principle, in which monitoring is a spin-off of the need to collect various metrics continuously. For example, if you operate a multinode cluster for MySQL based on Galera, you will want the load on all database back ends to be roughly equal. If system load on one of the back ends suddenly drops off, it is a strong indicator that something is wrong.
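The Galera example can be made concrete with a small sketch. It assumes that a load value has already been collected for each node and flags any node whose load falls well below the cluster median. The function name, the node names, and the 50 percent threshold are hypothetical, and this is not how Grafana itself evaluates data; it only illustrates the idea of drawing conclusions from performance values rather than from up/down events.

```python
from statistics import median


def find_underloaded_nodes(loads, threshold=0.5):
    """Return the names of nodes whose load is below threshold * cluster median.

    loads: dict mapping a node name to its current load value.
    threshold: fraction of the median below which a node counts as suspicious
               (0.5 is an arbitrary choice for illustration).
    """
    if not loads:
        return []
    cluster_median = median(loads.values())
    return sorted(
        name for name, load in loads.items()
        if load < threshold * cluster_median
    )


# Hypothetical load values for a three-node cluster
sample = {"db1": 4.1, "db2": 3.9, "db3": 0.4}
print(find_underloaded_nodes(sample))
```

Comparing against the median rather than the mean keeps a single idle node from dragging down the reference value it is measured against.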

Unlike typical incident monitoring in the style of Nagios, Grafana bases its conclusions on performance data. Although Grafana is not primarily about monitoring, it does help you prepare the corresponding time series from monitoring systems in an easily interpretable way. In this article, I first highlight the key features of Grafana and then present the most important back ends from which it can draw its information.

New Monitoring Paradigm

To understand the motivation behind Grafana, you need to take a small excursion into the world of monitoring. A paradigm change has taken place in the past few years that, in turn, is closely linked to cloud computing. Monitoring a cloud is different from monitoring conventional IT platforms, which to a certain degree are static and, after the environment has been set up, change only in the details. The standard tools for monitoring are well known to experienced admins: Nagios, Icinga, Check_MK, and various other solutions of the same design.

Monitoring in conventional environments relies on events. If a service stops running on a server, the monitoring system notices and raises an alert. Trending plays a minor role in this classic scenario, because the workload of such a setup will tend to grow evenly, giving you sufficient time to purchase new hardware. That said, even conventional monitoring solutions cannot completely do without trending. For example, PNP4Nagios [2] uses checks to collect performance data and then displays the data in a graphical format directly in the Nagios web interface.

This arrangement no longer works for a public cloud, because it is not predictable when the platform will need to scale horizontally by adding new servers. For example, a new customer with a huge workload could easily set up an account in a typical public cloud and start generating a massive load.

Trending Becomes More Important

For newer types of platforms, trending therefore plays a bigger role than for their conventional predecessors. PNP4Nagios or comparable solutions look more like stop-gaps in these cases. They normally store their measurement data in the background in an ordinary database, usually MySQL, which is perfectly suited for classic, event-based monitoring, in which incidents become separate entries in a table. Because you are interested in the individual events in this scenario and the data is stored exactly that way in MySQL, you will have no problem retrieving or processing the data.

Trending changes the rules of the game because it does not focus on events, but on the evolution of performance data over time. In the case of trending, you no longer want to know whether or not a specific service was working at a given time; instead, you are interested in the central health values of the systems (e.g., CPU load and RAM usage). If their values remain high over a period of time, new hardware will be required for load balancing.
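As a rough illustration of what "remain high over a period of time" can mean in practice, the following sketch checks whether a metric stays above a limit for a given number of consecutive samples. The function name, the sample data, and the limits are assumptions made for this example; a real setup would read the samples from a time series back end.

```python
def sustained_high(samples, limit, min_points):
    """Return True if at least min_points consecutive samples exceed limit.

    samples: iterable of (timestamp, value) pairs, ordered by time.
    """
    run = 0  # length of the current streak of values above the limit
    for _timestamp, value in samples:
        run = run + 1 if value > limit else 0
        if run >= min_points:
            return True
    return False


# Hypothetical CPU usage in percent, one sample per minute
cpu = [(0, 35.0), (60, 92.0), (120, 95.5), (180, 97.1), (240, 60.2)]
print(sustained_high(cpu, limit=90.0, min_points=3))
```

A single spike does not trigger the check; only an unbroken streak of high values does, which is the difference between reacting to an event and reading a trend.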

However, generating this information from individual events stored in a MySQL database would require many individual database queries and thus cause a correspondingly high load. Also, it takes quite a while for MySQL or a similar database to answer these queries.
