Monitoring, alerting, and trending with the TICK Stack

Cloud Radar

Anyone who needs to monitor large IT setups (e.g., a cloud) faces a challenge: Nodes come and go, and not every departure is a failure that needs to trigger an alert. In addition to monitoring and alerting, trending is also necessary; in many cases, it is the only way you can know when to add hardware to compensate for an increased base load.
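To make the trending idea concrete, here is a minimal sketch that assumes a simple linear trend: It fits a least-squares line to daily base load samples and estimates how many days remain until the load reaches a given capacity. The function name, the sample values, and the 80 percent threshold are hypothetical and serve only as an illustration; real base load is rarely this linear.

```python
def days_until_capacity(daily_load, capacity):
    """Estimate the days left until a linear trend reaches `capacity`.

    `daily_load` holds one sample per day, oldest first. Returns None
    if the fitted trend is flat or falling.
    """
    n = len(daily_load)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Ordinary least-squares fit of load against the day index 0..n-1
    mean_x = (n - 1) / 2
    mean_y = sum(daily_load) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_load))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x

    # Day index at which the fitted line crosses the capacity limit,
    # expressed relative to the most recent sample
    day_at_capacity = (capacity - intercept) / slope
    return max(0.0, day_at_capacity - (n - 1))


# Hypothetical base load in percent CPU, one sample per day
base_load = [40.0, 42.0, 44.0, 46.0, 48.0]
remaining = days_until_capacity(base_load, capacity=80.0)
print(remaining)  # 16.0
```

An estimate like this cannot come from a point-in-time health check; it needs a stored history of metric values to work on.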

Soon it becomes clear that typical monitoring solutions such as Nagios or Zabbix will not do. If you look into the subject in more detail, you end up with time series databases. In this article, I introduce the four components of the TICK Stack [1] (Telegraf, InfluxDB, Chronograf, and Kapacitor) and explain their respective strengths.

Deficits and Alternatives

The most prominent representative of this genre is probably Prometheus [2]. Launched as an internal tool by SoundCloud, the program and the additional components attached to it are now popular, but power users complain: In many respects Prometheus is missing functions, and design decisions were made that are not a good match for many setups.

An example is Prometheus Node Exporter, which is designed to collect metrics from the systems in the environment but often does not do so in the way the administrator wants (see the "Prometheus Add-ons" article in the previous issue [3]). Moreover, a Prometheus server cannot store metrics redundantly or in a distributed storage system.

If your setup becomes too large for a single Prometheus instance, you have to split it, thus possibly canceling out one of the biggest advantages of a monitoring, alerting, and trending (MAT) system – namely, the single point of administration.

Additionally, Prometheus slows down as the volume of data increases. The program is fine

...
