Detect anomalies in metrics data

Jerk Detector

Attacks on environments are just as much a part of the daily grind in IT as operating the IT infrastructure itself. The range of attacks is wide and depends on the attacker's goals. Classic denial-of-service attacks are not complex and quite easy to detect. However, when the focus shifts to sniffing data, the methods are far more subtle, and highly complex IT attacks on different levels are no longer challenging.

As complex as the attack scenarios are, one factor remains the same: Administrators want to notice as early as possible that bad things are going on in their setups so they can react promptly. The sooner an attack is detected, the sooner it can be counteracted and the less damage it can cause.

Rigid Limits of Limited Use

The ability to detect an attack early depends on the tools available and how you use them. In the past, most admins relied on run-of-the-mill event monitoring with thresholds: If the incoming data volume exceeded a certain limit, the monitoring system sounded an alarm. If too many invalid login attempts appeared in the servers' authentication logfiles, you were notified. The focus here is on enabling you to act as quickly as possible in a specific case (i.e., conveying the current situation).

This approach is not particularly up to date or smart. Modern monitoring systems like Prometheus collect such large volumes of metrics data that it can be used to identify trends and anomalies, potentially indicating that attacks are in progress. Even distributed denial-of-service (DDoS) attacks have ceased to follow the principle of taking a server offline with as much traffic as possible in as short a time as possible. Instead, postmortem analyses of attacks regularly reveal that attackers successively increased the traffic in the weeks leading up to an attack and did so in such a way that they always flew under the radar of the thresholds in monitoring. At the

...

Use Express-Checkout link below to read the full article (PDF).