Photo by Ivan F on Unsplash

Artificial intelligence improves monitoring

Voyage of Discovery

Article from ADMIN 57/2020
Partially autonomous monitoring systems – or at least intelligent alerts – have the potential to significantly reduce the workload of administrators and the service desk.

Artificial intelligence (AI) and machine learning are on the rise in IT, especially in the field of system monitoring. AI, machine learning, and deep learning are often wrongly used as synonyms. AI is the ability of a machine to make decisions similar to those made by humans. For example, software could decide to trigger an alarm that a human being would also have triggered. In other words, AI is the simulation of intelligent behavior by more or less complex algorithms.

Machine learning, on the other hand, refers to the methods, procedures, and algorithms that help the machine make decisions. Machine learning is the math that lets AI learn from experience. From this perspective, machine learning merely provides the basis for decision making.

An example illustrates this point. Suppose the learning result is a value stating that 90 percent of the current data belongs to a certain type. Whether the machine then treats this value, together with others, as the trigger for an alarm has nothing to do – mathematically speaking – with the part of the algorithm that calculates the value.
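This separation between the learned value and the decision can be sketched in a few lines of Python. It is a minimal illustration under stated assumptions, not a real monitoring implementation: the function names, the second metric (`error_rate`), the labels, and both limits are hypothetical.

```python
def share_of_type(labels, wanted):
    """Learning result (stand-in): fraction of data points assigned to one type."""
    return sum(1 for label in labels if label == wanted) / len(labels)


def should_alarm(share, error_rate, share_limit=0.9, error_limit=0.05):
    """Decision rule: combines the learned share with another metric.

    Both limits are assumed values chosen for illustration only.
    """
    return share >= share_limit and error_rate > error_limit


# Hypothetical classification output: 9 of 10 points belong to "overload".
labels = ["overload"] * 9 + ["normal"]
share = share_of_type(labels, "overload")

# The same learned share leads to different decisions, depending on the rule.
quiet = should_alarm(share, error_rate=0.01)
alarm = should_alarm(share, error_rate=0.08)
print(share, quiet, alarm)
```

Changing the limits in `should_alarm` changes the behavior of the system without touching the part that calculates the share.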

Machine learning generally works more reliably the more data you have available. This method can also be understood as a filter. Today, most companies have such an abundance of data that manual evaluation is inconceivable. As a remedy, machine learning algorithms and other methods can be deployed to filter the available data and reduce it to a level that allows interpretation. After preparing the data set appropriately, rules for intelligent software behavior can then be defined.

In traditional machine learning, the user decides as early as the implementation stage which algorithm to use and how to filter which set of information. In deep learning, on the other hand, a neural network determines which information it passes on and how this information is weighted. Deep learning methods require a great deal of computing power. Although the underlying math has existed for a long time, only the increase in computing power over the last decade has made deep learning widely practical.

Various Methods and Results

Regardless of whether a purely statistical method of analysis – especially exploratory data analysis (EDA) – or a machine learning algorithm is used, a distinction can always be made between univariate and multivariate methods. Univariate methods work faster because they usually require less computing power. Multivariate methods, on the other hand, can reveal correlations that would otherwise have remained undiscovered. However, the application of complex methods does not always lead to better results.

In general, you can imagine a machine learning method as a two-part process (Figure 1): The first part – training the algorithm – comprises a detailed analysis of the available data. The purpose is to discover patterns and find a mathematical rule or function that explains this kind of pattern. Models for this process include linear, non-linear, and logistic functions, among others. The goal is to obtain the smallest possible error component when explaining the data to be analyzed with the corresponding mathematical function. In the second part, the function derived in the first step is used to predict further data.

Figure 1: Interaction between training and test methods in machine learning.
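The two-part process can be sketched with a simple linear model. The following Python example is only an illustration: the data is made up, the 14/6 split between training and test data is an arbitrary assumption, and ordinary least squares stands in for whatever method a real system would use.

```python
def fit_line(xs, ys):
    """Training: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(xs, ys, slope, intercept):
    """Error component: average squared deviation from the fitted line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up load values in percent: a linear rise plus small repeating deviations.
times = list(range(20))
noise = [0.4, -0.3, 0.1, -0.2, 0.3]
loads = [20 + 0.5 * t + noise[t % 5] for t in times]

# Part 1: derive the function from the training data.
train_x, train_y = times[:14], loads[:14]
slope, intercept = fit_line(train_x, train_y)

# Part 2: use the function to predict data it has not seen.
test_x, test_y = times[14:], loads[14:]
train_error = mean_squared_error(train_x, train_y, slope, intercept)
test_error = mean_squared_error(test_x, test_y, slope, intercept)
print(f"slope={slope:.3f} train_error={train_error:.3f} test_error={test_error:.3f}")
```

A test error close to the training error suggests that the function found in training also describes the new data.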

The following sections illustrate the different forms of analysis by assessing the load curve of a server processor (Figure 2). The load remains below 50 percent over the entire period, except for a few isolated load peaks.

Figure 2: The processor load curve with regular peaks.

Univariate Analysis

Univariate analysis only ever examines one metric at a time. Applied to the processor curve, this means that traditional resource- or limit-based monitoring focuses on the isolated spikes in particular and classifies them as potential risks, depending on how the thresholds are configured and how often they are exceeded.
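Limit-based monitoring of this kind can be sketched as follows. The threshold of 80 percent, the permitted number of violations, the severity names, and the sample values are all assumptions made for the example.

```python
def limit_violations(samples, threshold):
    """Return the indices at which the load exceeds the threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


def classify_risk(samples, threshold=80.0, max_violations=3):
    """Classify a series by how often the threshold is exceeded."""
    count = len(limit_violations(samples, threshold))
    if count == 0:
        return "ok"
    return "warning" if count <= max_violations else "critical"


# Made-up processor load in percent with three isolated peaks.
load = [30, 35, 90, 32, 31, 88, 33, 95, 30, 34]

peaks = limit_violations(load, 80.0)
print(peaks, classify_risk(load), classify_risk(load, max_violations=2))
```

The same curve is rated differently as soon as the configuration changes, which is why the result depends as much on the thresholds as on the data.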

Furthermore, univariate analysis could also discover that limit violations almost always happen at regular intervals. The event in question therefore appears to occur cyclically. Most of the limit violations in the example can be assigned to this cyclical type, but others cannot, so it is no longer a matter of individual events, but of two groups of events. Additionally, the mean utilization value can be observed to remain constant in the first half of the data but increase in a linear manner in the second half (Figure 3).

Figure 3: The mean load value increases in the second half of the curve.

Although this effect is much less obvious, it should not be neglected: If the trend continues, the server load will keep increasing in the future. In this case, you could use one part of the data to predict another part. If the prediction works well, you can conclude that the processes generating the data have not changed; if it works poorly, something in those processes has probably shifted.
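A minimal sketch of this idea uses the mean of the first half of the data as a prediction for the second half and compares the errors. The data is made up, and the factor of 3 used to flag a change is an arbitrary assumption, not a recommended value.

```python
from statistics import mean


def mean_abs_error(values, prediction):
    """Average absolute deviation of the values from a constant prediction."""
    return mean(abs(v - prediction) for v in values)


# Made-up load in percent: flat in the first half, rising in the second.
loads = [30, 31, 29, 30, 30, 31, 29, 30,
         31, 33, 35, 37, 39, 41, 43, 45]

half = len(loads) // 2
first, second = loads[:half], loads[half:]

# Model derived from the first part of the data: a constant mean load.
baseline = mean(first)

error_first = mean_abs_error(first, baseline)
error_second = mean_abs_error(second, baseline)

# Assumed rule: a much larger error in the second part suggests that the
# process generating the data has changed.
process_changed = error_second > 3 * error_first
print(baseline, error_first, error_second, process_changed)
```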

Several statements can be made about a single curve, but it is difficult to make decisions on this basis. Without additional data (e.g., from other measurable variables or other comparable servers), the administrator can only interpret the situation from their experience and respond accordingly.
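One of these statements, the split of limit violations into a cyclical group and a remaining group, can be sketched like this. The example assumes exact integer timestamps and a single dominant period; real measurements would need a tolerance. The function name and the sample timestamps are hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_cyclic(times):
    """Split violation times into a cyclical group and the rest.

    The period is taken to be the most common difference between any two
    violations; the cyclical group shares the most common phase.
    Requires at least two timestamps.
    """
    diffs = Counter(b - a for a, b in combinations(sorted(times), 2))
    period = diffs.most_common(1)[0][0]
    phase = Counter(t % period for t in times).most_common(1)[0][0]
    cyclic = [t for t in times if t % period == phase]
    other = [t for t in times if t % period != phase]
    return period, cyclic, other


# Made-up violation times in hours: one every 24 hours, plus one extra event.
violations = [10, 34, 47, 58, 82, 106]
period, cyclic, other = split_cyclic(violations)
print(period, cyclic, other)
```

Even with this grouping, the single curve does not say why the cyclical events occur, which is where a second metric becomes useful.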

Bivariate Analysis

In bivariate analysis, two curves are always examined simultaneously. It would thus be possible to find an explanation for effects in the data with the help of another curve. For example, the orange curve in Figure 4 could be used to explain cyclical load peaks (e.g., batch tasks that occur at constant time intervals and would lead to a higher processor load). Combined with the corresponding logs, the event can be classified as a planned, not a dangerous, activity. Only a single candidate from the remaining limit violations in the example defies this kind of explanation and needs to be examined more closely.

Figure 4: Bivariate analysis of the processor data. Most peaks appear to be attributable to cyclic events.
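The bivariate check can be sketched by comparing the load curve with a second series sample by sample. In this example, the second series is assumed to be a simple indicator of whether a batch task was running; the threshold and all values are made up.

```python
def unexplained_peaks(load, batch_active, threshold=80.0):
    """Return indices of load peaks that do not coincide with a batch task."""
    return [
        i
        for i, (value, active) in enumerate(zip(load, batch_active))
        if value > threshold and not active
    ]


# Made-up processor load in percent and batch activity (1 = task running).
load = [30, 90, 32, 31, 91, 33, 85, 30, 92, 31]
batch = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

suspects = unexplained_peaks(load, batch)
print(suspects)
```

Only the peaks returned by `unexplained_peaks` would need a closer look; the others coincide with planned activity.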
