Robust Covariance for Anomaly Detection

Detecting anomalies and outliers by means of the Mahalanobis distance.

Georgi Tancev
Towards Data Science


Photo by Ian Taylor on Unsplash

New technologies such as the Internet of Things, combined with artificial intelligence, enable interesting applications such as predictive maintenance, i.e., autonomously assessing the condition of equipment in order to estimate when maintenance should be performed. For this purpose, it is important to have mathematical models that can distinguish between “normal” and “non-normal” behavior of the equipment, a task called anomaly detection. Equipped with all kinds of sensors, a machine or device can use the collected sensor data (e.g., temperature, fan speed, pressure, flow rate) to assess its state and call for service if necessary; the more its behavior deviates from its reference state just after fabrication, the more it will need maintenance.

In statistics, deviation can be assessed with the Z-score. Its generalization for a point xᵢ from a p-dimensional multivariate probability distribution with mean μ and covariance matrix Σ is known as the Mahalanobis distance dᵢ, which is given by:

dᵢ = √( (xᵢ − μ)ᵀ Σ⁻¹ (xᵢ − μ) )

Equation 1: Mahalanobis distance.

This metric measures how many standard deviations σ the point xᵢ lies away from μ, which makes it dimensionless; in one dimension, where Σ reduces to σ², it is simply the absolute Z-score |xᵢ − μ|/σ. An extreme observation has a large distance from the center of the distribution. The metric is useful in predictive maintenance because it makes unusual behavior of a system detectable: upcoming downtimes and malfunctions can be identified early, which is particularly important since catching faults in equipment, machines, and devices early reduces the extent of the damage.

Given a data set X with n samples in p variables (i.e., data from p sensors), the mean and covariance matrix are easily computed. In the presence of outliers, however, both estimates are distorted and the Mahalanobis distance becomes unreliable; a robust method, i.e., one that is only little affected by outliers, is therefore desirable. Figure 1 shows a Gaussian distribution fitted to a two-dimensional sensor data set X; just like with the Z-score (e.g., ±1.96σ), an envelope around the data can be constructed by choosing a critical value of the Mahalanobis distance. Points outside this envelope are considered anomalies/outliers.

Figure 1: Gaussian distribution for a 2-dimensional data set (image by author).
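As a minimal sketch of this classical (non-robust) computation, the sample mean and covariance can be estimated with NumPy and the distances evaluated directly from Equation 1. The synthetic two-dimensional data set and the cutoff value below are assumptions for illustration only.

# Minimal sketch of the classical Mahalanobis distance computation.
import numpy as np

# Stand-in for the sensor data X: an (n, p) array; here synthetic for illustration.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 2.0]], size=500)

# Classical (non-robust) estimates of the mean and covariance matrix.
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)
Sigma_inv = np.linalg.inv(Sigma)

# Mahalanobis distance of every sample from the center (Equation 1).
diff = X - mu
d = np.sqrt(np.einsum("ij,jk,ik->i", diff, Sigma_inv, diff))

# Points whose distance exceeds a critical value lie outside the envelope.
critical_d = 3.0            # illustrative cutoff; a principled choice is discussed below
flagged = d > critical_d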

Robust covariance methods exploit the fact that outliers inflate the values (entries) of Σ, making the spread of the data appear larger than it is. Consequently, the determinant |Σ| also becomes larger, and it would shrink if the extreme observations were removed. Rousseeuw and Van Driessen developed a computationally efficient algorithm (FAST-MCD, for the minimum covariance determinant) that yields robust covariance estimates. Their method is based on the assumption that at least h of the n samples are “normal” (h is a hyperparameter). The algorithm starts with k random subsamples of (p+1) points each. For each of the k subsamples, μ, Σ, and |Σ| are estimated, the distances of all points are computed and sorted in increasing order, and the h points with the smallest distances are used to update the estimates. In the original publication, this subroutine of computing distances and updating the estimates of μ, Σ, and |Σ| is called a “C-step”. The authors found that two such steps are sufficient to identify good candidates for μ and Σ among the k subsamples. In a next step, the m candidates with the lowest |Σ| (the best ones) are iterated with further C-steps until convergence, and the estimate whose |Σ| is minimal is returned as the output.
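This estimator is available in scikit-learn as MinCovDet. The sketch below shows how robust estimates and the corresponding distances could be obtained, assuming X from the snippet above and an illustrative support_fraction, which plays the role of h/n.

# Robust location and covariance via the minimum covariance determinant.
from sklearn.covariance import MinCovDet

mcd = MinCovDet(support_fraction=0.75, random_state=0).fit(X)

mu_robust = mcd.location_       # robust estimate of the mean
Sigma_robust = mcd.covariance_  # robust estimate of the covariance matrix
d_squared = mcd.mahalanobis(X)  # squared Mahalanobis distances under the robust fit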

Note that the squared distance dᵢ² follows a Chi-square distribution with p degrees of freedom. To compute a critical distance (threshold) for the envelope, one chooses the probability of observing a squared distance at least as extreme as the threshold (a p-value) and obtains the corresponding critical value from the inverse cumulative distribution function. (For one variable and a p-value of 0.05, the critical Chi-square value is 3.84; this is a threshold on the squared distance, and its square root, 1.96, matches the familiar one-dimensional cutoff from statistical tests.)
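For example, the critical value can be computed from the inverse cumulative distribution function of the Chi-square distribution in SciPy; the 0.05 tail probability below is just an example, and p is assumed to be the number of variables in X.

# Critical distance from the Chi-square distribution with p degrees of freedom.
from scipy.stats import chi2

p = X.shape[1]                                  # degrees of freedom (number of variables)
alpha = 0.05                                    # chosen tail probability (p-value)
critical_d_squared = chi2.ppf(1 - alpha, df=p)  # threshold on the squared distance
critical_d = critical_d_squared ** 0.5          # threshold on the distance itself

# Sanity check for one dimension: chi2.ppf(0.95, df=1) ≈ 3.84, and its square root ≈ 1.96.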

This algorithm is implemented in the EllipticEnvelope class of scikit-learn. Its most important hyperparameter is the contamination fraction ν, which is related to the parameter h of the original algorithm via ν = 1 − h/n. This hyperparameter has to be tuned to obtain the desired performance.

# Import libraries.
from sklearn.covariance import EllipticEnvelope

# Fit the detector and flag outliers (+1 = inlier, -1 = outlier).
detector = EllipticEnvelope(contamination=0.10, random_state=0)
is_outlier = detector.fit(X).predict(X)
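Note that predict returns +1 for inliers and −1 for outliers. If a continuous anomaly score is preferred, the squared Mahalanobis distances under the robust fit can also be obtained via detector.mahalanobis(X).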

The figure below demonstrates outlier detection on a p-dimensional sensor data set X, visualized in the space of its principal components; the first two components are shown.

Figure 2: Outlier detection with Mahalanobis distance (image by author).
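A figure of this kind could be produced roughly as follows; this is only a sketch, assuming X and is_outlier from the snippets above, with purely illustrative plotting choices.

# Visualize the detector's verdicts in the space of the first two principal components.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

scores = PCA(n_components=2).fit_transform(X)

plt.scatter(scores[:, 0], scores[:, 1], c=is_outlier, cmap="coolwarm", s=15)
plt.xlabel("principal component 1")
plt.ylabel("principal component 2")
plt.title("Outliers flagged by the elliptic envelope")
plt.show()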

The Mahalanobis distance is a simple and intuitive method that can perform well when the data are close to normally distributed. In other cases, the envelope might not fit the data well, but non-parametric methods such as the one-class support vector machine or the isolation forest have been developed as well. In practice, different algorithms have to be evaluated, including hyperparameter optimization. Embedding such models in machines and devices can add a lot of value and lead to a competitive advantage; it is now up to research and development engineers and managers to spot such opportunities.
