
A new application of structural entropy for multivariate time series anomaly detection

How to quickly detect the appearance of missing or constant data across many signals

I like time series. In this post, I’ll continue exploring multivariate time series and introduce a new approach for detecting one particular kind of anomaly. Once again, the tool is entropy, simple and powerful.

Although it depends on the application, we generally want our time series to be continuous and varying, without any missing values. But real-world anomalies can appear because of equipment errors, incorrect measurements, and data collection and storage issues.

One result of such anomalies is that data values go missing for specific periods. In other cases, the data is replaced with a special value indicating errors, or filled with the previous value. In short, the signals become missing or constant during those anomalous periods.

We can scan each time series with a moving window to identify those anomalies; the implementation is straightforward, as sketched below. But what about multiple time series? Do we need to monitor each one individually?
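For a single series, the scan takes only a few lines. Here is a minimal sketch; the window size of 50 and the helper name are my own choices, not from the notebook:

```python
import pandas as pd

def flag_missing_or_constant(series: pd.Series, window: int = 50) -> pd.Series:
    """Flag positions whose trailing window is entirely missing or constant."""
    # Every value in the window is NaN.
    all_missing = series.isna().astype(float).rolling(window).sum() == window
    # The window is constant: its standard deviation is 0.
    # (A window containing NaN yields a NaN std, which compares as False.)
    constant = series.rolling(window).std() == 0
    return all_missing | constant
```

This works fine for one signal, but running a separate monitor for each of hundreds of signals quickly becomes tedious.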

I’ll present an easy solution that treats this as an anomaly detection problem for multivariate time series. It won’t focus on individual time series but can still catch the anomalies if any time series becomes missing or constant.

I will skip the introduction of structural entropy and its application in detecting anomalies caused by correlation changes. If the concepts are new to you, please refer to my post, Anomaly Detection for Multivariate Time Series with Structural Entropy.

Generate synthetic dataset

Let’s use synthetic data this time. First, generate six time series: three from a Gaussian process (x0, x1, x2), two from random walks (x3 and x4), and one from an ARMA model (x5). They are independent and uncorrelated. I’ll share the link to my notebook in the reference section below. Plot 1 shows the results.
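The notebook has the exact generation code; the sketch below reproduces the idea with my own parameter choices (plain Gaussian noise stands in for the Gaussian process, and the ARMA coefficients are arbitrary):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import ArmaProcess

rng = np.random.default_rng(42)
n = 1000

# x0-x2: independent Gaussian noise.
gaussian = rng.normal(size=(n, 3))

# x3, x4: random walks (cumulative sums of Gaussian steps).
walks = rng.normal(size=(n, 2)).cumsum(axis=0)

# x5: an ARMA(1, 1) process with arbitrary coefficients.
arma = ArmaProcess(ar=np.array([1, -0.5]), ma=np.array([1, 0.4]))
x5 = arma.generate_sample(nsample=n)

df = pd.DataFrame(
    np.column_stack([gaussian, walks, x5]),
    columns=[f"x{i}" for i in range(6)],
)
```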

Next, let’s add some anomaly sections. As you can see in Plot 2, there are three types of anomalies: uncorrelated series become correlated, data values go missing, and data values become constant.
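Continuing the sketch above, injecting the anomalies is just a matter of overwriting slices; the index ranges here are illustrative, not the exact ones from the notebook:

```python
# Anomaly 1: x1 becomes correlated with x0 (x0 plus a little noise).
df.loc[200:300, "x1"] = df.loc[200:300, "x0"] + rng.normal(scale=0.1, size=101)

# Anomaly 2: x2 goes missing for a stretch.
df.loc[450:550, "x2"] = np.nan

# Anomaly 3: x3 gets stuck at a constant value.
df.loc[700:800, "x3"] = df.loc[700, "x3"]
```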

A trick with the correlation matrix

Calculating the structural entropy depends on the correlation matrix within each rolling window. The correlation matrix contains the Pearson correlation coefficient for every pair of variables. Suppose one variable is entirely missing or constant; what is its Pearson correlation coefficient with the other variables?

The Pearson correlation coefficient is the covariance normalized by the product of the individual standard deviations (Equation 1). If one variable has no valid values, its variance doesn’t exist. If one variable is constant, its variance is 0. In both cases, the denominator of Equation 1 becomes either NaN or 0, so the Pearson correlation coefficient is undefined.
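This is exactly what pandas does: DataFrame.corr() returns NaN for any pair involving an empty or constant column. A quick check:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "a": rng.normal(size=100),
    "b": rng.normal(size=100),
    "c": np.nan,  # entirely missing
    "d": 5.0,     # constant
})
print(demo.corr())
# Every coefficient involving c or d is NaN: an undefined or zero
# standard deviation makes the denominator of Equation 1 invalid.
```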

Below is the correlation matrix for the entire green anomaly section (x2 is missing). Since the values of x2 are missing, you can see NaN for all the correlation coefficients involving x2.

Now let’s apply a trick to the matrix. If we replace the NaN with 1, then x2 becomes 100% correlated with the other variables and connects them all into a single cluster. What’s the entropy of one single group? Zero.
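In code, the trick is a one-liner, and we can verify the collapse into a single cluster. This sketch assumes structural entropy is computed as in my previous post: threshold the absolute correlations, find connected components, and take the Shannon entropy of the cluster sizes (the 0.8 threshold is an illustrative choice):

```python
from scipy.sparse.csgraph import connected_components
from scipy.stats import entropy

corr = demo.corr().fillna(1)  # the trick: treat NaN as perfect correlation

# Variables are nodes; |correlation| >= threshold creates an edge.
adjacency = (corr.abs() >= 0.8).to_numpy()
n_clusters, labels = connected_components(adjacency, directed=False)

sizes = np.bincount(labels)
print(n_clusters, entropy(sizes / sizes.sum()))  # 1 cluster -> entropy 0.0
```

Because c and d are now “perfectly correlated” with everything, they pull all four variables into one cluster, and the entropy of a single cluster is zero.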

Magic of entropy

Plot 4 shows the structural entropy after applying the correlation matrix trick, which is simply replacing all NaN values with 1. The entropy of anomaly section 1 (orange, around index 200 to 300) drops because of the newly constructed correlation, which is expected. The entropy of anomaly sections 2 and 3 (green and red) drops to zero because of the missing or constant data.
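Putting the pieces together, the rolling computation might look like the following sketch, reusing the helper logic above (the 50-point window matches the post; the threshold is still my illustrative choice):

```python
def structural_entropy(window_df: pd.DataFrame, threshold: float = 0.8) -> float:
    """Entropy of the cluster-size distribution of the correlation graph."""
    corr = window_df.corr().fillna(1)  # apply the NaN -> 1 trick
    adjacency = (corr.abs() >= threshold).to_numpy()
    _, labels = connected_components(adjacency, directed=False)
    sizes = np.bincount(labels)
    return entropy(sizes / sizes.sum())

window = 50
rolling_entropy = pd.Series(
    [structural_entropy(df.iloc[i - window:i]) for i in range(window, len(df) + 1)],
    index=df.index[window - 1:],
)
```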

We can render all the information together in Plot 5. Note that the entropy is now plotted on a second y-axis on the right. You may notice that the entropy doesn’t change for the last two anomaly regions (brown and purple). The reason is the size of the observation (detection) window, which is set to 50 data points. The last two anomaly regions are shorter than 50 points, so every rolling window that overlaps them still contains valid, varying values. No NaN appears in the correlation matrix, and the entropy never drops to zero.

Conclusion

Structural entropy offers more benefits than identifying correlation changes, as seen in the synthetic data here and the stock market example from my previous post. One application could be monitoring real-time signals: we can group many time series together and calculate their entropy. Not only will it catch correlation drift and anomalies, but also missing or constant values in any time series of the group. We don’t need to monitor each one individually.

We can also apply structural entropy to multivariate time series segmentation, for example, cutting the time dimension into segments based on the entropy value.

The drawback of this approach is that it won’t explicitly tell us which signals are causing the abnormal entropy value; we may need further exploration for that. I guess this is a fair trade: we can now see the overall status of several or even hundreds of time series, and in exchange, we blur our attention to individuals. We also need to carefully select a window size that fits the data and the business requirements.

Thanks for reading.

Have fun with your time series data!

References

Notebook on GitHub

Anomaly Detection for Multivariate Time Series with Structural Entropy

