Thoughts and Theory

Topological Change Point Detection

Discovering Tipping Points in Time Series Data

Tim Josserand

Follow

Published in

Towards Data Science

7 min readJun 10, 2021

--

Change point detection is an important topic in time-series analysis covering a broad range of applications where it is required to detect significant divergence from a nominal behavior in systems characterized by their measurable time-series. Several real-world systems include solar flare clusters, firefly flash patterns, neurological spike trains, climate data and financial indices, to name a few.

Discovering the underlying structure of time series is useful for predicting the abrupt change in the data generating process of a signal source. Typically statistical analysis is used in change point detection to identify when the probability distribution of a stochastic process significantly impacts changes in the time-series characteristics. Here I present an approach that applies methods from topological data analysis to extract features and predict the onset of appreciable structural alterations in time series data.

Specifically, this article will demonstrate the use of topological data analysis to predict change points in time series. The time-series are synthesized from the classical Kuramoto model and topological features are extracted from the persistence diagrams of the dynamics underlying the Kuramoto model’s phase response. These topological features are then used in a gradient boosting classifier pipeline to sequentially predict whether a system change point will be reached. This approach is relevant to situations where we want to model the emergence of tipping points in complex systems that could lead to disastrous, irreversible effect.

In this article you will learn how to:

Generate time series from the Kuramoto system
Extract topological features from time-series
Set up a machine learning pipeline to predict the time series state using topological features
Evaluate the model’s change point prediction performance

Kuramoto Model

The Kuramoto model has been widely used to model the behavior of large sets of weakly coupled oscillators. It can describe patterns arising in systems of chemical oscillators, neuron spike trains, flame dynamics and firefly flashing motifs. A defining characteristic of the Kuramoto model is phase synchronization. Synchronization is an emergent phenomenon of complex network systems wherein a large set of nearly identical components begin to oscillate in phase. The onset of this synchronization can occur quite suddenly and can be measured by the instantaneous phase coherence. Is there a detectable region between the normal, uncoupled system behavior and the fully synchronized state? Figure 1 below is a plot of the time-series for a system of 50 nodes of a fully connected Erdos-Renyi network simulated for 1000 time-steps using the Kuramoto model equations. The time-series represent the sine of the phases of each node in the network.

Qualitatively one can see regions of no coherence (t < 200), partial coherence (200 < t < 400) and complete synchronization (t > 400). This suggests the transition from normal to partial (“onset” region) and full synchronization may be detectable from the underlying time series structure.

Change Point Detection and Topological Data Analysis

Change point detection tries to identify a specific point in time when an abrupt structural change in an underlying data source results in a detectable change in the behavior of the associated observable time-series. A survey of traditional methods applied to change point detection be found here. In this article we’ll apply the method of topological data analysis to the problem.

Topological data analysis as applied to time-series machine learning attempts to extract features that retain the underlying global temporal structure in the data despite the presence of noise, outliers, missing data and scaling variations. In algebraic topological parlance this is termed the ‘persistent homology’ of the data. In simplified terms, persistent homology quantifies the data’s structure by counting the number of connected components, 2D loops, or more generally d-dimensional “holes” which exists in the data. Check out this excellent introductory article for an intuitive explanation of the persistent homology concept.

Typically we associate synchronized behaviour with periodic regularity and non-synchronized behaviour with aperiodicity. Figure 2 illustrates the difference in persistent homology when periodic and aperiodic structure is present in a time-series. Clearly, the H1 persistence length (green dots distance away from dashed line) and the presence or absence of H2 (purple dots) are separable features that can be exploited in change point prediction.

Figure 2. Persistence diagrams for periodic (top) and aperiodic (bottom) systems. Note the difference in the green dot distribution (H1) and the presence of H2 (top) and absence (bottom). These features can be used to detect periodicity which can in turn be used to detect the onset of synchronization in our model (Image from giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration).

A data set’s persistent homology can be vectorized into a feature vector and used in machine learning pipelines in the familiar way by computing a set of persistence feature vectors on a sequence of sliding windows over the original time series. We’ll be using giotto-tda, a topological data analysis python library with seamless integration into scikit-learn’s supervised machine learning pipeline, to study the ability to detect change point onset in time-series generated by the Kuramoto dynamic model.

Problem Scope

In order to create a supervised data set for change point detection, we’ll need to create a set of labels. For each window of the time-series data in Figure 1 above we’ll compute a global order measure by averaging the phase coherence of each node with it’s neighbors. The next plot shows this coherence function for our 50 node network.

Figure 3. The coherence function of the Kuromoto model measures the phase synchrony of the network as a function of time (Image by Author).

Again we can see an onset region characterized by partial coherence region around t=200. To be precise, our task will be to predict the one-step-ahead state given the current time window of data. By quantizing the coherence function in Figure 3 into three classes (0, 1 and 2 for normal, onset, and fully synchronized states, respectively) we can create the target labels for our supervised classification problem. We do this in lines 14–16 of the windowing code (segment 2) below.

The Pipeline

The data processing flow is pretty straightforward.

Simulate data:

Create a 50 node, fully-connected Erdos-Renyi network.
Simulate the Kuramoto equations, resulting in the desired time-series data and one-step-ahead labels.

2. Now window the data and the labels using the giotto-tda sliding window transformer. This not only windows the data for a given window size and stride (amount to translate each window) but it resamples the labels so that a window of data, X_sw, corresponds to the last value, yr_one_step, in a corresponding window over labels .

3. Next, we extract the feature vectors to train and test our supervised machine learning algorithm. In order to compute the persistent homology of each windowed segment of the time-series, we initially use the Pearson dissimilarity transformer to compute a 2D dissimilarity matrix for each window and stack the results into a 3D object. Then for each 2D dissimilarity matrix the Vietoris-Rips persistence is computed up to homology 2. Finally the amplitude, persistent entropy and number of diagram points per homology dimension are calculated for each resulting persistence diagram resulting in a shape (195200 x 9)feature vector, X_tot . The details of each persistent diagram feature transformer calculation can be found here.

4. The final step in our pipeline is to train and test the model. We’ll use scikit learn’s Gradient Boosting Classifier although similar results are obtained with the SVM and Random Forest Classifier.

Results

The results indicate that onset (partial synchronization) and full synchronization change points are detectable using the method of topological data analysis. In Figures 4 and 5 below the true state is indicated by the background color sequence (magenta=normal, cyan=onset, yellow=synchronized). The time-series are color-mapped with the predicted, one-step-ahead classification value: green=normal, blue=onset and red=synchronized.

From the zoomed-in image below it is easier to quantify the detector’s performance. Clearly there is a late detection of the onset region (magenta), however, we detect slightly early the full synchronization regime (yellow).

Figure 4. Zoomed in plot of the topological change point detection results. We can see there is a late detection of the onset region, however, we detect slightly early the full synchronization regime (Image by Author).

The Final Takeaway

So how could topological change-point detection be useful for real world applications? Since we are predicting the one-step-ahead state of the system given a current time window, we can ostensibly estimate future change points in a diversity of signal sources such as climate, financial, mechanical or biological systems as long as we can measure dynamic time-series from the observable data. Accurately and reliably predicting future change points of a system allows us to potentially avoid the emergence of tipping points in complex systems that could lead to disastrous, irreversible consequences.

Here we’ve synthesized a simple network of interconnected nodes, however, because of its ability to extract hidden structural information using any time-series data, topological change-point detection could be applied to systems involving widely varying spatial and temporal scales, data types and node composition, including natural and man-made systems. Furthermore, the presented algorithm can be performed online in real-time since we do not require the entire time-series but only the current time window.