The shape that survives the noise

Thomas Boys
Towards Data Science
7 min read · Nov 27, 2019


Written by Colin Kälin, Philipp Weiler, Marta D’Angelo, Philippe Nguyen, Benjamin Russell and Thomas Boys.

https://github.com/giotto-ai/noise-to-signal

Because dealing with noise comes before any kind of model fitting, you want as transparent a way as possible to extract a robust signal from a noisy one.

Removing noise is a Sisyphean task and quick fixes can jeopardise an entire analysis.

Sisyphus is forced to roll an immense boulder up a hill only for it to roll down when it nears the top, repeating this action for eternity.

TDA removes noise in a way that extracts the fundamental shape of the data.

Typical methods include filtering (Kalman filter …) and smoothing strategies (exponential moving averaging …). These methods all rely on time averaging and assume that the noise frequency is far higher than the frequency of the signal you wish to extract. In contrast, TDA is not primarily based on temporal averaging; rather, it extracts structural information at each fixed time.
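To make the time-averaging idea concrete, here is a minimal sketch of exponential moving averaging in plain Python. The test signal, the noise level and the smoothing factor `alpha` are illustrative assumptions, not values from our experiments.

```python
import math
import random

def exponential_moving_average(values, alpha=0.1):
    """Smooth a sequence with s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = []
    state = values[0]  # initialise the running average with the first sample
    for x in values:
        state = alpha * x + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

# Slow sine wave (the signal) plus fast Gaussian jitter (the noise).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * t / 200) for t in range(1000)]
noisy = [c + rng.gauss(0, 0.3) for c in clean]
smoothed = exponential_moving_average(noisy, alpha=0.1)
```

The smoothed curve is much less jagged than the noisy one, but only because the noise varies faster than the signal. That is exactly the assumption described above.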

An application to recovering hidden messages in noisy signals

We consider the problem of predicting regime changes under noise. To get an idea of which feature sets best predict regime changes, we build four models to perform a binary classification task. Each model is built using a different set of features: two sets of features without TDA, one using only TDA features, and one with all the features combined.

Noise, when good theory meets dirty practices. It is that unwanted party in your signal.

Our results in a nutshell

TDA features are highly predictive for rapidly detecting a subtle change in the key characteristics of a noisy signal.

The features outperform standard features. They also contain performance boosting information not available in the standard methods.

Below, we benchmark rapid regime-detection in a complex digital circuit system using TDA features against standard feature creation strategies.

We found that, in the high noise regime, TDA features yielded a significant performance boost over standard feature strategies. TDA not only outperforms the standard strategies alone, it also provides a clear performance boost on top of them when the two are combined. It therefore contains predictive information that is not present in the standard features and can augment prediction.

Detection accuracy as noise is added to data

Problem statement

Ali and Julie want to share a hidden message to organise their secret holiday shopping at the Pound store (Booyakasha: his code is available).

Ali explaining his message because the signal was too noisy.

Hiding in plain sight: steganography is the art of concealing one signal within another.

Ali cunningly hides the location, date and hour of their secret meeting as a sequence of 1’s and 0’s represented by the orange signal, a bit like Morse code. To hide the signal he cranks it through the Duffing oscillator, which outputs the blue signal with the message hidden inside.

Message in orange and noisy carrier signal by Duffing oscillator in blue.

To hide the message, Ali varied the “A” parameters of the Duffing oscillator according to the digits of his message (the orange signal). This change switches the output signal between two regimes of the Duffing oscillator which have different topological properties. Importantly, these two regimes cannot be distinguished by eye. This makes them suitable for hiding messages in plain sight.

Duffing oscillator circuit. Electronic circuits are a great source of data for studying periodic and chaotic signals. Figure by Elena Tamaševičiūtė et al.

Its chaotic output allows the Duffing oscillator to be used in signal extraction and detection tasks. In signal processing, especially real-time signal processing, dedicated circuits are commonly used for all manner of filtering, anomaly detection and signal extraction tasks.
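The encoding step, in which the "A" parameter is switched according to the message bits, can be sketched numerically. This is a simulation under stated assumptions, not the circuit measurements used in this article: it assumes the textbook forced Duffing equation x'' + delta·x' + alpha·x + beta·x³ = A·cos(omega·t), treats "A" as the driving amplitude, and uses hypothetical parameter values throughout.

```python
import math

def duffing_rhs(t, x, v, amplitude, delta=0.3, alpha=-1.0, beta=1.0, omega=1.2):
    """Return (x', v') for x'' + delta*x' + alpha*x + beta*x**3 = A*cos(omega*t)."""
    accel = -delta * v - alpha * x - beta * x**3 + amplitude * math.cos(omega * t)
    return v, accel

def encode_message(bits, amp_zero=0.37, amp_one=0.50, steps_per_bit=2000, dt=0.01):
    """Integrate with fixed-step RK4, switching the amplitude A for each bit.

    amp_zero and amp_one are hypothetical values chosen for illustration.
    """
    t, x, v = 0.0, 1.0, 0.0
    signal, labels = [], []
    for bit in bits:
        amp = amp_one if bit else amp_zero
        for _ in range(steps_per_bit):
            k1x, k1v = duffing_rhs(t, x, v, amp)
            k2x, k2v = duffing_rhs(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v, amp)
            k3x, k3v = duffing_rhs(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v, amp)
            k4x, k4v = duffing_rhs(t + dt, x + dt * k3x, v + dt * k3v, amp)
            x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
            v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
            t += dt
            signal.append(x)    # the carrier (blue) signal
            labels.append(bit)  # the hidden message (orange) signal
    return signal, labels

signal, labels = encode_message([1, 0, 1, 1, 0])
```

Each sample of `signal` comes with the bit that was active when it was generated, which is the kind of labelled data a decoder can be trained on.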

Reconstructing the message

Ultimately Julie needs to reconstruct the message (orange signal) from the “regime changes” in the blue signal. She uses an ML classifier to extract the message by recognising the two regimes, but will succeed only if she has the right feature creation steps.

To train her decoder, Julie measures the voltage output of a Duffing circuit, which she splits into a training set of 140,000 points and a test set of 60,000 points. Each measurement carries a label indicating which regime it belongs to.

Feature creation

The intrinsic features are the direct measurement of the voltage of the output signal and its time derivative.
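As a small illustration, the two intrinsic features can be built from a sampled voltage trace as follows. The backward finite difference and the sampling step `dt` are assumptions made for this sketch.

```python
def intrinsic_features(voltage, dt=1.0):
    """Return (voltage, derivative) pairs using a backward finite difference."""
    features = []
    for i, v in enumerate(voltage):
        prev = voltage[i - 1] if i > 0 else v  # derivative is set to 0 at the first sample
        features.append((v, (v - prev) / dt))
    return features

print(intrinsic_features([0.0, 0.5, 1.5, 1.0], dt=0.5))
# [(0.0, 0.0), (0.5, 1.0), (1.5, 2.0), (1.0, -1.0)]
```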

The standard time series features are created from a number of rolling windows of different sizes. The features that we create within a rolling window are mean, maximum, minimum, moving average difference (MAD) and the first few Fourier coefficients.
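A rough sketch of these rolling-window features in plain Python is shown below. Two points are assumptions: MAD is taken to be the latest value minus the window mean, and the Fourier coefficients are reported as normalised magnitudes. The function name and the number of coefficients are hypothetical.

```python
import cmath

def rolling_features(series, window, n_fourier=2):
    """Compute one feature row per full window ending at each time step."""
    rows = []
    for end in range(window, len(series) + 1):
        w = series[end - window:end]
        mean = sum(w) / window
        row = {
            "mean": mean,
            "max": max(w),
            "min": min(w),
            "mad": w[-1] - mean,  # ASSUMPTION: latest value minus the moving average
        }
        for k in range(1, n_fourier + 1):
            # k-th discrete Fourier coefficient of the window
            coeff = sum(x * cmath.exp(-2j * cmath.pi * k * n / window)
                        for n, x in enumerate(w))
            row[f"fourier_{k}"] = abs(coeff) / window
        rows.append(row)
    return rows

rows = rolling_features([0.0, 1.0, 0.0, -1.0] * 4, window=4)
print(rows[0])
```

Running the function once per window size and concatenating the rows gives the multi-scale feature set described above.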

TDA Features

Our goal is to see a topologically different point cloud depending on the regime of the time series (and thus to extract the message). The TDA features were created using the Giotto library.

To create the TDA features, we embed our time series into a higher-dimensional space using Takens’ embedding. Each step of the rolling window is converted into a single vector in the higher-dimensional space, whose dimension is the size of the window.

Takens embedding in dimension two.
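The embedding itself is only a few lines of code. The sketch below is a plain-Python illustration rather than the Giotto implementation; the `delay` parameter is an addition for generality, and `delay=1` corresponds to the sliding window described above.

```python
def takens_embedding(series, dimension=2, delay=1):
    """Map a scalar series to points (x[t], x[t+delay], ..., x[t+(dimension-1)*delay])."""
    span = (dimension - 1) * delay
    return [
        tuple(series[t + k * delay] for k in range(dimension))
        for t in range(len(series) - span)
    ]

print(takens_embedding([1, 2, 3, 4, 5], dimension=2, delay=2))
# [(1, 3), (2, 4), (3, 5)]
```

Each tuple is one point of the cloud whose shape we go on to analyse.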

Topology studies the connectivity within each point cloud. We apply the Vietoris-Rips (V-R) algorithm, which connects any two points closer than a certain distance threshold by an edge. By varying this threshold, V-R creates an approximation of the point cloud at different levels of granularity.

Vietoris-Rips algorithm.
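To illustrate the thresholding idea, the sketch below joins points that lie within epsilon of each other and counts the resulting connected components. This is a deliberate simplification: it covers only connectivity (dimension 0), whereas the holes used later require a full persistent homology computation such as the one a TDA library provides.

```python
import math
from itertools import combinations

def count_components(points, epsilon):
    """Number of connected components once points within epsilon are joined."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= epsilon:
            parent[find(i)] = find(j)  # merge the two components
    return len({find(i) for i in range(len(points))})

# Two well-separated pairs of points.
cloud = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
for eps in (0.5, 1.0, 5.0):
    print(eps, count_components(cloud, eps))
# 0.5 4
# 1.0 2
# 5.0 1
```

As epsilon grows, the four isolated points first merge into two clusters and then into one, which is the multi-scale view described above.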

The following features were found to be effective predictors.

Total number of holes feature

The information about the holes is contained in the persistence diagram. It is the result of progressively connecting points that lie farther and farther apart (up to a distance epsilon), and we use it to create the Betti surface. The Betti surface counts the number of holes present in the data as a function of epsilon and time.

Betti surface showing the 1 and 0 regimes in the Duffing oscillator output.
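One slice of the Betti surface, the Betti curve of a single time window, can be read off a persistence diagram directly. The sketch below assumes the diagram is given as a list of (birth, death) pairs for holes of one dimension; the numbers are made up for illustration.

```python
def betti_curve(diagram, epsilons):
    """Count, for each epsilon, the holes with birth <= epsilon < death."""
    return [
        sum(1 for birth, death in diagram if birth <= eps < death)
        for eps in epsilons
    ]

diagram = [(0.1, 0.9), (0.2, 0.4), (0.3, 0.35)]
epsilons = [0.0, 0.25, 0.5, 1.0]
print(betti_curve(diagram, epsilons))
# [0, 2, 1, 0]
```

Stacking one such curve per rolling window, ordered by time, gives the Betti surface.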

Relevant holes feature

The relevant holes feature counts the number of holes over a given threshold size (more than 70% of the maximum value).
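A possible reading of this feature is sketched below. It assumes that the "size" of a hole is its lifetime (death minus birth) and that the maximum is taken over the lifetimes in the same diagram; both are interpretations for the sake of the example.

```python
def relevant_holes(diagram, fraction=0.7):
    """Count holes whose lifetime exceeds `fraction` of the longest lifetime."""
    lifetimes = [death - birth for birth, death in diagram]
    if not lifetimes:
        return 0
    cutoff = fraction * max(lifetimes)
    return sum(1 for life in lifetimes if life > cutoff)

print(relevant_holes([(0.0, 1.0), (0.1, 0.9), (0.2, 0.3)]))
# 2
```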

Amplitude of the diagram feature

We use the diagram norm as a measure of the total persistence of all the holes in the diagram.
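Several norms can play this role. As one assumed choice, the sketch below takes the p-norm of the hole lifetimes, with p = 2 by default.

```python
def diagram_amplitude(diagram, p=2):
    """p-norm of the lifetimes (death - birth) of all holes in the diagram."""
    return sum((death - birth) ** p for birth, death in diagram) ** (1 / p)

print(diagram_amplitude([(0.0, 3.0), (1.0, 5.0)]))
# 5.0
```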

Mean support feature

The mean of the epsilon distances yielding non-zero Betti values in the Betti surface.

ArgMax feature

The argmax feature is the value of epsilon for which the Betti number was highest for each time window.
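Both the mean support and the argmax feature can be computed from a single Betti curve, as in this sketch. The curve and the epsilon grid are made-up values, and ties in the argmax are resolved in favour of the smallest epsilon.

```python
def mean_support(betti, epsilons):
    """Mean of the epsilon values at which the Betti number is non-zero."""
    support = [eps for eps, b in zip(epsilons, betti) if b > 0]
    return sum(support) / len(support) if support else 0.0

def argmax_epsilon(betti, epsilons):
    """Epsilon at which the Betti number is highest (first one on ties)."""
    best = max(range(len(betti)), key=lambda i: betti[i])
    return epsilons[best]

epsilons = [0.0, 0.25, 0.5, 0.75, 1.0]
betti = [0, 3, 1, 1, 0]
print(mean_support(betti, epsilons), argmax_epsilon(betti, epsilons))
# 0.5 0.25
```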

Average lifetime feature

For each dimension we take the average lifetime of a hole in the persistence diagram (=Betti surface at a fixed time).
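A minimal sketch, assuming each row of the diagram is a (birth, death, dimension) triple:

```python
from collections import defaultdict

def average_lifetime(diagram):
    """Mean lifetime (death - birth) for each homology dimension."""
    lifetimes = defaultdict(list)
    for birth, death, dim in diagram:
        lifetimes[dim].append(death - birth)
    return {dim: sum(v) / len(v) for dim, v in sorted(lifetimes.items())}

print(average_lifetime([(0.0, 1.0, 0), (0.0, 3.0, 0), (0.5, 1.0, 1)]))
# {0: 2.0, 1: 0.5}
```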

Pipeline

Steps of the feature creation and model fit pipeline.

Illustration of features

We demonstrate the features created in the pipeline on the noised Duffing oscillator with Ali’s message hidden within. We clearly see the TDA features identifying the 1’s and 0’s regimes in this short period of time:

Topological features output from the Jupyter notebook.

Results

We can clearly identify two qualitatively different regions.

Even in the noisy regime, the Betti surface clearly differentiates the 1 and 0 regions of the signal.

The Betti number drops significantly faster (as epsilon increases) in the 0 region of the hidden signal than in the 1 region.

The gradient boosting classifier output is processed as follows: first, we apply a rolling window and calculate the mean within it. Next, we label all values above a predefined threshold as 1 and 0 otherwise.
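This post-processing step can be sketched as follows. The window length, the threshold and the example scores are hypothetical; a trailing window is assumed, so the first few outputs average over fewer samples.

```python
def decode(scores, window=5, threshold=0.5):
    """Rolling mean of the classifier scores, then a hard 0/1 threshold."""
    decoded = []
    for end in range(1, len(scores) + 1):
        start = max(0, end - window)  # shorter window at the very beginning
        mean = sum(scores[start:end]) / (end - start)
        decoded.append(1 if mean > threshold else 0)
    return decoded

raw = [0.9, 0.8, 0.1, 0.9, 0.95, 0.2, 0.1, 0.6, 0.1, 0.05]
print(decode(raw, window=3))
# [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
```

The isolated outliers (0.1 inside the run of ones, 0.6 inside the run of zeros) are averaged away, at the price of a short delay in detecting the switch.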

For the sake of comparison, we benchmark TDA features by training four separate models, each using one of the following:

  • Only the time series
  • A standard set of time series features
  • TDA features
  • All combined

To test robustness to noise, we train each configuration on a distorted version of the dataset, obtained by adding Gaussian noise of increasing amplitude. Here is the performance plotted as a function of noise:

Accuracy as a function of noise.
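The distortion step amounts to adding zero-mean Gaussian noise with a growing standard deviation. In this sketch the clean signal and the list of noise levels are placeholders, not the values behind the plot.

```python
import random

def add_gaussian_noise(signal, sigma, seed=0):
    """Return a copy of the signal with N(0, sigma^2) noise added to each sample."""
    rng = random.Random(seed)  # fixed seed so the distortion is reproducible
    return [x + rng.gauss(0.0, sigma) for x in signal]

clean = [0.0, 1.0, 0.0, -1.0] * 250
noisy_versions = {
    sigma: add_gaussian_noise(clean, sigma)
    for sigma in (0.0, 0.1, 0.5, 1.0)  # hypothetical noise levels
}
```

Each entry of `noisy_versions` would then go through the same feature creation and training pipeline, giving one accuracy value per noise level.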

We see that the best performance is achieved when we use all features. Thus, adding the TDA features increases the overall performance and makes the classifier more resistant to noise.

Both TDA and standard strategies perform perfectly before noise is added to the signal. However, and this is the kicker, when we added noise, TDA outperformed standard strategies, and performed yet better still when the two methodologies were combined.

This shows that not only are the TDA features highly predictive on their own, but that they also contain performance boosting information not available in the standard methods.

Have fun exploring your own TDA applications for noise resilient signal processing.
