Detecting stock market crashes with topological data analysis

Written by Wallyson Lemes De Oliveira, Lewis Tunstall, Umberto Lupo, and Anibal Medina-Mardones

Towards Data Science

As long as there are financial markets, there will be financial crashes. Most people suffer when the market takes a dip; those who can foresee one can protect their assets, or take risky short positions to make a profit (a stressful situation to be in nevertheless, as depicted in The Big Short).

An asset traded on the market can be viewed as a dynamical system whose price varies as a function of the available information. The price of an asset is determined by a wide range of information and, under the efficient market hypothesis, any change in that information is immediately priced in.

The dynamics of financial systems are comparable to those of physical systems

In the same way that phase transitions occur between solids, liquids, and gases, we can discern a normal regime on the market from a chaotic one.

Observations show that financial crashes are preceded by a period of increased oscillation in asset prices [1]. This phenomenon translates into an abnormal change in the geometric structure of the time series.

In this post we use topological data analysis (TDA) to capture these geometric changes in the time series in order to produce a reliable detector for stock market crashes. The code implements the ideas of Gidea and Katz using Giotto-TDA, an open-source library for topological data analysis.

There is little consensus about the exact definition of a financial crash

Intuitively, a stock market crash is a rapid drop in asset prices. The drop is driven by massive selling of assets, as investors attempt to close their positions before prices fall even further.

Awareness of a large speculative bubble (as in the case of subprime mortgages) or a catastrophic event can cause markets to crash. In the last two decades we have seen two major crashes: the 2000 dot-com crash and the 2008 global financial crisis.

Our results in a nutshell

We analyse daily prices of the S&P 500 index from 1980 to the present day. The S&P 500 is commonly used to benchmark the state of the financial market, as it measures the stock performance of 500 large-cap US companies.

Compared to a simple baseline, we find that topological signals tend to be robust to noise and hence less prone to produce false positives.

This highlights one of the key motivations behind TDA, namely that topology and geometry can provide a powerful method to abstract subtle structure in complex data.

Detection of stock market crashes from baseline (left) and topological (right) models, discussed in detail below.

Let’s describe in more detail the two approaches.

A simple baseline

Given that market crashes represent a sudden decline of stock prices, one simple approach to detect these changes involves tracking the first derivative of average price values over a rolling window. Indeed, in the figure below we can see that this naïve approach already captures the Black Monday crash (1987), the burst of the dot-com bubble (2000–2004), and the financial crisis (2007–2008).

Magnitude of the first derivative of mean close prices between successive windows.

By normalising this time series to take values in the [0,1] interval, we can apply a threshold to label points on our original time series where a crash occurred.
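The baseline just described can be sketched in a few lines of NumPy. The window size and threshold below are illustrative choices for the sketch, not the values used in the article:

```python
import numpy as np

def baseline_crash_signal(prices, window=30, threshold=0.3):
    """Flag potential crashes via the first derivative of rolling mean prices.

    `window` and `threshold` are illustrative values, not the article's.
    """
    prices = np.asarray(prices, dtype=float)
    # Rolling mean of close prices over each window
    kernel = np.ones(window) / window
    means = np.convolve(prices, kernel, mode="valid")
    # Magnitude of the first derivative between successive windows
    deriv = np.abs(np.diff(means))
    # Normalise to [0, 1] so a single threshold applies
    normed = (deriv - deriv.min()) / (deriv.max() - deriv.min())
    return normed, normed > threshold

# Synthetic price series: calm regime, sharp drop, slow recovery
prices = np.concatenate([np.linspace(100, 110, 200),
                         np.linspace(110, 70, 20),
                         np.linspace(70, 75, 100)])
signal, is_crash = baseline_crash_signal(prices)
```

Points where `is_crash` is true can then be marked on the original series, exactly as in the figure below.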

Crash probability for baseline model (left), with points above threshold shown on original time series (right).

Evidently this simple method is rather noisy.

With so many points labelled as crashes, acting on this signal would mean panicking and selling your assets far too often. Let's see if TDA can help us reduce the noise in the signal and obtain a more robust detector!

The TDA pipeline

The mathematics underlying TDA is deep and won’t be covered in this article — we suggest this overview. For our purposes, it is sufficient to think of TDA as a means to extract informative features which can be used for modeling downstream.

The pipeline we developed consists of the following steps:

1. Embed the time series into a point cloud and construct sliding windows of point clouds.
2. Build a filtration on each window, giving an evolving structure that encodes its geometrical shape.
3. Extract the relevant features of each window using persistent homology.
4. Compare successive windows by measuring the difference between those features.
5. Construct an indicator of crash based on this difference.

TDA pipeline

Time series as point clouds — Takens’ embedding

A typical starting point in a TDA pipeline is to generate a simplicial complex from a point cloud. The crucial question in time series applications is therefore: how do we generate such point clouds? Discrete time series, like the ones we are considering, are typically visualised as scatter plots in two dimensions. This representation makes the local behaviour of the time series easy to track by scanning the plot from left to right, but it is often ineffective at conveying important effects which may be occurring over larger time scales.

One well-known set of techniques for capturing periodic behaviour comes from Fourier analysis. For instance, the discrete Fourier transform of a temporal window over the time series gives information on whether the signal in that window arises as the sum of a few simple periodic signals.

For our purposes we consider a different way of encoding a time-evolving process. It is based on the idea that some key properties of the dynamics can be unveiled effectively in higher dimensions. We begin by illustrating a way of representing a univariate time series as a point cloud, i.e. a set of vectors in a Euclidean space of arbitrary dimension.

The procedure works as follows: we pick two integers d and τ. For each time tᵢ ∈ (t₀, t₁, …), we collect the values of the variable y at d distinct times, evenly spaced by τ and starting at tᵢ, and present them as a vector with d entries, namely:

Yᵢ = (y(tᵢ), y(tᵢ + τ), …, y(tᵢ + (d − 1)τ))

The result is a set of vectors in d-dimensional space! τ is called the time delay parameter, and d the embedding dimension.
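The delay embedding is simple enough to write out directly in NumPy; the sketch below is a minimal version of the construction just described (Giotto-TDA exposes it as a scikit-learn-style transformer):

```python
import numpy as np

def takens_embedding(y, dimension=3, time_delay=1):
    """Delay embedding: map a univariate series y to the vectors
    (y[i], y[i + tau], ..., y[i + (d - 1) * tau])."""
    y = np.asarray(y, dtype=float)
    # Number of admissible start times for a full d-dimensional vector
    n_vectors = len(y) - (dimension - 1) * time_delay
    return np.stack(
        [y[offset : offset + n_vectors]
         for offset in range(0, dimension * time_delay, time_delay)],
        axis=1,
    )

# A sine wave embedded with d=2, tau=1 traces out a loop in the plane
y = np.sin(np.linspace(0, 4 * np.pi, 100))
cloud = takens_embedding(y, dimension=2, time_delay=1)
# cloud has shape (99, 2): one 2-D point per admissible start time
```

Periodic signals become closed loops in the embedded space, which is exactly the kind of geometric structure persistent homology can detect.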

This time-delay embedding technique is also called Takens’ embedding after Floris Takens, who demonstrated its significance with a celebrated theorem in the context of nonlinear dynamical systems.

Finally, applying this procedure separately on sliding windows over the full time series leads to a time series of point clouds (one per sliding window) with possibly interesting topologies. The GIF below shows how such a point cloud is generated in two dimensions.

Illustration of the Takens embedding with embedding dimension d=2 and time delay τ=1

From point clouds to persistence diagrams

Now that we know how to generate a time series of point clouds, what can we do with this information? Enter persistent homology, which looks for topological features in a simplicial complex that persist over some range of parameter values. Typically, a feature, such as a hole, will initially not be observed, then will appear, and after a range of values of the parameter will disappear again.
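To make this concrete, here is a toy sketch of persistent homology restricted to dimension 0 (connected components), computed with a union-find structure over the edges of a Vietoris–Rips-style filtration. It illustrates the birth/death mechanism only; the article's pipeline relies on Giotto-TDA's full computation, which also tracks higher-dimensional features such as holes:

```python
import numpy as np
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistence of a point cloud: every point is born at
    filtration value 0, and a component dies when it merges with another.
    Toy sketch for H0 only; higher dimensions need a full Vietoris-Rips
    computation as provided by libraries like Giotto-TDA."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sort all pairwise distances: these are the filtration values at
    # which edges appear in the Vietoris-Rips complex.
    edges = sorted((np.linalg.norm(points[i] - points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # two components merge: one of them dies
            parent[ri] = rj
            deaths.append(dist)
    # n points give n - 1 finite (birth=0, death) pairs; one class persists
    return [(0.0, d) for d in deaths]

# Two well-separated clusters: one H0 feature persists for a long range
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
diagram = h0_persistence(pts)
```

The two short-lived pairs correspond to points merging within each cluster, while the long-lived pair records that two distinct components persisted until the clusters finally connected.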

Point clouds from two successive windows and their associated persistence diagram

Distances between persistence diagrams

Given two windows and their corresponding persistence diagrams, we can calculate a variety of distance metrics. Here we compare two distances, one based on the notion of a persistence landscape, the other on Betti curves.
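As a concrete illustration of the second option, the sketch below builds Betti curves from (birth, death) pairs on a common grid of filtration values and compares two diagrams with an L1 norm. The diagrams and grid here are toy examples; in practice Giotto-TDA computes such diagram distances directly:

```python
import numpy as np

def betti_curve(diagram, filtration_values):
    """Betti curve: the number of persistence intervals alive at each
    filtration value t, i.e. those with birth <= t < death."""
    births = np.array([b for b, _ in diagram])
    deaths = np.array([d for _, d in diagram])
    return np.array([((births <= t) & (t < deaths)).sum()
                     for t in filtration_values])

def betti_distance(diag_a, diag_b, filtration_values):
    """L1 distance between two Betti curves, approximated on a grid."""
    curve_a = betti_curve(diag_a, filtration_values)
    curve_b = betti_curve(diag_b, filtration_values)
    step = filtration_values[1] - filtration_values[0]
    return float(np.abs(curve_a - curve_b).sum() * step)

grid = np.linspace(0.0, 2.0, 201)
# Second diagram has one extra interval [0.5, 1.5): distance ~ its length
d = betti_distance([(0.0, 1.0)], [(0.0, 1.0), (0.5, 1.5)], grid)
```

The landscape distance follows the same pattern, but compares the piecewise-linear persistence landscape functions instead of the interval counts.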

Magnitude of the landscape (left) and Betti curve (right) distances between successive windows.

From these figures we can infer that the landscape-based distance is less noisy than the Betti-curve-based one.

A topological indicator

Using the landscape distance between windows as our topological feature, it is a simple matter to normalise it as we did for the baseline model. Below we show the resulting detection of stock market crashes for the dot-com bubble and global financial crisis. Compared to our simple baseline, we can see that using topological features appears to reduce the noise in the signal of interest.
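Assuming the landscape distances between successive windows have already been computed, turning them into a crash indicator mirrors the baseline exactly; a minimal sketch (the threshold is illustrative, not the article's value):

```python
import numpy as np

def crash_indicator(window_distances, threshold=0.3):
    """Min-max normalise per-window landscape distances to [0, 1] and
    threshold them into crash flags. `threshold` is illustrative."""
    d = np.asarray(window_distances, dtype=float)
    normed = (d - d.min()) / (d.max() - d.min())
    return normed, normed > threshold

# Toy distances: two successive windows differ sharply from the rest
probs, flags = crash_indicator([0.1, 0.12, 0.11, 0.9, 0.85, 0.13])
```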

Crash probabilities and detections using topological features. The time ranges correspond to the dot-com bubble in 2000 (upper) and the global financial crisis in 2008 (lower).

Conclusion

Our results suggest that the periods of high volatility preceding a crash produce geometric signatures that can be detected more robustly with topological data analysis. However, these results concern only a specific market and time period, so the robustness of the procedure should be investigated further on different markets and with varying thresholds. Nevertheless, the results are encouraging and open up some interesting ideas for future development.
