
Time Series Clustering and Dimensionality Reduction

Cluster sensor data with the Kolmogorov Smirnov statistic and Machine Learning

Photo by Igor Ferreira on Unsplash

Time Series must be handled with care by data scientists. This kind of data carries intrinsic information about temporal dependency, and it's our job to extract these golden resources, where possible and useful, to help our models perform at their best.

With Time Series, I often see confusion when we face a problem of dimensionality reduction or clustering. We are used to thinking about these tasks in more classical domains, while they remain a taboo when we deal with Time Series.

In this post, I try to clarify these topics by developing a solution that works with multidimensional series coming from different individuals. Our purpose is to cluster them in an unsupervised way, making use of deep learning, being wary of correlations, and pointing out a useful technique that every data scientist should know!

THE DATASET

I got the data from the UCI Machine Learning Repository, selecting the Public Dataset of Accelerometer Data for Human Motion Primitives Detection. It is a public collection of labeled accelerometer recordings to be used for the creation and validation of acceleration models of human motion primitives.

Different types of activities are tracked, e.g. drinking, eating, climbing and so on. For a given activity of a specific individual, we have 3 different sensor series at our disposal: the X-axis (pointing toward the hand), the Y-axis (pointing toward the left) and the Z-axis (perpendicular to the plane of the hand).

I chose this setting because it lets us tackle both of our initial problems, clustering (multiple individuals) and dimensionality reduction (multiple series for every individual), in one single case.

Below I plot 2 examples of the data at our disposal, coming from a male and a female individual. In total, we have 20 individuals with the same measurement length.
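For reference, a minimal sketch of how such a plot can be produced, assuming the recordings have already been loaded into a numpy array called data of shape (20, 170, 3) (the name and the loading step are assumptions of mine, not shown in this post):

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
for ax, idx in zip(axes, [0, 1]):  # any two individuals to compare
    for i, axis_name in enumerate(['X', 'Y', 'Z']):
        ax.plot(data[idx, :, i], label=axis_name)
    ax.set_title('Individual {}'.format(idx))
    ax.set_xlabel('time step')
    ax.legend()
plt.show()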

DIMENSIONALITY REDUCTION

First, our attack plan calls for resolving the problem of multidimensionality. We want to summarize all the information stored in the sensor data into one significant series per individual. This step will make it easy to cluster our individuals into groups.

There is a great variety of techniques for reducing dimensionality in data, but our attention is focused on deep learning algorithms. A neural network structure lets us handle our initial data easily: recall that we have 20 individuals and, for every individual, 3 positional series of movements of length 170 (in Python terms, an array of shape 20x170x3). Classical PCA-based methods don't allow us to work with this kind of structure directly, so we build a handmade Autoencoder in Keras that takes care of our original data structure.

from keras.layers import Input, Dense, TimeDistributed
from keras.models import Model

# Input shape: (time steps, sensor axes) for every individual
inp = Input(shape=(data.shape[1], data.shape[2]))

# Encoder: compress every time step down to a 10-dimensional code
encoder = TimeDistributed(Dense(200, activation='tanh'))(inp)
encoder = TimeDistributed(Dense(50, activation='tanh'))(encoder)
latent = TimeDistributed(Dense(10, activation='tanh'))(encoder)

# Decoder: reconstruct the 3 sensor series from the latent code
decoder = TimeDistributed(Dense(50, activation='tanh'))(latent)
decoder = TimeDistributed(Dense(200, activation='tanh'))(decoder)
out = TimeDistributed(Dense(3))(decoder)

autoencoder = Model(inputs=inp, outputs=out)
autoencoder.compile(optimizer='adam', loss='mse')

Above I've shown the architecture I've used: the TimeDistributed layer lets us deal with 3D data, where the dimension at index one is treated as the temporal dimension. For our experiment, we use the first 10 individuals to train our Autoencoder and the remaining ones to compute reconstruction errors from the relative predictions. Don't forget to standardize your data before feeding your Autoencoder! In our case, I've standardized the data for every person by single observation (by rows).
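A minimal sketch of this step might look like the following; the row-wise standardization follows the description above, while the number of epochs, the batch size, and the mean-squared-error definition of the reconstruction error are assumptions of mine:

import numpy as np
from scipy.stats import zscore

# Standardize every person's data by single observation (by rows):
# each time step's 3 sensor values are z-scored across the row
data_std = np.array([zscore(person, axis=1) for person in data])

# First 10 individuals for training, the remaining ones for reconstruction errors
train, test = data_std[:10], data_std[10:]
autoencoder.fit(train, train, epochs=100, batch_size=4, verbose=0)

# Reconstruction error: one one-dimensional series per test individual
pred = autoencoder.predict(test)
recon_error = np.mean((test - pred) ** 2, axis=2)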

Our final reconstruction errors look similar to the one plotted below: we have points near zero where the model is confident and able to detect the activity of the selected person, and high values where the model hasn't learned enough and isn't confident enough to reconstruct the walking activity.

CORRELATION CLUSTERING

At this point, we have manageable objects at our disposal (with dimension 175×1 per individual) and we are ready to proceed with clustering. Technically speaking, we perform hierarchical clustering on the reconstruction errors of our test individuals. In order to capture the important relationships among these series, we try two different instruments to assemble our clusters.

In the first stage, we opt for the Pearson correlation index. Unfortunately, this measure is very OVERESTIMATED and ABUSED in the statistics and Machine Learning fields, but we want to give it a chance…

After obtaining the correlation matrix, we operate directly on it, performing hierarchical clustering. We apply a high threshold (99% of the maximum pairwise distance between the series) to form our flat clusters. This results in high-level groups, small in number, which give us a first overall picture of our test data.

import numpy as np
import scipy.cluster.hierarchy as sch

# df: list of reconstruction-error series, person_id: list of individual labels
corr = np.corrcoef(df)  # correlation matrix of the reconstruction-error series
d = sch.distance.pdist(corr)  # pairwise distances between rows of the matrix
L = sch.linkage(d, method='ward')
ind = sch.fcluster(L, 0.99 * d.max(), 'distance')  # flat clusters at 99% of the max distance
dendrogram = sch.dendrogram(L, no_plot=True)
# reorder series and labels following the dendrogram leaves
df = [df[i] for i in dendrogram['leaves']]
labels = [person_id[10:][i] for i in dendrogram['leaves']]
corr = np.corrcoef(df)  # recompute the reordered correlation matrix for plotting
dendrogram = sch.dendrogram(L, labels=labels)
Hierarchical Clustering on Correlation Matrix

Looking at the color intensities of the correlation matrix (on which we've just operated our clustering procedure), we can't see any obvious group pattern. On the right, the cut (black line) of the dendrogram, after some initial 'uncertainty', doesn't create rational groups: males and females are mixed together without any logic!

The Pearson correlation index confirms its unreliability once again; we have to go another way…

KOLMOGOROV SMIRNOV CLUSTERING

Recently, I've read about the Kolmogorov Smirnov statistic, and it had a double effect on me: it reminded me of university and it captured my attention with its adaptability. This statistic, with its associated p-value, is used to measure the difference in distribution between two samples. I think our clustering task is a good field of application for this killer instrument.

Computing this statistic is very easy with Python: in order to use it in our case, we only have to create the Kolmogorov Smirnov matrix (the equivalent of the correlation matrix) and reproduce the same steps we've carried out above.
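As a minimal sketch of this step, using scipy's ks_2samp and reusing the recon_error array from the earlier sketch (an assumption of mine, since the original snippet isn't shown here):

import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.stats import ks_2samp

# Kolmogorov Smirnov matrix: entry (i, j) is the KS statistic between the
# reconstruction-error series of individuals i and j
n = len(recon_error)
ks_matrix = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ks_matrix[i, j] = ks_2samp(recon_error[i], recon_error[j]).statistic

# The hierarchical clustering steps are then identical to the ones above,
# with ks_matrix taking the place of the correlation matrix
d = sch.distance.pdist(ks_matrix)
L = sch.linkage(d, method='ward')
ind = sch.fcluster(L, 0.99 * d.max(), 'distance')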

Hierarchical Clustering on Kolmogorov Smirnov Matrix

Now, looking at the color intensities of our matrix (on which we've just operated our clustering procedure), we can observe the presence of a pattern separating females and males. As is clearly visible in the dendrogram on the right, our hierarchical procedure has created two sensible groups, where the men are all separated from the women. The 'uncertainty' at the beginning of the cluster building has also disappeared.

This is the result we want, and it confirms that the Kolmogorov Smirnov statistic deserves a place in the arsenal of every data scientist.

SUMMARY

In this post, we've simultaneously solved a dimensionality reduction and a clustering problem for time series data. We've used an Autoencoder to summarize (in the form of reconstruction errors) the relevant characteristics of the accelerometer data. With the resulting one-dimensional series, we've carried out a clustering partition among individuals. The most satisfying results come from the combination of the Kolmogorov Smirnov statistic and hierarchical clustering, confirming once again that Pearson correlation must be handled with caution.


CHECK MY GITHUB REPO

Keep in touch: Linkedin

