
Understanding t-SNE in Python

Grouping data by looking at neighbors, using t-distributed Stochastic Neighbor Embedding

Photo by Clark Van Der Beken on Unsplash

t-Distributed Stochastic Neighbor Embedding

The t-SNE algorithm is a good resource for looking at high-dimensional data. Think about it: visualizing data with more than two dimensions can become really challenging. And I mean more than two because, even with the great 3D graphics available these days, it is still very difficult for our brain to interpret 3D images. After all, our monitors and screens are still flat, good old 2D, right?

So we came up with this good resource: t-distributed Stochastic Neighbor Embedding. What does this fancy name mean?

It means that the algorithm looks at the dataset checking for similarities between data points and converts them to joint probabilities (the likelihood of two events occurring together). Then, it tries to minimize the Kullback-Leibler divergence (a measure of the difference between two probability distributions) between the joint probabilities of the low-dimensional embedding and the high-dimensional data.

In short: check the data points, calculate the probability of the events occurring together for both the high-dimensional and the low-dimensional data, and make those two distributions as close as possible.
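In practice, all of this happens inside a single scikit-learn call. Here is a minimal sketch, using a random placeholder array instead of a real dataset:

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder high-dimensional data: 300 samples, 20 features
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 20))

# Project down to 2 components so the data can be plotted
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_embedded = tsne.fit_transform(X)
print(X_embedded.shape)   # (300, 2)
```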

Perplexity

If we look at the documentation, perplexity is "related to the number of nearest neighbors that is used in other manifold learning algorithms". It also says that "larger datasets usually require a larger perplexity".

This is a tunable parameter. It can be understood as an estimate of the number of close neighbors each point has. I have also seen it described as one of the inputs used to calculate the standard deviations of the Gaussian kernels in the high-dimensional space. As per the documentation, consider selecting a value between 5 and 50.

Also consider running a test with different values, since different values can produce significantly different results.
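A minimal sketch of such a sweep with scikit-learn, looping over a few perplexity values and plotting each embedding (the data here is just a random placeholder):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder high-dimensional data
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 20))

perplexities = [5, 15, 30, 50]
fig, axes = plt.subplots(1, len(perplexities), figsize=(16, 4))

for ax, p in zip(axes, perplexities):
    emb = TSNE(n_components=2, perplexity=p, random_state=42).fit_transform(X)
    ax.scatter(emb[:, 0], emb[:, 1], s=10)
    ax.set_title(f"perplexity = {p}")

plt.tight_layout()
plt.show()
```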

Look at this test with a fake dataset I created for learning purposes:

Fig1: t-SNE with different values for perplexity. Image by the author.

If we force the number of iterations to n_iter=250, the result below occurs. Notice that we don’t see a clear separation for high perplexity in this case, because the algorithm needs a different number of iterations to converge to the best solution.

Fig2: t-SNE with different values for perplexity with n_iter=250. Image by the author.
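For reference, forcing a lower iteration budget is just a matter of passing n_iter to the same call (a small sketch, reusing the placeholder X from the sweep above; note that recent scikit-learn releases rename this parameter to max_iter):

```python
# Same embedding, but with the optimization capped at 250 iterations
tsne_short = TSNE(n_components=2, perplexity=50, n_iter=250, random_state=42)
emb_short = tsne_short.fit_transform(X)

print(tsne_short.n_iter_)          # iterations actually run
print(tsne_short.kl_divergence_)   # final KL divergence; a higher value suggests it has not fully converged
```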

As my dataset is small, you can see that smaller perplexity values drive better results, given that the number of close neighbors is smaller.

If I create another dataset with 50 variables distributed uniformly and present it to t-SNE, look at the result at different perplexity values.

Fig3: High perplexity values (>50) are not so effective. Image by the author.

In this case, perplexity values above 30 start to produce very random results. As we see in the documentation, perplexity should not go above 50, as the results won’t be as precise.
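A sketch of how that second experiment could be set up (the sample size here is hypothetical; uniform noise has no real cluster structure to find):

```python
import numpy as np
from sklearn.manifold import TSNE

# 50 uniformly distributed variables -- pure noise, no real clusters
rng = np.random.default_rng(0)
X_uniform = rng.uniform(size=(500, 50))

# Run the same perplexity sweep as before, now including values above 50
for p in [5, 30, 50, 100]:
    emb = TSNE(n_components=2, perplexity=p, random_state=0).fit_transform(X_uniform)
    # scatter-plot emb as in the sweep above; high-perplexity layouts tend to look arbitrary
```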

Other Considerations

Other points we can take from the good article How to Use t-SNE Effectively are:

  • Cluster sizes in a t-SNE plot mean nothing: t-SNE won’t preserve the sizes of each cluster.
  • Distances between clusters might not mean anything: the distances between clusters won’t always be reflected by t-SNE. Keep that in mind.
  • Random noise doesn’t always look random: if you create a random dataset and present it to t-SNE, you can still see some pattern at low perplexity values. That does not mean that there are clusters.
  • You can see some shapes, sometimes: perplexity dials between local variance (small values) and global variance (high values), which can lead to the appearance of shapes.

The results of t-SNE can perhaps be used for clustering, since the resulting matrix is a low-dimensional version of the data, so the results should be similar. You can see in the test in the code provided at the end of this article that KMeans gave the same results whether it used the t-SNE matrix or the one-hot encoded high-dimensional data.

Same results with KMeans on High or Low Dimensional dataset. Image by the author.
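A minimal sketch of that comparison, using a synthetic stand-in for the one-hot encoded data (names and sizes here are illustrative, not the article’s actual dataset):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the one-hot encoded data: three noisy groups in 10 dimensions
rng = np.random.default_rng(42)
X_encoded = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 10)) for c in (0, 3, 6)])

# Low-dimensional version of the same data
emb = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_encoded)

# KMeans on the high-dimensional data vs. KMeans on the 2-D embedding
labels_high = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_encoded)
labels_low = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(emb)

# An adjusted Rand index of 1.0 means the two label assignments agree perfectly
print(adjusted_rand_score(labels_high, labels_low))
```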

Before You Go

Here we presented t-SNE. Using t-distributions, this nice algorithm helps us visualize high-dimensional data in 2D.

You must fine-tune the perplexity and n_iter hyperparameters to get the best separation and similarities for your data. Try looping over values and plotting the results.

The dataset used for this exercise can be found on my GitHub.

References

How to Use t-SNE Effectively

sklearn.manifold.TSNE

Understanding t-SNE by Implementation

t-SNE: The effect of various perplexity values on the shape

If this content is useful, follow my blog for more.

Gustavo Santos – Medium

Consider registering for a Medium membership using this referral code.

Code:

