
El Clásico, the Real Madrid vs FC Barcelona match, always gets us excited. Both teams are great, but they have different playing styles. In the data science world, we have our own El Clásico: PCA vs TSNE. Both are popular dimension reduction techniques, and each has its own style.
There is plenty of literature on PCA vs TSNE that explores the difference from a mathematical or a visualisation point of view. In this story I would like to present the difference using a data storytelling approach. I will take two examples: one on a Telco customer dataset and a second one on a Marvel Avengers dataset. Excited? Then read on.
Example 1 – Dimension reduction of a Telco dataset
Telecommunication companies collect a lot of information about their customers, so the customer dataset is quite informative. One such sample dataset is shown here.
The dataset has about 20 fields covering customer demographics, subscribed services and billing charges. It is a mix of categorical and numeric fields.
Let us now apply the two techniques, PCA and TSNE, to this dataset and compare the results.
Step 1 – Comparing the output
The results of dimension reduction to 2 dimensions using the two techniques are illustrated here.
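If you want to try this yourself, here is a minimal sketch in Python using pandas and scikit-learn. It is not necessarily how the plots in this story were produced. The helper name reduce_to_2d, the preprocessing choices (standard scaling for numeric fields, one-hot encoding for categorical fields) and the toy table with its column names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def reduce_to_2d(df, perplexity=30.0, random_state=0):
    """Project a mixed-type DataFrame to 2D with PCA and with TSNE.

    Returns (pca_xy, tsne_xy), two arrays of shape (n_rows, 2).
    """
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in df.columns if c not in numeric_cols]

    # Assumption: scale numeric fields, one-hot encode categorical fields.
    preprocess = ColumnTransformer(
        [
            ("num", StandardScaler(), numeric_cols),
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )
    X = preprocess.fit_transform(df)

    pca_xy = PCA(n_components=2).fit_transform(X)

    # TSNE requires the perplexity to be smaller than the number of rows.
    tsne = TSNE(
        n_components=2,
        perplexity=min(perplexity, len(df) - 1),
        random_state=random_state,
    )
    tsne_xy = tsne.fit_transform(X)
    return pca_xy, tsne_xy


# Toy stand-in for a Telco table; the column names and values are made up.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "MonthlyCharges": rng.uniform(20, 120, 60),
        "SeniorCitizen": rng.integers(0, 2, 60),
        "Contract": rng.choice(["Monthly", "Yearly"], 60),
    }
)
pca_xy, tsne_xy = reduce_to_2d(toy)
print(pca_xy.shape, tsne_xy.shape)
```

Each returned array has one row per customer and two columns, which you can pass straight to a scatter plot. Keep in mind that TSNE output depends on the perplexity and the random seed, so your picture may look different from run to run.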

Looking at these visualisations, we can make some observations. PCA has formed two clusters. TSNE has formed three clusters, and some data points sit apart from the clusters.
So just comparing the output of the two algorithms reveals some important differences. What does this mean? Let us answer this question in the next step.
Step 2 – Interpreting the clusters
In order to understand the difference, we first have to interpret the meaning of the clusters. For interpretation purposes, each cluster is given a name below. On the PCA side, the clusters are named Cluster0 and Cluster1. On the TSNE side, they are named Cluster0, Cluster1 and Cluster2.

A simple and quick way to interpret the clusters is a radar chart, as shown here. The radar chart shows the average value of the important features (or columns) for each cluster.

We observe that
- For PCA, the feature Monthly Charges is the biggest differentiator between clusters
- For TSNE, the feature Senior Citizen is the biggest differentiator between clusters
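The numbers behind a radar chart like this can be sketched with a few lines of pandas. This is only an illustration under assumptions: the toy table, its column names, the "cluster" column and the two helper functions are hypothetical, and "biggest differentiator" is taken here to mean the feature whose cluster averages are furthest apart after min-max scaling.

```python
import pandas as pd


def cluster_profile(df, cluster_col="cluster"):
    """Average of each min-max scaled numeric feature, per cluster."""
    features = df.drop(columns=[cluster_col]).select_dtypes(include="number")
    # Avoid dividing by zero for constant columns.
    span = (features.max() - features.min()).replace(0, 1)
    scaled = (features - features.min()) / span
    return scaled.groupby(df[cluster_col]).mean()


def biggest_differentiator(profile):
    """Feature whose cluster averages are spread the widest."""
    spread = profile.max() - profile.min()
    return spread.idxmax()


# Made-up example rows, not taken from the Telco dataset.
toy = pd.DataFrame(
    {
        "cluster": [0, 0, 1, 1],
        "MonthlyCharges": [20, 25, 90, 100],
        "SeniorCitizen": [0, 1, 0, 1],
    }
)
profile = cluster_profile(toy)
print(profile)
print(biggest_differentiator(profile))
```

Each row of profile is one cluster and each column is one feature, which is exactly the shape a radar chart needs: one polygon per row, one axis per column.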
Let us verify these observations by adding some color. We can color the points in the PCA visualisation by Monthly Charges and, similarly, the points in the TSNE visualisation by Senior Citizen.

This confirms what we have already observed using the radar chart. On the PCA side, a cluster of low Monthly Charges is clearly identifiable. On the TSNE side, a cluster of Senior Citizens is clearly identifiable.
Step 3 – Understanding the difference
So why has PCA chosen Monthly Charges as one of the key differentiating fields? The reason is that PCA is a mathematical approach: it looks for the directions along which the data varies the most and spreads the points out along those directions. Monthly Charges is the amount paid by the customer, and it varies a lot from customer to customer.
So PCA essentially works by separating points as far as possible along the most varying fields.
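To see the "highest variance" idea in action, here is a small sketch with NumPy. It builds toy data (made up, not the Telco dataset) in which one column varies far more than the others, then computes the first principal component by hand from the covariance matrix. Note the assumption that the columns are left unscaled. If you standardise every field first, all fields end up with the same variance and PCA is driven by the correlations between fields instead.

```python
import numpy as np


def first_principal_component(X):
    """Return (direction, share of total variance) of the first component."""
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last one is largest.
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1], eigvals[-1] / eigvals.sum()


rng = np.random.default_rng(0)
# Column 0 plays the role of a widely varying field; columns 1 and 2 vary little.
X = np.column_stack(
    [
        rng.normal(65.0, 30.0, 500),
        rng.normal(0.5, 0.1, 500),
        rng.normal(2.0, 0.5, 500),
    ]
)
direction, share = first_principal_component(X)
print(np.round(np.abs(direction), 3), round(share, 4))
```

The first component points almost entirely along column 0 and carries nearly all of the variance, which is why a widely varying field tends to dominate a PCA plot of unscaled data.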
TSNE has chosen Senior Citizen as one of the key differentiating fields. The reason is that TSNE is a probabilistic approach: it turns the distance between two points into a probability that they are neighbours, and tries to keep points with a high neighbour probability close together in the plot. In real life too, senior citizens can be considered one type of population, with a high probability of showing similar behaviour when consuming telecommunication services.
So TSNE essentially works by grouping points as close as possible based on the characteristics of the points.
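The "probability" part can be made concrete with a simplified sketch in NumPy. It computes, for every point, the probability of picking each other point as a neighbour using a Gaussian kernel on the distances. This is a simplification and not the full algorithm: here a single fixed sigma is assumed, whereas real TSNE tunes the bandwidth per point from the perplexity and then matches these probabilities in the 2D plot using a Student t distribution. The function name and the toy points are made up for illustration.

```python
import numpy as np


def neighbour_probabilities(X, sigma=1.0):
    """Row i holds the probability that point i picks each other point as neighbour."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    affinities = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(affinities, 0.0)  # a point is not its own neighbour
    return affinities / affinities.sum(axis=1, keepdims=True)


# Two tight groups of toy points that are far away from each other.
X = np.array(
    [
        [0.0, 0.0],
        [0.1, 0.0],
        [0.0, 0.1],
        [5.0, 5.0],
        [5.1, 5.0],
    ]
)
P = neighbour_probabilities(X)
print(np.round(P, 3))
```

Points inside the same group get almost all of each other's probability, while points from the other group get practically none. TSNE then arranges the 2D plot so that these neighbour relationships are preserved as well as possible.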
Now let us explore another dataset, based on Marvel characters, to get more insight into the inner workings of PCA and TSNE.
Example 2 – Dimension reduction of a Marvel Character dataset
Marvel has given us some great superhero movies. Fortunately, it has also given us an interesting dataset for data scientists to play with. A sample from this dataset is shown here.
This dataset has about 185 fields. That is quite a lot, and you need some super weapon to make sense of such a massive number of fields. Fortunately we have super weapons at our disposal, in the form of PCA and TSNE, to do some cool dimension reduction.
Shown here is the output of the two dimension reduction techniques.

We can observe that there is one visible cluster in PCA, while there is no cluster formation in TSNE. TSNE does not form any cluster because all superheroes are very different and there is no underlying population of similar superheroes. Superheroes are unique. There is no population of hundreds of Iron Men. There is only one Iron Man.
So the probability that one superhero is close to another is very low. In data science terminology, the data is very sparse. TSNE tends not to form clusters when the data is very sparse.
In order to interpret the PCA cluster, we can color the points by Power, which is the most varying field. For TSNE, as there is no cluster formation, coloring the points makes no sense. The only way to analyse the TSNE output is to label some of the points with the Marvel character name in order to make some interpretation.

In the PCA output, we see that the cluster has characters with low and medium Power. As we move further away from it, the Power of the characters increases.
In TSNE, there are no clusters, but we can analyse the closeness between some characters. We can see that Loki and Thanos are close. This is probably because they both have an Evil alignment in the dataset. Similarly, Winter Soldier and Captain America are close, probably because they have a similar army background and similar characteristics. The same goes for Rocket Raccoon and Black Panther, which share animal traits and similar power.
So based on this dataset, we can make one more important observation:
TSNE will form clusters only when there are sufficient points in a population distribution (meaning when the data is not sparse). However, the closeness of points can still be used to make observations about the similarity between two points.
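Reading closeness off a plot by eye gets tedious, so here is a minimal sketch in NumPy that looks up the nearest neighbour of every labelled point in a 2D embedding. The helper name, the labels and the coordinates are all made up for illustration; in practice you would pass in your own character names and the 2D coordinates produced by TSNE. Treat the result as a hint about local similarity only, since large distances in a TSNE plot are generally not meaningful.

```python
import numpy as np


def nearest_neighbour(names, xy):
    """Map each name to the name of its closest point in the 2D embedding."""
    xy = np.asarray(xy, dtype=float)
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore the distance of a point to itself
    closest = dists.argmin(axis=1)
    return {name: names[j] for name, j in zip(names, closest)}


# Hypothetical labels and coordinates, not taken from the Marvel dataset.
names = ["hero_a", "hero_b", "hero_c", "hero_d"]
xy = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8]]
print(nearest_neighbour(names, xy))
```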
So my friends, that was a brief explanation of the two dimension reduction techniques using two data stories. To summarise:
- PCA essentially works by separating points as far as possible
- TSNE essentially works by grouping points as close as possible
- TSNE will form clusters only when the data is non-sparse
Both techniques have their own style, and both can be useful in different situations.
So now that you understand the differences, you can see that it is not one vs the other. Both are great techniques.

Additional resources
Website
You can visit my website to do analytics with zero coding: https://experiencedatascience.com
YouTube channel
Here is the link to my YouTube channel: https://www.youtube.com/c/DataScienceDemonstrated