
Clustering is one of the most popular techniques in data science. Compared to other techniques, it is quite easy to understand and apply. However, since clustering is an unsupervised method, it can be challenging to identify clusters that are both distinct and comprehensible to your business clients.
Goal
This article provides you with visualization best practices for your next clustering project. You will learn how to analyze and diagnose your clustering output, how to visualize your clusters properly with PaCMAP dimension reduction, and how to present your clusters' characteristics. Each visualization comes with its own code snippet. You can use this article as a reference guide.
Since my last article about clustering already covered some technical details and explanations, I will keep explanations short here.
Cluster selection and diagnoses
Let’s start at the very beginning. Before you analyze any cluster characteristics you have to prepare your data and select a proper clustering algorithm. For the sake of simplicity we will be working with the commonly known wine data set and use a K-Means model. Nevertheless, most of the visualizations shown in this article can be used for any clustering algorithm.
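A minimal sketch of the loading and scaling step, assuming scikit-learn's bundled wine data set; the variable names X and X_std are the ones used throughout this article:

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Load the wine data set into a DataFrame with named feature columns.
data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)

# Standardize every feature to zero mean and unit variance.
scaler = StandardScaler()
X_std = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
```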
The code above loads the wine data set and uses the StandardScaler to scale the whole data set.
To ensure that the later visualizations of our clusters always use the right and same colors, we define a list of six different colors (figure 1).
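A fixed palette keeps cluster i the same color in every plot. The six hex values below are placeholders; any six distinct colors work:

```python
# Six fixed colors, one per potential cluster, reused in all later plots.
colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]
```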

Determine the right number of k clusters
There are several methods for (visually) determining the right number of clusters. In the following we will use the elbow plot, the (mean) silhouette score, and the silhouette analysis.
Elbow method
To get a comprehensive and proper visualization of the elbow plot, I recommend the yellowbrick package (pip install yellowbrick). The following code produces the plot shown in figure 2.
The output also plots a recommendation (dashed line) for which k you should choose. In case it cannot determine a proper number, it will show a warning.

Silhouette score
Another way to determine the number of clusters is the silhouette score method. The code below plots the output in figure 3.
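A sketch of the silhouette score method with plain scikit-learn and matplotlib; the k range is again my choice:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Repeated setup so the snippet runs on its own.
X_std = StandardScaler().fit_transform(load_wine().data)

# Mean silhouette score for each candidate k.
k_range = range(2, 9)
scores = [
    silhouette_score(
        X_std,
        KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_std),
    )
    for k in k_range
]

plt.plot(list(k_range), scores, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("Mean silhouette score")
plt.show()
```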
The derived result (k=3) is equal to the one from the elbow method.

Silhouette analysis
Last but not least, we can use the silhouette analysis method to determine the optimal number of clusters. The idea and methodology are pretty well explained in this sklearn article.
The provided code in the mentioned article plots one silhouette chart per row. However, that can be very unclear when you have a large number of clusters and want to compare their related silhouette charts. Therefore, I wrote the code below to plot three charts per row, which makes the later comparison (figure 4) much clearer.
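My version of that grid layout, sketched with scikit-learn's silhouette_samples; the k range (2 to 7, six charts in two rows of three) is an assumption:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_samples, silhouette_score
from sklearn.preprocessing import StandardScaler

# Repeated setup so the snippet runs on its own.
X_std = StandardScaler().fit_transform(load_wine().data)

k_values = range(2, 8)  # six charts -> two rows of three
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(15, 8))

for ax, k in zip(axes.flat, k_values):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_std)
    sil_values = silhouette_samples(X_std, labels)
    y_lower = 10
    for i in range(k):
        # One filled "knife" per cluster, sorted by silhouette value.
        vals = np.sort(sil_values[labels == i])
        ax.fill_betweenx(np.arange(y_lower, y_lower + len(vals)), 0, vals)
        y_lower += len(vals) + 10
    # Dashed red line: the mean silhouette score for this k.
    ax.axvline(silhouette_score(X_std, labels), color="red", linestyle="--")
    ax.set_title(f"k = {k}")

fig.tight_layout()
plt.show()
```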

After trying out several ways to visually determine the right number of k clusters, we decide to continue with k=3 and build our clusters.
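Building the final clusters is then a one-liner; the random_state is my choice for reproducibility:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Repeated setup so the snippet runs on its own.
X_std = StandardScaler().fit_transform(load_wine().data)

# Fit the final K-Means model with the chosen k=3.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_std)
labels = kmeans.labels_
```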
Cluster diagnoses
The next step is to diagnose our clusters in terms of their magnitude and cardinality.
If you are not familiar with these terms, check out my article.
To create the following plots (figure 5) we will use the data-science-utils package, which can be installed with pip install data-science-utils.

Cluster visualization
To visualize our clusters in a 2D space, we need to use dimension reduction techniques. A lot of articles and textbooks work with PCA. Recent blog posts also recommend methods like t-SNE or UMAP. However, there are pitfalls and misunderstandings.
To keep it short: There is a trade-off between preserving local and preserving global structure when using these dimension reduction methods. While PCA preserves global structure, it does not preserve neighborhoods or local structure. On the other hand, t-SNE and UMAP preserve local structure but not the global one.
However, there is a relatively new technique that claims to preserve local and global structure: PaCMAP.
PCA and PaCMAP will be used in the following to visualize our clusters in a 2D space.
If you want to learn more about the different characteristics and PaCMAP, check out Why you should not rely on t-SNE, UMAP or TriMAP by Mathias Gruber.
After running the code you should get the following plot (figure 6):

Cluster characteristics
Let us focus now on how to visualize and present the key characteristics of each cluster so that a business person can easily understand what each cluster stands for.
Before we do that, we have to enrich our standardized (X_std) and non-standardized (X) data with a cluster column.
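The enrichment step is a simple column assignment on both DataFrames:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Repeated setup so the snippet runs on its own.
data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
X_std = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

# Add the cluster label as a new column to both data sets.
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_std)
X["cluster"] = labels
X_std["cluster"] = labels
```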
Boxplots
A first and very simple approach is to generate one boxplot for each feature to show its distribution per cluster.
To plot the outcome below (figure 7), we use the non-standardized data X. Using the plotted results of the standardized one (X_std) would be harder to interpret for business users since its scale and units have changed.
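A sketch of the per-feature boxplots with pandas; the 5x3 grid size is my choice for the 13 wine features:

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Repeated setup so the snippet runs on its own.
data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
X_std = StandardScaler().fit_transform(X)
X["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_std)

features = [c for c in X.columns if c != "cluster"]
fig, axes = plt.subplots(nrows=5, ncols=3, figsize=(15, 18))
for ax, feature in zip(axes.flat, features):
    # One boxplot per feature, grouped by cluster label.
    X.boxplot(column=feature, by="cluster", ax=ax)
    ax.set_title(feature)
    ax.set_xlabel("cluster")
for ax in axes.flat[len(features):]:
    ax.set_visible(False)  # hide the two unused panels
fig.suptitle("")           # drop pandas' automatic group title
fig.tight_layout()
plt.show()
```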

Data preparation
Before we continue, we have to prepare our data for the following visualizations. The following code helps us to better compare our clusters with each other.
First, we calculate the mean of each feature per cluster (X_mean, X_std_mean), which is quite similar to the boxplots above.
Second, we calculate the relative differences (in %) of each feature per cluster to the overall (cluster-independent) average per feature (X_dev_rel, X_std_dev_rel). This helps the reader to see how large the differences in each cluster are compared to the overall average of each feature.
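A sketch of both preparation steps; the variable names are my own. Note one caveat: the standardized features have an overall mean of roughly zero, so a percentage deviation is ill-defined there, and the sketch uses the absolute deviation for the standardized data instead:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Repeated setup so the snippet runs on its own.
data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
X_std = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_std)
X["cluster"] = labels
X_std["cluster"] = labels

# Step 1: mean of each feature per cluster.
X_mean = X.groupby("cluster").mean()
X_std_mean = X_std.groupby("cluster").mean()

# Step 2: deviation of each cluster mean from the overall feature mean.
overall = X.drop(columns="cluster").mean()
X_dev_rel = (X_mean - overall) / overall * 100

# For the standardized data the overall mean is ~0, so use the
# absolute deviation instead of a percentage.
X_std_dev_rel = X_std_mean - X_std.drop(columns="cluster").mean()
```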
Figure 8 illustrates what X looks like after these preparation steps.

Now that we have our data in the right shape, we can continue with our visualizations.
Bar plots
To visualize the relative differences we can use bar plots. The following code plots the differences per cluster for each feature.
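A sketch of one bar chart per feature, assuming the X_dev_rel table from the preparation step and the first three colors of the fixed palette:

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Repeated setup so the snippet runs on its own.
data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
X["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(
    StandardScaler().fit_transform(X))

X_mean = X.groupby("cluster").mean()
overall = X.drop(columns="cluster").mean()
X_dev_rel = (X_mean - overall) / overall * 100

colors = ["#1f77b4", "#ff7f0e", "#2ca02c"]  # one fixed color per cluster
fig, axes = plt.subplots(nrows=5, ncols=3, figsize=(15, 18))
for ax, feature in zip(axes.flat, X_dev_rel.columns):
    ax.bar(X_dev_rel.index.astype(str), X_dev_rel[feature], color=colors)
    ax.axhline(0, color="grey", linewidth=0.8)
    ax.set_title(feature)
    ax.set_xlabel("cluster")
    ax.set_ylabel("% dev. from mean")
for ax in axes.flat[len(X_dev_rel.columns):]:
    ax.set_visible(False)
fig.tight_layout()
plt.show()
```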
The outcome is shown in figure 9 below.

The plots above are great if you want to show the fine details of each cluster. However, in many cases it also makes sense to summarize all the relevant results and characteristics in one chart. The solution below is one way to do that.
We visualize in figure 10 the relative deviation of each feature from its overall average per cluster.
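A sketch of the single summary chart: a grouped bar plot with one group per feature and one bar per cluster, built from the same X_dev_rel table:

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Repeated setup so the snippet runs on its own.
data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
X["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(
    StandardScaler().fit_transform(X))

X_mean = X.groupby("cluster").mean()
overall = X.drop(columns="cluster").mean()
X_dev_rel = (X_mean - overall) / overall * 100

# Transpose so features sit on the x-axis and clusters become the bars.
ax = X_dev_rel.T.plot(kind="bar", figsize=(14, 6),
                      color=["#1f77b4", "#ff7f0e", "#2ca02c"])
ax.axhline(0, color="grey", linewidth=0.8)
ax.set_ylabel("% deviation from overall mean")
ax.legend(title="cluster")
plt.tight_layout()
plt.show()
```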

Radar chart
Another way of summarizing all the relevant information in one plot is to use a radar chart. The code below plots the calculated means of our standardized data (X_std_mean).
If we use the non-standardized version, the different scales would crash the visualization (e.g., the mean of proline is much higher than the one for ash). Therefore, I recommend plotting values with the same unit or at least within similar value ranges. The final outcome is shown in figure 11.
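A sketch of the radar chart with plain matplotlib, plotting the cluster means of the standardized data (X_std_mean):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Repeated setup so the snippet runs on its own.
data = load_wine()
X_std = pd.DataFrame(StandardScaler().fit_transform(data.data),
                     columns=data.feature_names)
X_std["cluster"] = KMeans(n_clusters=3, n_init=10,
                          random_state=42).fit_predict(X_std)
X_std_mean = X_std.groupby("cluster").mean()

# One angle per feature, plus a repeat of the first to close the polygon.
features = X_std_mean.columns
angles = np.linspace(0, 2 * np.pi, len(features), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={"polar": True})
for cluster, row in X_std_mean.iterrows():
    values = row.tolist()
    values += values[:1]  # close the polygon
    ax.plot(angles, values, label=f"cluster {cluster}")
    ax.fill(angles, values, alpha=0.1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(features, fontsize=8)
ax.legend(loc="upper right")
plt.show()
```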

Conclusion
The goal of this article was to provide you with best practices in cluster diagnoses, visualization, and explanation. Consider PaCMAP when plotting your clusters in a 2D space. The cluster results or characteristics can be presented from different viewpoints. One idea is to show the mean value of each feature per cluster. Another option is to calculate the relative differences of each variable per cluster to the overall mean per feature. When presenting your results to the business, it's better to use one plot (e.g., the shown radar chart or the second bar chart). Use the multiple plots in case you want to investigate the characteristics of each feature per cluster (e.g., for deep-dive sessions with the UX designers).
Sources
UCI Machine Learning Repository: Wine Data Set. Licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Yingfan Wang, Haiyang Huang, Cynthia Rudin, Yaron Shaposhnik, Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMAP, and PaCMAP for Data Visualization (2020), https://arxiv.org/abs/2012.04456