
"Birds of a feather flocks together"
This is a well-known proverb that we’ve all heard since we were young. It suggests that individuals of the same type, or with similar traits, tend to stay close to each other.
In data science, this natural phenomenon is called clustering: discovering groups in data such that objects within the same group are substantially more similar to one another than to objects in other groups.
Applications
Clustering can surface insights that are not evident to the naked eye. One well-known application is crime hotspot identification, where clustering helps locate offenders and prevent further offenses. It is also highly valuable to businesses, for example in target marketing, where consumers with similar traits are grouped and analyzed together.
It has a wide variety of other applications too, in data mining, social network analysis, online clustering engines, biological data analysis, image processing, and so on. As you can see, clustering appears in practically every area, and understanding it is quite valuable.
Categories of clustering algorithms
You might think grouping objects is easy. But have you tried doing it in a methodical way, so that even a machine could do it?
Over the years, researchers have come up with many algorithms to achieve this, and a few are used very frequently, such as K-means and agglomerative clustering. Knowledge of these is very valuable for a data science learner, a practitioner, or anyone in the field, because each algorithm has its own advantages and disadvantages; a thorough grasp of them is necessary when choosing the right clustering method for a specific task.
The focus of this article is to give an overview of the main categories of clustering algorithms. Most well-known algorithms can be divided into three categories:
- Partitional clustering
- Hierarchical clustering
- Density-based clustering
Partitional clustering
When we hear the word partition, what pops into our mind is dividing a space into sections, like the interior walls of a house. Partitional clustering does exactly the same thing: it creates the number of groups the user specifies by grouping nearby items together. Methodically, how it works is:
Choose k different cluster centers and start attracting the data points that are closest to each center.
In these techniques, each data point is assigned to one of the k clusters based on a set of criteria; generally the aim is to optimize a similarity function, so distance becomes a significant element. The procedure repeats iteratively until the assignments stabilize, that is, it ends only when no sample has moved from one cluster to another.

A major characteristic of these approaches is that they require the user to define the number of clusters, denoted by the variable k. K-means, k-medoids, and k-modes are three examples of partitional clustering techniques.
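To make the iterative procedure concrete, here is a minimal from-scratch sketch in Python (assuming NumPy). The `kmeans` function, its parameters, and the toy data are purely illustrative; in practice you would normally reach for a library implementation such as scikit-learn’s `KMeans`.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Minimal sketch of the iterative partitional procedure described above.

    X is an (n_samples, n_features) array; empty clusters are not handled.
    """
    rng = np.random.default_rng(seed)
    # 1. Choose k different cluster centers (here: k random data points).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iters):
        # 2. Assign every point to its closest center (distance as the similarity criterion).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # 3. Stop once no sample has moved to another cluster (assignments are stable).
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 4. Move each center to the mean of the points currently assigned to it.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# Toy usage: two obvious groups of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, centers = kmeans(X, k=2)
print(labels)  # e.g. [0 0 0 1 1 1] (cluster ids may be swapped)
```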
**Strengths**
- Simple, effective, scalable, and easy to deploy.
- All clusters are computed simultaneously.
- They work well when clusters have a roughly spherical shape.
**Weaknesses**
- The number of clusters must be defined at the beginning.
- They favor spherical clusters and are not suited for complex shapes.
- They tend to produce clusters of similar size.
- They are nondeterministic, so completely different arrangements can arise from small changes in the initial random choice of cluster centers.
Hierarchical Clustering
In hierarchical clustering, grouping is done by arranging the data points into a hierarchy. There are two main ways to do this:
- The bottom-up approach is called **agglomerative clustering**.
Consider each observation as a separate cluster, then iteratively merge the pair of closest clusters.
In layman’s terms, this is similar to how a community is established. Each pair of closest individuals gets together first. Then each pair is combined with another pair that has the most similar traits, and in theory this can repeat until all individuals are in one group.
- The top-down approach is called **divisive clustering**.
Consider all the data as one cluster, then split the least similar clusters at each step until only single data points remain.
Similar to the above example, this can be explained by how a division happens in a community. At first, all the individuals are in one community. Then, if two groups have different beliefs, the community splits into two sub-communities. In theory, this can repeat until each community contains only one individual.

The traditional visualisation of a cluster hierarchy is a dendrogram: a tree-based hierarchy of the points produced by these approaches.
Similar to partitional clustering, the number of clusters (k) in hierarchical clustering is frequently chosen by the user. Clusters are created by cutting the dendrogram at a certain depth, resulting in k groupings of smaller dendrograms.
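As a rough sketch of the bottom-up approach, assuming SciPy (and Matplotlib for the plot) are available, the snippet below builds the merge hierarchy on some toy data, draws the dendrogram, and then cuts it into k = 2 flat clusters. The data and the choice of "ward" linkage are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Toy 2-D data: two loose groups of points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(10, 2)),
               rng.normal(3, 0.5, size=(10, 2))])

# Agglomerative (bottom-up) clustering: every point starts as its own cluster
# and the two closest clusters are merged at each step ("ward" linkage here).
Z = linkage(X, method="ward")

# The dendrogram visualises the full merge hierarchy.
dendrogram(Z)
plt.show()

# "Cutting" the dendrogram to obtain k = 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```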
**Strengths**
- Deterministic process: when you execute the algorithm twice on the same input data, the cluster assignments will not change.
- No need to define the final number of clusters at the beginning.
- Higher interpretability thanks to the dendrogram.
- Deeper insight into the relationships between points; good for small datasets.
**Weaknesses**
- Computationally expensive.
- Can be quite sensitive to outliers.
Density-based methods
Here, grouping is based on the density of data points in a region. The method recognizes locations where points are densely packed, separated by empty or sparse areas. Basically, it considers data packed closely together, i.e. high-density areas, as a cluster.

Unlike the previous clustering methods, this one does not require the user to provide the number of clusters. Instead, a distance-based parameter acts as a tunable threshold, determining how close points must be to each other to be included in the same cluster. Also unlike the other methods, density-based clustering is robust to outliers: it is not necessary to assign each and every point to a cluster.
Examples of commonly used density-based clustering algorithms include DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and OPTICS (Ordering Points To Identify the Clustering Structure).
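For illustration, assuming scikit-learn is available, the sketch below runs DBSCAN on two interleaving half-moons; the eps and min_samples values are arbitrary choices for this toy data rather than recommendations.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: an arbitrary shape that partitional methods struggle with.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps is the distance threshold discussed above (how close points must be to count as
# neighbours); min_samples is how many neighbours a point needs to sit in a dense region.
# Note that no number of clusters is specified.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

# Dense regions become clusters; points in sparse areas get the label -1 (noise/outliers).
print(set(db.labels_))  # expect something like {0, 1}, plus -1 if any noise points
```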
**Strengths**
- Better at identifying arbitrarily shaped clusters.
- Resistant to outliers.
- No need to define the number of clusters at the beginning.
**Weaknesses**
- Hard to identify clusters with varying densities.
- Hard to deal with high-dimensional spaces.
Conclusion
Some of the most commonly used clustering algorithm categories are those mentioned above. There are, however, many more, such as model-based, grid-based, genetic-based, and so on. In practice, no precise clustering methodology can be chosen ahead of time because the underlying mathematical approaches frequently fail to produce realistically interpretable results.
However, there are preferable techniques in many circumstances, as noted in the literature, and if one digs deep enough, many different variations of these algorithms can be uncovered. In general, partitional algorithms are claimed to be better for large data sets because they are scalable, whereas hierarchical methods are said to be better for smaller data sets because they are more adaptable. Density-based clustering, on the other hand, is more robust in the presence of noise and outliers.