
Unsupervised Machine Learning Explained

What It Is, Approaches, and Applications

Photo by Jase Bloor on Unsplash

Unsupervised learning is a good fit when we want to discover the underlying structure of data. Unlike supervised learning, unsupervised methods cannot be applied directly to classification or regression problems. This is because unsupervised ML algorithms learn patterns from unlabeled data, whereas classification and regression require known input-output mappings (in most cases; I’ll touch on this later). Essentially, an unsupervised learning algorithm finds the hidden patterns or groupings within the data without needing a human (or anybody) to label the data or intervene in any other way.


Approaching Unsupervised Learning

This method of learning is typically leveraged when our data is unlabeled. For instance, if we wanted to determine the target market for a new product we plan to release, we would use unsupervised learning, since we have no historical data on the demographics of that market. There are three main tasks in unsupervised learning (in no particular order): 1) Clustering 2) Association rules 3) Dimensionality reduction. Let’s delve deeper into each one.

Clustering

Clustering involves grouping unlabeled data based on similarities and differences. Theoretically speaking, instances within the same group have similar properties and/or features, so when two instances appear in different groups we can infer that their properties and/or features are dissimilar.

This approach to unsupervised learning is quite popular and can be broken down further into different types of clustering such as exclusive clustering, overlapping clustering, hierarchical clustering, and probabilistic clustering – the details of each of these methods are beyond the scope of this article. A popular clustering algorithm is K-Means Clustering.
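To make the idea concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. The data points, the `kmeans` and `squared_distance` helper names, and the choice of `k=2` are all made up for illustration; in practice you would more likely reach for a library implementation.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iters=100, seed=0):
    """Basic K-Means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Made-up 2-D data: two visually obvious groups (illustrative only).
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.8)]
centroids, labels = kmeans(points, k=2)
print(labels)     # the first three points share one label, the last three the other
print(centroids)
```

Note that no labels were supplied: the two groups emerge purely from the distances between points. Which group is called 0 and which is called 1 depends on the random initialization.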

Association Rules

Association rule learning is a rule-based machine learning method for discovering interesting relationships between variables in a given dataset. The intention of the method is to identify strong rules discovered in data using a measure of interest. These methods are often used for market basket analysis, which allows companies to gain a better understanding of the relationships between various products. A number of different algorithms are used for association rule learning, but the most widely used is the Apriori algorithm.
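As a rough illustration of what a "measure of interest" looks like, the sketch below scores rules between pairs of items using two common measures, support and confidence. It is a brute-force search over item pairs, not the Apriori algorithm itself, and the baskets, thresholds, and function names are invented for the example.

```python
from itertools import combinations

# Made-up shopping baskets, one set of items per transaction (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for every strong rule lhs -> rhs."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_count = count({a, b}, transactions)
        # Support: fraction of all baskets that contain both items.
        if pair_count / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: of the baskets containing lhs, the fraction that also contain rhs.
            confidence = pair_count / count({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_count / n, confidence))
    return rules


rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these baskets, only the bread and butter pairing clears both thresholds. Apriori reaches the same kind of result on larger itemsets more efficiently by pruning any candidate whose subsets are already known to be infrequent.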

Dimensionality Reduction

Another form of unsupervised learning is dimensionality reduction. This refers to the transformation of data from a high-dimensional space to a low-dimensional space such that the low-dimensional space retains the meaningful properties of the original data. One reason to reduce the dimensionality of data is to simplify the modeling problem, since more input features can make the modeling task more challenging, a difficulty known as the _curse of dimensionality_. Another reason is to visualize our data, since visualizing data in more than 3 dimensions is difficult.

Dimensionality reduction techniques are typically employed during the data preprocessing or exploratory data analysis (EDA) phase of the Machine Learning workflow. Popular algorithms include Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and Autoencoders.
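The sketch below shows the core idea behind PCA on a tiny made-up dataset: center the data, build the covariance matrix, find the direction of greatest variance, and project onto it. To stay dependency-free it uses power iteration and recovers only the first principal component; library implementations typically use an SVD instead. The function name and the data are assumptions for illustration.

```python
import math


def first_principal_component(rows, n_iters=200):
    """Return (direction, scores) for the first principal component.

    Assumes the data has non-zero variance and that the starting vector
    is not orthogonal to the dominant direction (true for the data below).
    """
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply the covariance matrix and renormalize,
    # which converges to the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centered row onto that direction: 3 features become 1.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Made-up 3-D data that mostly varies along a single line (illustrative only).
rows = [
    (1.0, 2.0, 0.1),
    (2.0, 4.1, 0.0),
    (3.0, 5.9, 0.2),
    (4.0, 8.0, 0.1),
    (5.0, 10.1, 0.0),
]
direction, scores = first_principal_component(rows)
print(direction)  # unit vector pointing along the direction of greatest variance
print(scores)     # one number per row instead of three
```

Because the second feature is roughly twice the first and the third is nearly constant, a single component captures almost all of the variation here, which is exactly the situation in which dimensionality reduction loses very little information.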

Applications of Unsupervised Learning

When we talk about machine learning, it’s quite easy to see where supervised learning fits into a business, but less so with unsupervised learning. One reason may be that unsupervised learning naturally carries a higher risk of inaccurate results, since there is no ground truth to measure the algorithm’s output against, and businesses may not be willing to take this risk.

Nonetheless, it still provides a good exploratory path through the data, enabling businesses to spot patterns faster than they would by inspecting it manually.

Common applications of unsupervised learning in the real world include:

  • News Segmentation: Google News is known to leverage unsupervised learning to group articles about the same story from various news outlets. For instance, the results of the Football (Soccer for my confused friends across the Atlantic) transfer window can all be categorized under Football.
  • Computer Vision: Visual perception tasks such as object recognition can leverage unsupervised learning.
  • Anomaly detection: Unsupervised learning is used to identify data points, events, and/or observations that deviate from a dataset’s normal behavior.
  • Customer Personas: Interesting buyer persona profiles can be created using unsupervised learning. This helps businesses understand the common traits and purchasing habits of their customers, enabling them to align their products accordingly.
  • Recommendation Engines: Past purchase behavior coupled with unsupervised learning can be used to help businesses discover data trends that could be used to develop effective cross-selling strategies.
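
To make the anomaly detection item above concrete, here is a minimal sketch that flags values far from the rest of an unlabeled sample using a modified z-score based on the median and the median absolute deviation (MAD). This is just one simple approach among many; the readings are made up, and the 0.6745 scaling constant and 3.5 threshold are conventional choices rather than anything prescribed by this article.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`."""
    med = median(values)
    # Median absolute deviation: a robust estimate of spread.
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # more than half the values are identical; the score is undefined
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Made-up sensor readings with one obvious outlier (illustrative only).
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0, 10.0]
anomalies = mad_outliers(readings)
print(anomalies)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value can inflate the latter enough to hide itself.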

Final Thoughts

Unsupervised learning is a great way to discover the underlying patterns of unlabeled data. On their own, these methods are typically of little use for classification and regression problems, but we can combine unsupervised and supervised learning in a hybrid approach called semi-supervised learning. I’ll cover this in more depth in another article.

Thanks for Reading!

If you enjoyed this article, connect with me by subscribing to my FREE weekly newsletter. Never miss a post I make about Artificial Intelligence, Data Science, and Freelancing.
