Data Science Concepts
The Difference Between Supervised and Unsupervised Machine Learning
How they actually compare

Machine learning can handle large amounts of data, and that data can come in many forms, from images to spreadsheets to text. It can contain many types of information, such as passwords, addresses, or even color patterns. Two of the most common approaches to machine learning are supervised and unsupervised learning.
The two approaches differ substantially, and each has its own family of algorithms. For example, a classification algorithm, such as one that labels an image as an apple or an orange, belongs to supervised learning. A clustering algorithm, such as one that groups books by their writing style, belongs to unsupervised learning.
In this article, we will explore the differences between the two and how to decide which one is most appropriate for a given dataset.
Supervised Learning
Let’s say you are developing a machine learning model that can tell the difference between a good stock investment, one whose price will rise in the near future, and a bad one, whose price will fall over the next month.

Traditionally, you would seek advice from financial advisors who specialize in stocks. These advisors were taught which companies were worth investing in by older, more experienced advisors, who showed them that certain pieces of company information lead to an increase in stock value. Their stock picks were supervised by these experienced advisors.
Supervised machine learning works in a similar way. You teach the machine which stocks are worth investing in by feeding the algorithm select pieces of company information and labeling each example as a good or bad investment. That labeling of the training data is what makes this supervised learning. Because the model must assign each stock to one of these categories, the specific type of algorithm used here would be a classification algorithm.
A strong indicator of supervised learning is the dataset used to train the algorithm. If every example in that dataset is labeled with the answer you want the algorithm to produce, there is a good chance you are dealing with supervised machine learning.
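To make the classification idea concrete, here is a minimal sketch using a nearest-centroid classifier, one of the simplest classification algorithms. It is not the method used in any stock picking model referenced here; the two features (revenue growth and profit margin), the function names, and every number are hypothetical, chosen only to show how labeled examples drive predictions.

```python
from math import dist

# Hypothetical labeled training data: each row pairs a company's
# (revenue_growth, profit_margin) with a label. All numbers are invented.
training_data = [
    ((0.15, 0.20), "good"),
    ((0.10, 0.18), "good"),
    ((0.12, 0.25), "good"),
    ((-0.05, 0.02), "bad"),
    ((-0.10, -0.03), "bad"),
    ((0.01, 0.01), "bad"),
]


def fit_centroids(data):
    """Average the feature vectors of each label (nearest-centroid training)."""
    sums, counts = {}, {}
    for features, label in data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the new company's features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(training_data)
print(predict(centroids, (0.13, 0.22)))   # good
print(predict(centroids, (-0.07, 0.00)))  # bad
```

The labels ("good" and "bad") are what the algorithm learns from; without them, there would be nothing to average per category.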

In the case of the stock picking model, the dataset would likely contain company financials along with whether that information led to a positive or negative price movement. If you want to see an example of a stock picking machine learning model, check out the article below:
I Built a Machine Learning Model to Trade Stocks Like Warren Buffett (Part 1)
The article above walks through the development of a stock picking machine learning model based on supervised learning.
Unsupervised Learning
For unsupervised learning, let’s say you want to develop a dating app that groups dating profiles together to improve the dating process. However, you don’t know how to group them in the first place. Should they be grouped by users’ preferences or by their own characteristics? Maybe by their religious or political views? In the end, you are still not confident about how to form these groups.

This is where unsupervised machine learning comes in. If you are not sure what distinguishes one piece of data from another inside a large dataset, you can use unsupervised learning. That is exactly what it is for: finding correlations and similarities in data that you did not know to look for.
As you can probably tell, unlike supervised learning, unsupervised machine learning uses unlabeled data. Clustering is a popular form of unsupervised learning that examines pieces of data for similarities and differences in order to group them together. For the dating app, an unsupervised learning algorithm would find the differences and relationships among the dating profiles and form groups, or clusters, from them.
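As an illustration of clustering, here is a minimal k-means sketch, a common clustering algorithm, applied to a handful of made-up dating profiles. The two features (interest in outdoor activities and in nightlife, scored 0 to 10), the profile values, and the naive "first k points" initialization are all assumptions for this example; real profiles would have many more features and a real app would likely use a library implementation.

```python
from math import dist

# Hypothetical, unlabeled profile features: (outdoors interest, nightlife interest).
# No group names are provided; the algorithm must discover the groups itself.
profiles = [(9, 1), (8, 2), (9, 2), (1, 9), (2, 8), (1, 8)]


def kmeans(points, k, iterations=10):
    """Minimal k-means: alternate assigning points and recomputing centers."""
    centers = [tuple(p) for p in points[:k]]  # naive initialization: first k points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:  # converged: assignments will no longer change
            break
        centers = new_centers
    return labels, centers


labels, centers = kmeans(profiles, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The output is only cluster numbers. Deciding what each cluster means (for example, "outdoorsy" versus "nightlife") is left to a person inspecting the groups.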
If you would like to see clustering in action, then check out the following article where unsupervised machine learning is implemented to group together dating profiles:
Which Type of Learning is Best?
Neither supervised nor unsupervised learning is superior to the other; you just need to know when to use each. The choice depends entirely on the problem the model is meant to solve and on the dataset it will be trained on.
Depends on the Data
When it comes to the dataset, the type of learning to use follows a simple rule:
- Labeled Dataset = Supervised Learning
- Unlabeled Dataset = Unsupervised Learning
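The rule above can be expressed as a tiny helper. This is a hypothetical sketch: it assumes each example's label is stored in a list, with None marking an example that has no label. A partially labeled dataset is not covered by the two-way rule, so the sketch reports it as unclear rather than guessing.

```python
from typing import Any, Optional, Sequence


def choose_learning_type(labels: Sequence[Optional[Any]]) -> str:
    """Apply the rule of thumb: labeled -> supervised, unlabeled -> unsupervised.

    `labels` holds one entry per example; None means the example is unlabeled.
    """
    if labels and all(label is not None for label in labels):
        return "supervised"
    if all(label is None for label in labels):
        return "unsupervised"
    # Some examples are labeled and some are not: the two-way rule does not apply.
    return "unclear"


print(choose_learning_type(["apple", "orange", "banana"]))  # supervised
print(choose_learning_type([None, None, None]))            # unsupervised
```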

If the dataset contains labels or tags, for example, a set of fruit pictures each paired with its name (apple, orange, banana, etc.), then we would use supervised learning. A supervised machine learning algorithm would learn to associate each picture’s content with its label.
If the dataset does not contain any labels or tags, such as the same fruit pictures without their names, then we would use unsupervised learning. An unsupervised machine learning algorithm would find the differences and similarities between the pictures (color, shape, texture, etc.) and cluster them into groups.
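For the labeled fruit case, a 1-nearest-neighbor classifier is one simple supervised approach: a new picture gets the label of the most similar labeled picture. This sketch assumes each picture has already been reduced to two hypothetical features (average hue in degrees and length-to-width ratio); the feature values and the function name are invented for illustration.

```python
from math import dist

# Hypothetical features extracted from labeled fruit pictures:
# (average hue in degrees, length-to-width ratio), paired with the fruit name.
labeled_pictures = [
    ((0, 1.0), "apple"),
    ((10, 1.1), "apple"),
    ((30, 1.0), "orange"),
    ((35, 1.05), "orange"),
    ((55, 3.0), "banana"),
    ((60, 3.2), "banana"),
]


def nearest_neighbor(features):
    """1-nearest-neighbor: copy the label of the most similar labeled picture."""
    _, label = min(labeled_pictures, key=lambda item: dist(item[0], features))
    return label


print(nearest_neighbor((5, 1.05)))  # apple
print(nearest_neighbor((58, 2.9)))  # banana
```

Because hue spans a much larger numeric range than the ratio, it dominates the distance here; in practice, features are usually rescaled to comparable ranges first.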
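For the unlabeled fruit case, here is a sketch using a simple single-linkage grouping: any two pictures closer than a distance threshold, directly or through a chain of close neighbors, land in the same group. The features are the same hypothetical (hue, length-to-width ratio) pairs without names, and the threshold of 15 is an arbitrary assumption tuned to this toy data.

```python
from math import dist

# Hypothetical unlabeled features: (average hue in degrees, length-to-width ratio).
unlabeled_pictures = [(0, 1.0), (30, 1.0), (55, 3.0), (10, 1.1), (60, 3.2), (35, 1.05)]


def group_by_distance(points, threshold):
    """Single-linkage grouping: points within `threshold` of each other,
    directly or via a chain of neighbors, share a group number."""
    group_of = [-1] * len(points)  # -1 means "not yet assigned"
    next_group = 0
    for start in range(len(points)):
        if group_of[start] != -1:
            continue
        # Flood-fill outward from `start`, collecting every reachable point.
        group_of[start] = next_group
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if group_of[j] == -1 and dist(points[i], points[j]) < threshold:
                    group_of[j] = next_group
                    stack.append(j)
        next_group += 1
    return group_of


print(group_by_distance(unlabeled_pictures, threshold=15))  # [0, 1, 2, 0, 2, 1]
```

The algorithm recovers three groups, but it only knows them as 0, 1 and 2; it has no idea which group is "apple", since no names were ever provided.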
Depends on the Problem
Sometimes the problem itself determines which form of learning we use. In our earlier examples (the stock picking model and the dating app), the nature of each problem dictated the type of learning.
For the stock picking model, we need to know, for each company’s financial statements, whether the stock price subsequently went up or down. This ground truth leads us to use a supervised machine learning model.
For the dating app, we did not know how to differentiate the profiles; there is no ground truth for how the profiles should be clustered. This leads us to use an unsupervised machine learning model.
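The ground truth described above can be produced mechanically from historical prices. In this hypothetical sketch, each record holds a company's financial features plus its share price when the statements came out and one month later; the company names, fields, and numbers are all invented. A row is labeled "good" if the price rose and "bad" otherwise.

```python
# Hypothetical historical records (all values invented for illustration).
records = [
    {"company": "Company A", "features": (0.15, 0.20), "price_then": 100.0, "price_later": 112.0},
    {"company": "Company B", "features": (-0.05, 0.02), "price_then": 50.0, "price_later": 44.0},
    {"company": "Company C", "features": (0.10, 0.18), "price_then": 20.0, "price_later": 21.5},
]


def attach_ground_truth(rows):
    """Pair each row's features with 'good' if the price rose over the period, else 'bad'."""
    return [
        (row["features"], "good" if row["price_later"] > row["price_then"] else "bad")
        for row in rows
    ]


labeled = attach_ground_truth(records)
for row, (_, label) in zip(records, labeled):
    print(row["company"], label)
# Company A good
# Company B bad
# Company C good
```

The resulting (features, label) pairs are exactly the kind of labeled dataset a supervised classifier trains on. The dating profiles have no equivalent column to derive labels from.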
Closing

The difference between unsupervised and supervised learning is significant. A supervised machine learning model is told how it is supposed to behave through the labels or tags in its training data. An unsupervised machine learning model is left to work out on its own how each piece of data is similar to or distinct from the others.
Whether to use one or the other depends largely on whether our data has labels or tags. It also depends on the problem we are facing, and the problem usually influences what kind of data we have.
In the end, there is no one superior form of learning between unsupervised and supervised. You just need to know when and where to apply them.