Shades of Machine Learning

Supervised vs Unsupervised Learning

Kriti Srivastava
Towards Data Science

In the last few blogs, we discussed various methods of cleaning and transforming data at different scales before applying any ML algorithm to it (you can find the links at the bottom of this article). This preprocessing is necessary because dirty or messy data makes little sense to a model, or to us. Now that we have learned how to clean the data, let’s look at the bigger picture: the different ML techniques used to solve the different types of problems around us.

Let’s start with the most basic ML techniques in the world!

Supervised vs Unsupervised Learning

Supervised Learning

As the name suggests, supervised learning is learning under some supervision. For example, what you learn in school is supervised learning, because there are books and teachers who supervise you and guide you towards the end goal. Similarly, in machine learning, when a model learns “if this, then that” patterns from examples whose correct answers (labels) are already known, it is called supervised learning.

Supervised learning includes classification and regression techniques, because in both cases the end goal is set. In classification, the model predicts one class (category or target) from a finite list. In regression, it predicts a continuous number. In both cases, historical data tells the model what should be predicted when a particular set of conditions occurs.
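To make the difference between the two concrete, here is a minimal Python sketch. It uses k-nearest neighbours, just one of many supervised algorithms, picked only because it fits in a few lines. The rainfall values, labels, and commute times are made-up numbers, and `classify`, `regress`, and `k_nearest` are hypothetical helpers, not functions from any library.

```python
def k_nearest(history, x, k):
    """Return the k past examples whose feature value is closest to x."""
    return sorted(history, key=lambda example: abs(example[0] - x))[:k]


def classify(history, x, k=3):
    """Classification: predict one label from a finite set by majority vote."""
    labels = [label for _, label in k_nearest(history, x, k)]
    return max(set(labels), key=labels.count)


def regress(history, x, k=3):
    """Regression: predict a continuous number by averaging the neighbours' targets."""
    neighbours = k_nearest(history, x, k)
    return sum(target for _, target in neighbours) / len(neighbours)


# Hypothetical labeled history: (rainfall in mm, correct answer).
umbrella_history = [
    (0.0, "no umbrella"), (0.5, "no umbrella"), (1.0, "no umbrella"),
    (8.0, "umbrella"), (12.0, "umbrella"), (15.0, "umbrella"),
]
commute_history = [  # (rainfall in mm, commute time in minutes)
    (0.0, 30.0), (0.5, 31.0), (1.0, 32.0),
    (8.0, 40.0), (12.0, 45.0), (15.0, 50.0),
]

print(classify(umbrella_history, 10.0))  # umbrella
print(regress(commute_history, 10.0))    # 45.0
```

The two functions find neighbours in exactly the same way; the only difference is what happens to the neighbours’ answers: a vote over a fixed list of labels, or an average that can be any number.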

For example, suppose you want to decide whether you should go “out in the park” or “sit at home” today. The first thing that comes to mind is the weather: if it rains, you would “sit at home”; if it’s sunny, you’d definitely want to go out in the park, because that’s the logical thing to do (unless you are a couch potato who just wants to stay inside and finish that series!).

But if you ask a computer the same question, it doesn’t understand human logic. So it needs some kind of supervision, or a set of rules, to give an answer, and that is what the historical data provides. The historical data helps the computer discover patterns such as: in the past, you stayed indoors 8 out of 10 times when it rained. So it learns a rule: when it rains, stay at home. The next time you ask the computer this question, it will suggest staying indoors if it is raining.
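The “learn a rule from history” idea can be sketched as a simple majority-vote lookup table in Python. The history below is invented to match the 8-out-of-10 example, and `learn_rules` is a hypothetical helper, not a library function; real supervised models can generalize well beyond a single lookup like this.

```python
from collections import Counter, defaultdict


def learn_rules(history):
    """For each weather condition, remember the most frequent past decision
    and how often it was made (as a fraction of days with that weather)."""
    counts = defaultdict(Counter)
    for weather, decision in history:
        counts[weather][decision] += 1

    rules = {}
    for weather, decisions in counts.items():
        decision, times = decisions.most_common(1)[0]
        rules[weather] = (decision, times / sum(decisions.values()))
    return rules


# Hypothetical history: on 8 of 10 rainy days you stayed at home.
history = (
    [("rain", "sit at home")] * 8
    + [("rain", "out in the park")] * 2
    + [("sunny", "out in the park")] * 9
    + [("sunny", "sit at home")] * 1
)

rules = learn_rules(history)
print(rules["rain"])   # ('sit at home', 0.8)
print(rules["sunny"])  # ('out in the park', 0.9)
```

Keeping the fraction next to each decision shows how confident the learned rule is, which is the “8 out of 10 times” from the example.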

Easy, right? This is how we learned everyday things as kids: by remembering what adults told us, or just watching what they did. Supervised learning is pretty friendly in that respect. It wants us to succeed, and it is ready to help us in whatever way possible. We just need to find the right labeled datasets.

Unsupervised Learning

Unsupervised learning is not so friendly! It doesn’t supervise the model on its ML journey, but that’s the thing: unsupervised learning makes you more and more curious. It helps you explore patterns and groups you would not have thought about before. In unsupervised learning, the data is not labeled in advance, and the target is not known.

For example, suppose you have to divide the people in your office into groups for a holiday activity. Instead of dividing everyone randomly, you come up with some features, such as height, weight, department, designation, gender, the floor they sit on, and team, and based on these features you form several groups. These groups are called clusters: people in the same cluster have very similar features, while their features differ from those of people in other clusters. So in this case, instead of fixing the number of groups beforehand, we let people’s distinguishing features decide which group they join, or create a new group if their features don’t match any existing cluster.
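The “join a similar group, or start a new one” idea maps onto a very simple algorithm often called leader clustering. Below is a minimal Python sketch (3.8+, for `math.dist`) under some assumptions: only two numeric features, height in cm and weight in kg, are used; the people are made up; and the distance threshold of 10 is an arbitrary choice. `leader_clustering` is a hypothetical helper, not a library function.

```python
import math


def leader_clustering(points, threshold):
    """Put each point in the first cluster whose leader (first member) lies
    within `threshold`; otherwise the point starts and leads a new cluster."""
    leaders = []  # one representative point per cluster
    labels = []   # cluster index assigned to each input point
    for point in points:
        for index, leader in enumerate(leaders):
            if math.dist(point, leader) <= threshold:
                labels.append(index)
                break
        else:  # no existing cluster is close enough, so start a new one
            leaders.append(point)
            labels.append(len(leaders) - 1)
    return labels


# Hypothetical colleagues described by (height in cm, weight in kg); no labels.
people = [(160, 55), (162, 58), (185, 90), (183, 88), (170, 70), (161, 56)]

print(leader_clustering(people, threshold=10))  # [0, 0, 1, 1, 2, 0]
```

No number of groups was chosen in advance; three emerged from the data. The trade-off is that the result depends on the threshold and on the order in which people are processed. In practice, features on different scales (centimetres vs kilograms) would usually be scaled first, the kind of preprocessing covered in the earlier blogs.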

Image from Introduction to Machine Learning

Now that we understand the difference between supervised and unsupervised learning, the image below summarizes some basic ML techniques in each category and their distinct features.

Image by Author

In the next blog, we will try to understand some supervised learning techniques in depth. Meanwhile, you can check out my previous blogs on data preprocessing and the ML project flow:

  1. Data Science for Non-Data Scientists
  2. Bridging the Gap between Business & Data Science
  3. Data Science — Where do I start?
  4. What’s inside the data!
  5. Understand the Patterns in the Data
  6. Feature Engineering — What to Keep and What to remove?

I hope this blog helps someone understand the things they were not able to get earlier (and keep them questions coming)! :)
