Machine Learning Simplified

An overview for business users

Published in

Towards Data Science

7 min read · Jun 6, 2020

As we discussed previously, Machine Learning refers to algorithms that are used to identify patterns within data. But what exactly do we mean by “patterns”, what can we do with ML, and what is all this jargon about “models” and “training” them? In this article, I’ll try to explain all this without getting too technical, and cover what you, as a business user, should know about Machine Learning.

Let’s start with the different types of ML use-cases:

Supervised Learning

Supervised Learning implies use-cases where we have a target we’re trying to predict given the data. For example,

Credit card companies trying to figure out the credit limit given the customer’s profile and credit history
An automobile manufacturing company estimating tractor sales using weather and macroeconomic data
A digital media company predicting if a customer will churn using their activity on the platform
Gmail predicting the category of a mail from its body, subject and sender

Supervised algorithms enable us to predict the target (for example the estimated credit limit, tractor sales, if the customer will churn, or the mail category) using the input data (customer’s credit history, weather and macroeconomic conditions, customer’s activity on the platform, mail specifications).

There are 2 kinds of supervised algorithms

These are differentiated based on the type of target variable:

If the target variable is a number (for example credit limit or tractor sales), the problem is known as a Regression problem.
If the target is a category (will the person churn — Yes or No; folder where the mail belongs — Primary, updates, promotions, forum, or spam), the problem is known as a Classification problem.

There are models for both Regression and Classification problems, i.e. algorithms which can solve these types of problems. But what do they do to solve them, and what exactly does the machine “learn”? Let’s explore this using the example of our beloved Linear Regression.

Sample Use-Case

Assume that, as an automobile manufacturing company, we want to predict tractor sales, and based on our domain expertise we know that rainfall affects sales. Examining last year’s data, we get the following plot:

We notice a somewhat linear relation and decide to “fit” a Linear Regression model. [[Maths Alert]]
Considering our target (sales) as y and our input variable (rainfall) as x, we can express the data as:
y = mx + c

OR in other words

Sales = slope * Rainfall + intercept

Different values of m and c give us different lines, but we want the values that give us the line that’s closest to our data points.

From all the different possible lines (some of them shown in grey), we notice that one of them (in bold red) is the closest to our data, i.e. if we use this line for estimating sales from rainfall, we would get predictions that are closest to the actual sales numbers. Note that the solution we have found is also just a particular m (slope) and c (intercept).

Thus, for the next year, if we know the rainfall, we can predict the sales for a district as

Sales = m * Rainfall + c

using the values of m and c that we found as our solution.
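To make the fitting step concrete, here is a minimal sketch in Python. The rainfall and sales figures are made up for illustration, the function names fit_line and predict_sales are hypothetical, and "closest" is assumed to mean ordinary least squares, which is one common choice rather than something this article prescribes.

```python
# Hypothetical data: rainfall (mm) and tractor sales (units) for five districts.
rainfall = [600, 700, 800, 900, 1000]
sales = [128, 153, 168, 192, 209]


def fit_line(x, y):
    """Return (m, c) of the least-squares line y = m * x + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: how x and y vary together, divided by how much x varies on its own.
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    m = covariance / variance
    # Intercept: chosen so the line passes through the point of averages.
    c = mean_y - m * mean_x
    return m, c


def predict_sales(rain, m, c):
    """Sales = m * Rainfall + c."""
    return m * rain + c


m, c = fit_line(rainfall, sales)
next_year = predict_sales(850, m, c)  # estimate for a district expecting 850 mm
print(f"slope={m:.3f}, intercept={c:.1f}, predicted sales={next_year:.1f}")
```

The two numbers returned by fit_line are all that the model has "learnt"; prediction afterwards is just the equation above with those numbers plugged in.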

Extending this example, if we wanted to include more variables, say the repo rate (a proxy for the interest rate on loans), then our equation would be

Sales = m1 * (Rainfall) + m2 * (Repo Rate) + c

And the model would have to calculate the optimum values of the parameters m1, m2, and c to predict sales from the input data.
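The same idea with two inputs can be sketched as follows. The data is invented (generated from Sales = 0.2 * Rainfall - 5 * Repo Rate + 40 so the answer is known), the helper names solve and fit_linear are hypothetical, and the least-squares "normal equations" used here are an assumed method, not something stated in this article.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs are left untouched.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back-substitution, from the last unknown to the first.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_linear(rows, target):
    """Least-squares fit of target = m1*x1 + m2*x2 + ... + c."""
    # Append a constant 1 to each row so the intercept c is learnt as well.
    design = [list(r) + [1.0] for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, target)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical districts: (rainfall in mm, repo rate in %) and tractor sales.
inputs = [(600, 6.0), (700, 5.5), (800, 6.5), (900, 5.0), (1000, 4.0), (750, 7.0)]
sales = [130.0, 152.5, 167.5, 195.0, 220.0, 155.0]

m1, m2, c = fit_linear(inputs, sales)
predicted = m1 * 850 + m2 * 5.0 + c
```

With more input variables the equation simply gains more terms, and the model has more parameters to find.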

How the optimum values of these slopes (the ‘m’s) and the intercept (c) are found, and how exactly we define this closeness, involves quite a bit of math and is thus outside the scope of this article. The important thing is that these are the “parameters” that our Linear Regression “model” has learnt, and they can be used to predict our target variable from our input data.
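For readers who want a small peek anyway, one common way to define closeness is the mean squared error: the average of the squared gaps between actual and predicted values. This is an assumption for illustration, since the article does not commit to a definition; the data and the candidate lines below are invented.

```python
# Hypothetical data: rainfall (mm) and tractor sales (units).
rainfall = [600, 700, 800, 900, 1000]
sales = [128, 153, 168, 192, 209]


def mean_squared_error(m, c, x, y):
    """Average squared gap between each actual y and the prediction m*x + c."""
    return sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y)) / len(x)


# A few candidate (slope, intercept) pairs, like the grey lines in the plot.
candidates = [(0.10, 90.0), (0.20, 10.0), (0.30, -70.0)]

for m, c in candidates:
    error = mean_squared_error(m, c, rainfall, sales)
    print(f"m={m:.2f}, c={c:6.1f} -> error {error:.1f}")

# The "closest" line is the candidate with the smallest error.
best = min(candidates, key=lambda mc: mean_squared_error(mc[0], mc[1], rainfall, sales))
```

A real model does not check a handful of candidates by hand; it searches over all possible values of m and c for the pair with the lowest error.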

Just as Linear Regression tried to fit a straight line to our data, different models try to fit different functions to our data, i.e. they try to find curves that are closest to the data points. And just as we had the slopes and intercept in Linear Regression, these models have other parameters whose optimum values give the closest fit to the data.

Classification algorithms work similarly; the main difference is that they adjust the parameters to assign the right class to as many data points as possible (in other words, to maximize accuracy).
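A deliberately tiny sketch of this idea: a churn classifier with a single parameter, a threshold on days of inactivity, chosen to maximize accuracy. The data is invented, and this one-rule model is a stand-in for illustration only; real classifiers such as Logistic Regression have more parameters and usually optimize a smoother measure than raw accuracy.

```python
# Hypothetical data: days since each customer's last login, and whether they churned.
days_inactive = [1, 2, 3, 5, 8, 10, 13, 21, 30, 40]
churned = [False, False, False, False, True, True, False, True, True, True]


def accuracy(threshold, x, labels):
    """Share of customers classified correctly by the rule 'churn if x >= threshold'."""
    correct = sum((xi >= threshold) == label for xi, label in zip(x, labels))
    return correct / len(x)


# "Training": try every observed value as the threshold and keep the most accurate.
candidate_thresholds = sorted(set(days_inactive))
best_threshold = max(
    candidate_thresholds,
    key=lambda t: accuracy(t, days_inactive, churned),
)
best_accuracy = accuracy(best_threshold, days_inactive, churned)
print(f"predict churn when inactive for >= {best_threshold} days "
      f"(accuracy {best_accuracy:.0%})")
```

Note that even the best threshold misclassifies one customer here; with real data, a perfect split is rare.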

All the models whose names you may have heard (Linear Regression, Logistic Regression, SVMs, Decision Trees, Random Forests, Neural Networks or Deep Learning models) are just trying to estimate a function that gives us predictions closest to the data, and to find the optimum values of the parameters for that function. This process of finding the optimum values of the parameters is called “training”.
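To give a feel for what "training" can look like in practice, here is a minimal sketch of gradient descent, an iterative method that starts from a guess and repeatedly nudges the parameters to reduce the error. It is an assumed illustration: not every model listed above is trained this way (Decision Trees, for instance, are not), and the function name train, the learning rate and the step count are hypothetical choices.

```python
def train(x, y, learning_rate=0.05, steps=2000):
    """Iteratively adjust m and c to reduce the mean squared error of y = m*x + c."""
    m, c = 0.0, 0.0  # start from an arbitrary guess
    n = len(x)
    for _ in range(steps):
        # Prediction errors with the current parameters.
        errors = [(m * xi + c) - yi for xi, yi in zip(x, y)]
        # How the mean squared error changes as m and c change (its gradient).
        grad_m = 2 * sum(e * xi for e, xi in zip(errors, x)) / n
        grad_c = 2 * sum(errors) / n
        # Nudge each parameter in the direction that lowers the error.
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c


# Toy data generated from y = 2x + 1, so training should recover m ≈ 2 and c ≈ 1.
m, c = train([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(f"learnt m={m:.3f}, c={c:.3f}")
```

The inputs here are kept small on purpose; with large raw values such as rainfall in millimetres, the inputs would typically be rescaled first so that the updates stay stable.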

A few things to note here:

The quantity and quality of data matter: If we have very few data points, we cannot be very sure about our predictions due to lack of evidence. But just having a lot of data points doesn’t guarantee good predictions. If rainfall is much higher than we’ve ever seen historically, we cannot accurately predict the sales, because the model has never seen input values in this range. The context also matters a lot: a model that has been trained well for India may work to an extent for developing countries with similar demographics, but is likely to give wrong predictions in a place with completely different dynamics, say, the United States.
Domain Expertise: The choice of the model and its implementation is one part of getting good predictions, but a much more important factor is the data that’s being sent to the model, and that’s where domain expertise comes in. In the above example, a person who understands automotive sales well may be able to tell what exactly affects tractor sales, and sending those variables as the input data will give much better results than using a very complex model with input data that doesn’t relate well to tractor sales.
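The point about unseen input ranges can be turned into a simple safeguard. The sketch below is a hypothetical helper (in_training_range is not from any library) that flags an input lying outside the range the model was trained on, so that its prediction can be treated with extra caution; the rainfall figures are invented.

```python
def in_training_range(value, training_values, margin=0.0):
    """True if value lies within the range of inputs the model was trained on."""
    low, high = min(training_values), max(training_values)
    return low - margin <= value <= high + margin


rainfall_seen = [600, 700, 800, 900, 1000]  # hypothetical historical rainfall (mm)

ok = in_training_range(850, rainfall_seen)      # inside the historical range
risky = in_training_range(1600, rainfall_seen)  # far beyond anything seen so far

for rain in (850, 1600):
    if in_training_range(rain, rainfall_seen):
        print(f"{rain} mm: within the range the model has seen")
    else:
        print(f"{rain} mm: outside the training range, treat the prediction with caution")
```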

Unsupervised Learning

This covers problems where there is no target in the data, and our objective is to explore the distribution of the data. One of the major use cases of unsupervised learning is clustering.

Clustering

As the name suggests, clustering means identifying clusters within data. For example, suppose a bank has the following data about its customers’ age and salary.

The bank wants to group similar customers so that marketing and product strategies can be created around these groups.

Note: The y-axis (Salary) is not to be confused with a target, as in the case of regression. Our task is not to predict salaries from the age data, but to group people of similar age and salary together.

A clustering algorithm may give us the following results:

Here each data point (or customer) is now associated with a cluster (denoted by its colour), and this can make it easier to build strategies around each group.

Note: Only 2 variables (age and salary) were considered in this example to illustrate the point, so it may have been possible to explore the data visually and make clusters even without an algorithm. In reality, we deal with many variables, and with more than 3 it becomes impossible to visualize the data directly.

Once again, there are several algorithms to cluster data, each suitable for different data distributions and use-cases, but they all try to achieve the same objective: identifying groups of similar points in the data without any target.
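As an illustration, here is a minimal sketch of k-means, one widely used clustering algorithm (the article does not name a specific one, so this choice is an assumption). The customer data is invented, salary is expressed in thousands so that both variables are on a roughly comparable scale, and the starting centroids are hand-picked for repeatability; real implementations usually rescale the data and choose starting points automatically.

```python
def squared_distance(p, q):
    """Squared straight-line distance between two (age, salary) points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Step 1: assign each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Step 2: move each centroid to the average of the points assigned to it.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (age in years, salary in thousands).
customers = [(23, 25), (25, 30), (27, 28), (41, 70), (45, 80),
             (43, 75), (60, 40), (62, 45), (65, 38)]

# Hand-picked starting centroids, one per expected group.
start = [customers[0], customers[3], customers[6]]
labels, centroids = kmeans(customers, start)
```

Each customer ends up with a cluster label, which plays the role of the colours in the plot above. No target was supplied at any point; the groups come purely from which customers lie close together.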

Other ML algorithms

There are other unsupervised learning algorithms apart from clustering, for example dimensionality reduction. But since they have limited direct business use-cases and are mostly used to improve the performance of other models, I will not be discussing them here.

Reinforcement Learning is a completely different category of ML algorithms, which are used to decide the actions that maximize long-term rewards in a relatively less dynamic environment. Discussing it in detail is out of scope for this article, and it is relatively infrequently used in business applications as of now.

Thus, most ML use-cases can be framed as one of Regression, Classification or Clustering tasks.

Parting Thoughts

I hope this gave you a better understanding of what Machine Learning is and what it can do. A Data Scientist implementing these ML models must understand the nitty-gritty of the different models, how they work, and where they can be used. But as a business user, knowing what insights you can derive from the data you have can be a good first step to begin discussions with your analytics department or consultants, and to help your organization start making better, data-driven decisions.