Mathematics for Machine Learning: Linear Regression & Least Square Regression
Machine learning is all about mathematics. Many libraries available today can apply complex formulas with a single function call, but it’s still worth learning at least the basics to understand what they do.
Let’s try to understand Linear Regression and Least Square Regression in a simple way.
What is Linear Regression?
Linear Regression is a predictive algorithm which models a linear relationship between the Prediction (call it ‘Y’) and the Input (call it ‘X’).
As we know from basic maths, if we plot an ‘X’, ‘Y’ graph, a linear relationship always produces a straight line. For example, if we plot the graph of these values
(Input) X = 1,2,3,4,5
(Prediction) Y = 1,2,3,4,5
It will be a perfectly straight line.
Before moving further, let’s understand that in real life we don’t get such a perfect relationship between Inputs and Predictions, and that’s why we need machine learning algorithms.
Equation of Straight Line from 2 Points
The equation of a straight line is written as y = mx + b, where m is the slope (gradient) and b is the y-intercept (where the line crosses the Y axis).
Once we get the equation of a straight line from 2 points in y = mx + b format, we can use the same equation to predict y at other values of x that lie on the same straight line.
Linear regression is a way to predict the 'Y' values for unknown values of the Input 'X' like 1.5, 0.4, 3.6, 5.7 and even for -1, -5, 10 etc.
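As a minimal sketch of what "predicting" means here, the snippet below evaluates a line at inputs that were never plotted. The function name `predict` is only illustrative, and the values m = 1, b = 0 are simply the line through the perfect X = Y example above, not a fitted model.

```python
def predict(x, m, b):
    """Return the point on the line y = m*x + b for input x."""
    return m * x + b

# Line through the perfect example data (X = 1..5, Y = 1..5): slope 1, intercept 0.
m, b = 1.0, 0.0

# Inputs that were not among the original five points.
unseen_inputs = [1.5, 0.4, 3.6, 5.7, -1, -5, 10]
predictions = [predict(x, m, b) for x in unseen_inputs]
print(predictions)
```

Because this line is y = x, every prediction equals its input; with a different slope and intercept the same function would return different values.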
Let’s take a real world example to demonstrate linear regression and the use of the Least Square Method to reduce the errors.
Linear Regression with Real World Example
Let’s take a real world example: the price of agricultural products and how it varies based on the location where they are sold. The price will be low when bought directly from farmers and high when bought from the downtown area.
Given this dataset, we can predict the price of the product in intermediate locations.
When a dataset is used for predictions, it’s also called a Training Data Set.
In this example, we take Sale Location as the Input on the X axis and Price (think of any currency you’re familiar with) on the Y axis, and plot the graph.
Problem Statement
Given this dataset, predict the price of the agricultural product if it’s sold at intermediate locations between the farmer’s house and city downtown.
Training DataSet
The dataset provided above can be considered the Training DataSet for the problem statement. If we use these inputs as Training Data for the model, we can use that model to predict the price at locations between
- Farmers home — Village
- Village — Town
- Town — City
- City — City Downtown
Our aim is to come up with a straight line which minimizes the error between the training data and our prediction model when we draw the line using the equation of a straight line.
Equation of Straight Line (y = mx + b)
Maths allows us to draw a straight line between any two (x, y) points in a two dimensional graph. For this example, let’s take the farmer’s home and its price as the starting point and city downtown as the ending point.
The coordinates of the start and end points will be
(x1,y1) = (1, 4)
(x2,y2) = (5, 80)
where x represents the location and y represents the price.
The first step is to come up with a formula in the form of y = mx + b, where x is a known value and y is the predicted value.
To calculate the Prediction y for any Input value x we have two unknowns: m = slope (gradient) and b = y-intercept (also called bias).
Slope (m = Change in y/ Change in x)
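The slope of the line is calculated as the change in y divided by the change in x, so the calculation is m = (y2 - y1) / (x2 - x1) = (80 - 4) / (5 - 1) = 19.
The y-intercept / bias is calculated using the formula y - y1 = m(x - x1). Substituting the start point gives y - 4 = 19(x - 1), i.e. y = 19x - 15, so b = -15.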
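The slope and intercept steps above can be sketched as a small Python helper. The function name `line_from_two_points` is illustrative; the two points are the start and end points given earlier.

```python
def line_from_two_points(p1, p2):
    """Return (m, b) for the line y = m*x + b through points p1 and p2."""
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope")
    m = (y2 - y1) / (x2 - x1)  # change in y divided by change in x
    b = y1 - m * x1            # y - y1 = m(x - x1), evaluated at x = 0
    return m, b

# Start point: farmer's home (1, 4); end point: city downtown (5, 80).
m, b = line_from_two_points((1, 4), (5, 80))
print(m, b)  # 19.0 -15.0
```

So the line through the two endpoints is y = 19x - 15.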
Once we have arrived at our formula, we can verify it by substituting x for both the starting and ending points that were used to derive it; it should return the same y values.
Now we know that our formula is correct, as we get the same y value by substituting the x value. But what about the other values of x in between, i.e. 2, 3 and 4? Let’s find out.
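A short sketch of that substitution, assuming the five locations are encoded as x = 1 to 5 and using the line through (1, 4) and (5, 80), which works out to y = 19x - 15:

```python
m, b = 19.0, -15.0  # line through the start point (1, 4) and end point (5, 80)

# Evaluate the line at every location, endpoints included.
two_point_predictions = {x: m * x + b for x in [1, 2, 3, 4, 5]}

for x, y in two_point_predictions.items():
    print(x, y)
```

The endpoints come back as 4 and 80, which confirms the formula, while x = 2, 3 and 4 give 23, 42 and 61.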
These values are different from what was actually there in the training set (understandably, as the original graph was not a straight line), and if we plot this (x, y) graph against the original graph, the straight line will be way off the original points at x = 2, 3 and 4.
Nevertheless, the first step is successful, as we managed to predict Y for unknown values of X.
Minimizing the Error
The error is defined as the difference between the actual points and the points on the straight line. Ideally, we’d like to have a straight line where the error is minimized across all points.
There are many mathematical ways to do this, and one of the methods is called Least Square Regression.
Least Square Regression
Least Square Regression is a method which chooses the line so that the sum of all squared errors is minimized. Here are the steps you use to calculate it.
First, the formula for calculating m = slope is
m = Sum of (x - xmean)*(y - ymean) / Sum of (x - xmean)**2
Note: **2 means square, a python syntax
So let’s calculate all the values required to come up with the slope (m). First, start with the values for x: xmean and, for each point, x - xmean.
Now let’s calculate the same values for y: ymean and, for each point, y - ymean.
With these values available, we can calculate the numerator, Sum of (x - xmean)*(y - ymean).
Now let’s calculate the denominator part of the equation, which is Sum of (x - xmean)**2.
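The x side of the calculation needs only the input values, so it can be sketched without the price column. This assumes the five locations are encoded as x = 1 to 5, as the start point (1, 4) and end point (5, 80) suggest; the variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]  # farmer's home, village, town, city, city downtown

x_mean = sum(xs) / len(xs)                  # mean of the inputs
x_dev = [x - x_mean for x in xs]            # x - xmean for each point
x_dev_sq = [d ** 2 for d in x_dev]          # (x - xmean)**2 for each point
denominator = sum(x_dev_sq)                 # Sum of (x - xmean)**2

print(x_mean)       # 3.0
print(x_dev)        # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(denominator)  # 10.0
```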
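So the overall calculation is m = Sum of (x - xmean)*(y - ymean) / Sum of (x - xmean)**2, which for this dataset works out to m = 19.2.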
Calculation of y-Intercept
The y-intercept is calculated using the formula b = ymean - m * xmean
The overall formula can now be written in the form of y = mx + b as y = 19.2x + (-22.4)
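The slope and intercept steps above can be put together in a minimal plain-Python sketch. Note the assumption: only the endpoint prices 4 and 80 appear in the text; the intermediate prices 12, 28 and 52 are hypothetical values used here purely for illustration, picked so that the fit reproduces the line y = 19.2x - 22.4 used in the next section. The name `least_squares_fit` is also illustrative.

```python
def least_squares_fit(xs, ys):
    """Return (m, b) of the least squares line y = m*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Numerator: Sum of (x - xmean)*(y - ymean)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: Sum of (x - xmean)**2
    denominator = sum((x - x_mean) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = numerator / denominator
    b = y_mean - m * x_mean
    return m, b

xs = [1, 2, 3, 4, 5]
# 4 and 80 are the endpoint prices from the text;
# 12, 28 and 52 are ASSUMED intermediate prices for illustration only.
ys = [4, 12, 28, 52, 80]

m, b = least_squares_fit(xs, ys)
print(round(m, 2), round(b, 2))  # 19.2 -22.4
```

Rounding is applied only for display, since floating point arithmetic can leave tiny errors in the last digits.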
Using Least Square Regression on X,Y values
Let’s see how the prediction y changes when we apply y = 19.2x + (-22.4) on all x values.
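A quick sketch of that evaluation, assuming the x values are the five locations 1 to 5:

```python
m, b = 19.2, -22.4  # the least squares line y = 19.2x + (-22.4)
xs = [1, 2, 3, 4, 5]

# Round to one decimal place to hide floating point noise.
predictions = [round(m * x + b, 1) for x in xs]
print(predictions)  # [-3.2, 16.0, 35.2, 54.4, 73.6]
```

Note that the fitted line gives -3.2 at x = 1 and 73.6 at x = 5, even though the training prices there are 4 and 80: a least squares line need not pass through any individual training point.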
Let’s plot this particular straight line graph against the standard values.
As we can see, these values are nearer to the actual points than the values from the straight line drawn directly between the start and end points. Plotting the two lines on the same graph makes the difference visible.
Why is this method called Least Square Regression?
This method is intended to reduce the sum of the squares of all error values. The lower the error, the smaller the overall deviation from the original points. We can compare the errors produced by the plain two-point straight line with those produced by Least Square Regression.
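The comparison can be sketched as below. As an assumption, the intermediate training prices 12, 28 and 52 are hypothetical (only 4 and 80 are given in the text), so the exact totals are illustrative; the helper name `sum_squared_error` is illustrative too.

```python
def sum_squared_error(xs, ys, m, b):
    """Sum of (actual y - predicted y)**2 for the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
# 4 and 80 come from the text; 12, 28 and 52 are ASSUMED for illustration.
ys = [4, 12, 28, 52, 80]

sse_two_point = sum_squared_error(xs, ys, 19.0, -15.0)      # line through the endpoints
sse_least_squares = sum_squared_error(xs, ys, 19.2, -22.4)  # least squares line

print(round(sse_two_point, 1), round(sse_least_squares, 1))  # 398.0 166.4
```

The two-point line has zero error at the endpoints but large errors in between, whereas the least squares line spreads a smaller error across all five points. Whatever the actual training prices are, no other straight line can have a lower sum of squared errors than the least squares line.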
As we can see, the Least Square Method provides better results than a plain straight line calculated between two points.
Least squares is not the only method used in Machine Learning to improve the model; there are others, which I’ll talk about in later posts.
Thanks for reading…!!!
Daksh