
In this post, I will explain Linear Regression in simple terms. It could be considered a "Linear Regression for dummies" post, although I've never really liked that expression.
Before we start, here you have some additional resources to skyrocket your Machine Learning career:
Awesome Machine Learning Resources:
- For learning resources go to How to Learn Machine Learning!
- For professional resources (jobs, events, skill tests) go to AIgents.co - A career community for Data Scientists & Machine Learning Engineers.
Linear Regression in Machine Learning
In the Machine Learning world, Linear Regression is a kind of parametric regression model that makes a prediction by taking a weighted sum of the input features of an observation or data point and adding a constant called the bias term.
This means that linear regression models have a fixed number of parameters, which depends on the number of input features, and that they output a numeric prediction, such as the price of a house.
The general formula for linear regression is the following:
ŷ = Θ0 + Θ1·x1 + Θ2·x2 + … + Θn·xn
- ŷ is the value we are predicting.
- n is the number of features of our data points.
- xi is the value of the ith feature.
- Θi are the parameters of the model, where Θ0 is the bias term. All the other parameters are the weights for the features of our data.
If we wanted to use linear regression to predict the price of a house using two features, the surface of the house in square meters and the number of bedrooms, the formula would look something like this:
price = Θ0 + Θ1·(surface in m²) + Θ2·(number of bedrooms)
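To see what this means in practice, here is a tiny Python sketch of that computation; the weights and the house below are invented, purely for illustration:

```python
# Hypothetical parameter values (in a real model these are learned from data)
theta_0 = 50_000.0   # the bias term
theta_1 = 3_000.0    # weight for the surface in square meters
theta_2 = 10_000.0   # weight for the number of bedrooms

surface = 80         # square meters of the house
bedrooms = 3         # number of bedrooms

# The prediction is the bias plus the weighted sum of the features
predicted_price = theta_0 + theta_1 * surface + theta_2 * bedrooms
print(predicted_price)  # 320000.0
```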
Okay, that seems pretty intuitive. Now, how do we calculate the values of Θi that best fit our data? Very easy: using our data to train the linear regression model.
Just to make sure we are all on the same page: our training data is labelled data, that is, data that already contains the target value we want to predict for new data points that don't have it. In our house price example, our training data would consist of a large number of houses with their price, surface in square meters, and number of bedrooms.
After we have trained the model, we could use it to predict the price of houses using their square meters and number of bedrooms.
The steps for training the model are the following:
- First, we have to choose a metric that tells us how well our model is performing, by comparing the predictions it makes for the houses in the training set with their actual prices. Common choices are the Mean Squared Error (MSE) or the Root Mean Squared Error (RMSE).
- Then we initialise the parameters of the model (Θi) to some value (usually randomly) and calculate this error for the whole training data (a short sketch of these first two steps is shown right after this list).
- Finally, we iteratively modify these parameters in order to minimise this error. This is done with algorithms such as Gradient Descent, which I will briefly explain now.
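As a rough illustration of the first two steps, here is a minimal Python sketch that randomly initialises the parameters and computes the Mean Squared Error on a tiny, made-up training set (all numbers are invented for illustration):

```python
import numpy as np

# Tiny made-up training set: [surface in m², number of bedrooms] and prices
X = np.array([[80, 3], [50, 2], [120, 4]], dtype=float)
y = np.array([300_000, 180_000, 450_000], dtype=float)

# Initialise the parameters randomly: one bias term plus one weight per feature
rng = np.random.default_rng(42)
theta = rng.normal(size=X.shape[1] + 1)

# Predictions with the current (random) parameters: ŷ = Θ0 + Θ1·x1 + Θ2·x2
y_pred = theta[0] + X @ theta[1:]

# The Mean Squared Error over the whole training data
mse = np.mean((y_pred - y) ** 2)
print(mse)
```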
Training with Gradient Descent
Gradient Descent is an optimisation algorithm that can be used in a wide variety of problems. The general idea of this method is to iteratively tweak the parameters of a model in order to reach the set of parameter values that minimises the error the model makes in its predictions.
After the parameters of the model have been initialised randomly, each iteration of gradient descent goes as follows: with the current values of these parameters, we use the model to make a prediction for every instance of the training data, and compare that prediction to the actual target value.
Once we have computed this aggregated error (known as the cost function), we measure the local gradient of this error with respect to the model parameters, and update these parameters by pushing them in the direction of descending gradient, thus making the cost function decrease.
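In code, one gradient descent iteration for a linear model trained with the MSE cost function could look roughly like this (a sketch, not a production implementation; the learning rate is an arbitrary example value):

```python
import numpy as np

def gradient_descent_step(theta, X, y, learning_rate=0.01):
    """One iteration: predict on every training instance, measure the error,
    and push the parameters in the direction of descending gradient."""
    m = len(y)
    X_b = np.c_[np.ones(m), X]                    # column of ones for the bias term

    y_pred = X_b @ theta                          # predictions with current parameters
    gradients = (2 / m) * X_b.T @ (y_pred - y)    # gradient of the MSE cost function

    return theta - learning_rate * gradients      # move against the gradient
```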
The following figure shows graphically how this is done: we start at the orange point, which is the initial random value of the model parameters. After one iteration of gradient descent, we move to the blue point which is directly right and down from the initial orange point: we have gone in the direction of descending gradient.

Iteration after iteration, we travel along the orange error curve, until we reach the optimal value, located at the bottom of the curve and represented in the figure by the green point.
Imagine we have a linear model with only one feature (x1), just so that we can plot it easily. In the following figure the blue points represent our data instances, for which we have the value of the target (for example the price of a house) and the value of the one feature (for example the square meters of the house).

In practice, what happens when we train a model using gradient descent is that we start by fitting a line to our data (the Initial random fit line) that is not a very good representation of it. After each iteration of gradient descent, as the parameters get updated, this line changes its slope and the point where it cuts the y-axis. This process is repeated until we reach a set of parameter values that are good enough (these are not always the optimal values) or until we complete a certain number of iterations.
These parameters are represented by the green Optimal fit line.
It's easy to visualise this for a model with only one feature, as the equation of the linear model is the same as the equation of a line that we learn in high school. For a higher number of features the same mechanics apply; however, it is not as easy to visualise.
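To make this concrete, here is a minimal end-to-end sketch of the one-feature case on synthetic data (the "prices" are generated artificially, and the feature is standardised so that plain gradient descent converges without fuss):

```python
import numpy as np

# Synthetic one-feature data: surface in m² vs. price, with some noise
rng = np.random.default_rng(0)
surface = rng.uniform(30, 150, size=100)
price = 2_000 * surface + 50_000 + rng.normal(0, 10_000, size=100)

# Standardise the feature so gradient descent behaves nicely
x = (surface - surface.mean()) / surface.std()

# Initial random fit line (intercept and slope)
theta_0, theta_1 = rng.normal(size=2)
learning_rate = 0.1

for _ in range(500):
    y_pred = theta_0 + theta_1 * x                      # current line's predictions
    error = y_pred - price
    theta_0 -= learning_rate * 2 * error.mean()         # gradient w.r.t. the intercept
    theta_1 -= learning_rate * 2 * (error * x).mean()   # gradient w.r.t. the slope

print(theta_0, theta_1)  # intercept and slope of the (near-)optimal fit line
```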

After we have completed the process and managed to train our model using this procedure, we can use it to make new predictions! As shown in the following figure, using our optimal fit line and knowing the square meters of a house, we could use this line to make a prediction of how much it would cost.
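For example, once the intercept and slope of the optimal fit line are known (the values below are hypothetical), pricing a new house from its square meters is a single evaluation of that line:

```python
# Hypothetical parameters of the optimal fit line, learned from the training data
intercept = 50_000.0   # the bias term: predicted price for a surface of 0 m²
slope = 2_000.0        # predicted price increase per additional square meter

new_surface = 100      # square meters of the house we want to price
predicted_price = intercept + slope * new_surface
print(predicted_price)  # 250000.0
```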

Of course, this would be a very simple model, and probably not very accurate, as there are a lot of factors that influence the price of a house. However, if we increased the number of relevant features, linear regression could give us pretty good results for simple problems.
Conclusion and Other resources
Linear regression is one of the simplest Machine Learning models. It is easy to understand and interpret, and it can give pretty good results. The goal of this post was to provide an easy, non-mathematical way to understand linear regression for people who are not Machine Learning practitioners, so if you want to go deeper, or are looking for a more profound or mathematical explanation, take a look at the following video; it explains very well everything we have mentioned in this post.
That is all, I hope you liked the post. Feel free to follow me on Twitter at @jaimezorno. Also, you can take a look at my posts on Data Science and Machine Learning here. Have a good read!
For more posts like this one follow me on Medium, and stay tuned!
The information explained here was taken from the book in the following article, along with some other resources.