Linear regression is probably the simplest ‘machine learning’ algorithm. I bet you’ve used it many times, possibly through Scikit-Learn or any other library that provides an out-of-the-box solution.

But have you ever asked yourself: How does the model actually work behind the scenes?
Sure, in the case of simple linear regression (only one feature) you can calculate the slope and intercept coefficients with a simple formula, but those formulas don’t transfer to multiple regression. If you don’t know anything about simple linear regression, check out this article:
Today I will focus only on multiple regression and will show you how to calculate the intercept and as many slope coefficients as you need with some linear algebra. There will be a bit of math, but nothing you’ll have to compute by hand. You should be familiar with terms like matrix multiplication, matrix inverse, and matrix transpose.
If those sound like science fiction, fear not, I have you covered once again:
At the bottom of that article is a link to the second part, which covers some basic concepts of matrices. Let’s now quickly dive into the structure of this article:
- Math behind
- Imports
- Class definition
- Declaring helper function
- Declaring fit() function
- Declaring predict() function
- Making predictions
- Conclusion
A lot of stuff to cover, I know. I’ll try to make it as short as possible, and you should hopefully be able to go through the entire article in less than 10 minutes.
Okay, let’s dive in!
Math Behind
As I stated, there will be some math. But it’s not as complex as you might think. You will have your features (X) and the target (y). This is how to express the model:

y = Xβ + ε
Where y is the vector of the target variable, X is a matrix of features, beta is a vector of parameters that you want to estimate, and epsilon is the error term. From the dataset, you’ll want to split features (X) from the target (y), and also add a vector of ones to X for the intercept (or bias) term.
Once done, you can obtain the coefficients with the following formula:

β̂ = (XᵀX)⁻¹Xᵀy
You can see now that you’ll need to understand what a transpose and an inverse are, and also how to multiply matrices. The good thing is, you won’t do any of this by hand, as Numpy has you covered.
And that’s pretty much it when it comes to math. Dive deeper if you dare, but it won’t be necessary for the completion of this article. You may now proceed to the next section.
Imports
I’ve promised you a pure Numpy implementation, right? Well, you’ll also use Pandas, but only to read data from the CSV file; everything else will be done with Numpy.
I’ve also imported the warnings module so the notebook remains clean:
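Something along these lines will do (ignoring warnings is optional, but it keeps the notebook output tidy):

```python
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings('ignore')
```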

Let’s read in the Boston Housing Dataset now:
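Here’s a minimal sketch, assuming you have the dataset saved locally as a CSV (the file name below is just a placeholder, so adjust the path to wherever your copy lives):

```python
# Placeholder path - point it to your copy of the Boston Housing dataset
df = pd.read_csv('Boston.csv')
df.head()
```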

That’s pretty much it for the imports, let’s do some coding next.
Class Definition
I’ve decided to implement Multiple Regression (Ordinary Least Squares Regression) in an OOP (Object-Oriented Programming) style.
If OOP just isn’t your thing, you can skip this part and declare each function in its own cell instead, but I recommend sticking to the OOP style. To start out, let’s declare a new class, OrdinaryLeastSquares:
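For now it can be as bare-bones as this:

```python
class OrdinaryLeastSquares:
    pass
```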

It doesn’t do anything just yet. Right now we’ll only declare the init method, and the rest will be covered in the following sections.
I want the user to be able to see the coefficients of a regression model, so here’s how to address that:
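Storing the coefficients as an instance attribute does the trick; it starts as an empty list and gets filled once the model is trained:

```python
class OrdinaryLeastSquares:
    def __init__(self):
        # Will hold the intercept and slope coefficients once fit() is called
        self.coefficients = []
```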

And that’s it for the init method; you may now proceed.
Declaring Helper Functions
If you take a moment to think about what your model should do automatically for the user, you’ll probably end up with a list of at least two things:
- Reshape features (X) in case there’s only one feature
- Concatenate a vector of ones to the feature matrix
If you don’t, your model will fail. No one likes that.
The first helper function is pretty simple. You just need to reshape X so it’s two-dimensional:
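Here’s one way to write it; the method name _reshape_x is just a suggestion, and it goes inside the OrdinaryLeastSquares class:

```python
def _reshape_x(self, X):
    # Turn a 1-D array of n samples into an (n, 1) column matrix
    return X.reshape(-1, 1)
```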

And for the second helper function, you want a column of ones with as many rows as your feature matrix has. Using Numpy you can generate that vector and concatenate it to X:
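Again, this lives inside the class, and the name _concatenate_ones is just a suggestion:

```python
def _concatenate_ones(self, X):
    # Column of ones with one row per observation
    ones = np.ones((X.shape[0], 1))
    # Prepend it so the first coefficient acts as the intercept
    return np.concatenate((ones, X), axis=1)
```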

If you’re wondering what’s with the underscore before the function name, that’s the Python convention for marking a method as private. Weird, right?
Nevertheless, that’s pretty much everything for now.
Declaring fit() function
This is the heart of your model. The fit() function will be responsible for training the model and doing the reshaping and concatenation operations (by calling the previously declared helper functions).
If X is one-dimensional, it should be reshaped. That two-dimensional representation should then be concatenated with the vector of ones.
Finally, you can use the formula discussed above to obtain coefficients. Note how I’m setting them to self.coefficients because I want them to be accessible by the end-user:
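Here’s a sketch of fit(), assuming the two helpers from the previous section are named _reshape_x and _concatenate_ones:

```python
def fit(self, X, y):
    # Reshape to 2-D if a single feature was passed in
    if len(X.shape) == 1:
        X = self._reshape_x(X)

    # Add the column of ones for the intercept term
    X = self._concatenate_ones(X)

    # Normal equation: beta_hat = (X^T X)^-1 X^T y
    self.coefficients = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
```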

Just one more function and you are ready to go!
Declaring predict() function
As with the previous one, the predict() function will also be exposed to the end-user. It will be used to validate the model and make new predictions.
Let’s drill down into the logic behind it. Essentially, you want the user input to be a single row of features, formatted as a list. The first coefficient represents the intercept (or bias) term, and all the others need to be multiplied by the corresponding value of X. So the idea is to iterate over the new X and the non-intercept coefficients at the same time, multiply each pair, and add the result to the prediction:
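Here’s a sketch of what that looks like; the entry parameter is a single row of features, and the coefficients are assumed to have been computed by fit() beforehand:

```python
def predict(self, entry):
    b0 = self.coefficients[0]            # intercept (bias) term
    other_betas = self.coefficients[1:]  # one slope coefficient per feature
    prediction = b0

    # Multiply each coefficient by its feature value and add it to the prediction
    for xi, bi in zip(entry, other_betas):
        prediction += bi * xi
    return prediction
```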

Pretty neat, huh? Well, that’s all there is to it; you can now use this class to make an instance, and then make predictions.
Let’s see how to do it.
Making Predictions
Earlier in the article, we loaded the Boston Housing Dataset. Now it’s time to construct the feature matrix and the target vector – or X and y in plain English:
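Assuming the target column in your copy of the dataset is named MEDV (the usual name for the median house value), something like this will do:

```python
X = df.drop('MEDV', axis=1).values  # every column except the target
y = df['MEDV'].values               # median house value, the target
```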

You can do a train-test split here as you normally would, but I decided not to, just to keep the article concise. Make an instance of OrdinaryLeastSquares and fit it with X and y – just as you would do with Scikit-Learn:
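Two lines are all it takes; model is just the variable name I’ve picked for the instance:

```python
model = OrdinaryLeastSquares()
model.fit(X, y)
```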

The training is complete. You can access the coefficients like this:
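They’re stored on the instance, so this is enough:

```python
model.coefficients
```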

Sweet, right? Let’s say you want to make a prediction for the first row of X:
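Just pass that row to predict():

```python
model.predict(X[0])
```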

Everything works. Or if you want to make predictions for every row in X:
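A simple list comprehension takes care of that:

```python
y_preds = [model.predict(row) for row in X]
```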

Yep, everything looks good. You could now go and calculate some metrics like MSE, but that’s not the point of this article.
Before You Leave
It might be a good idea to try to implement this Ordinary Least Squares Regression by hand. I mean with pen and paper. Not with this dataset though: define one or two features and two or three observations, and try to do the calculations by hand.
It’s not hard, but upon completion, you’ll be more confident about why everything works. Thankfully, the linear algebra concepts behind it are simple and can be learned rather quickly. You could then use Python to verify the results.
I hope everything is as clean as it can possibly be, but don’t hesitate to contact me if you don’t understand something.
Loved the article? Become a Medium member to continue learning without limits. I’ll receive a portion of your membership fee if you use the following link, with no extra cost to you.