Linear Regression from Scratch with NumPy — Implementation (Finally!)

Levent Baş
Towards Data Science
5 min read · Aug 1, 2019


Welcome to the second part of the Linear Regression from Scratch with NumPy series! After explaining the intuition behind linear regression, it is now time to dive into the code and implement it. If you want to catch up on the intuition first, you can read the previous part of this series here. Now, let's get our hands dirty!

First things first, we start by importing the libraries that will help us along the way. As I have mentioned before, we won't be using any packages that hand us ready-made algorithm models, such as sklearn.linear_model, since an out-of-the-box solution won't help us grasp the underlying principles of implementing an algorithm ourselves. We want to do it the hard way, not the easy way.

Do notice, however, that we can still use the sklearn package (or other packages) for its utility functions, such as loading a dataset, as long as we don't use its already implemented algorithm models.

We will be using:

  • numpy (obviously) to do all of the vectorized numerical computations on the dataset, including the implementation of the algorithm itself,
  • matplotlib to plot graphs so that we can better understand the problem at hand with some visual aid,
  • sklearn.datasets to load a toy dataset to play around with our written code.
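
To make this concrete, here is a minimal sketch of what the imports and the data-loading step discussed below might look like (the variable names dataset, X and y are my own choices):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston

# load the Boston housing toy dataset (506 samples, 13 features)
dataset = load_boston()
X = dataset.data                      # feature samples
y = dataset.target[:, np.newaxis]     # target values as a column vector

print("Total samples in our dataset is: {}".format(X.shape[0]))
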
Total samples in our dataset is: 506

Now, it's time to load the dataset we will be using throughout this post. The sklearn.datasets package offers some toy datasets to illustrate the behaviour of some algorithms, and we will be using the load_boston() function to get a regression dataset. Here, dataset.data holds the feature samples and dataset.target holds the target values, also called labels.

It is important to note that, when we are loading the target values, we are adding a new dimension to the data (dataset.target[:,np.newaxis]), so that we can use the data as a column vector. Remember, linear algebra makes a distinction between row vectors and column vectors.

However, in NumPy there are only n-dimensional arrays and no concept of row and column vectors, per se. We can use arrays of shape (n, 1) to imitate column vectors and (1, n) for row vectors. Ergo, we can turn our target values of shape (n,) into a column vector of shape (n, 1) by adding an axis explicitly. Luckily, we can do that with NumPy's np.newaxis, which increases the dimension of an array by one each time it is used.
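
A quick illustration of the idea:

a = np.array([1, 2, 3])
print(a.shape)                  # (3,)   -> a plain 1-D array
print(a[:, np.newaxis].shape)   # (3, 1) -> imitates a column vector
print(a[np.newaxis, :].shape)   # (1, 3) -> imitates a row vector
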

We have chosen the (1/2) x Mean Squared Error (MSE) as our cost function, so we’ll implement it here. h denotes our hypothesis function which is just a candidate function for our mapping from inputs (X) to outputs (y).

When we take the inner product of our features with the parameters (X @ params), we are explicitly stating that, out of a broad list of other machine learning algorithms, we will be using linear regression for our hypothesis; that is, we have decided that the relation between the feature and target values is best described by a linear model.
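
A minimal sketch of such a cost function, computing (1/(2·n_samples)) · Σ(X @ params − y)², might look like this (the name compute_cost is my own choice):

def compute_cost(X, y, params):
    """(1/2) * mean squared error of the hypothesis X @ params against y."""
    n_samples = len(y)
    h = X @ params                              # hypothesis: linear in the features
    return (1 / (2 * n_samples)) * np.sum((h - y) ** 2)
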

We can now implement the gradient descent algorithm. Here, n_iters denotes the number of iterations of gradient descent. We want to keep the history of the costs returned by the cost function in each iteration, so we use a NumPy array J_history for that.

As for the update rule, (1/n_samples) * X.T @ (X @ params - y) corresponds to the partial derivative of the cost function with respect to the parameters. Subtracting this gradient, scaled by the learning rate, from the current parameters means that params holds the updated parameter values according to the update rule.
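
Putting the pieces together, a sketch of the full loop could look like this (it assumes a learning_rate hyperparameter and the compute_cost function sketched above):

def gradient_descent(X, y, params, learning_rate, n_iters):
    n_samples = len(y)
    J_history = np.zeros((n_iters, 1))          # cost recorded at every iteration

    for i in range(n_iters):
        # step against the gradient of the cost with respect to the parameters
        params = params - (learning_rate / n_samples) * X.T @ (X @ params - y)
        J_history[i] = compute_cost(X, y, params)

    return J_history, params
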

Before we run the gradient descent algorithm on our dataset, we normalize the data. Normalization is a technique often applied as part of data preparation in a machine learning pipeline; it typically means rescaling the features to a common range, such as [0, 1], which helps gradient descent converge faster and keeps the cost (error) well behaved. Also, note that we initialize the parameters (params) to zeros.
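
One possible way to wire everything together is sketched below; the z-score scaling, the extra bias column, and the hyperparameters (learning_rate = 0.01, n_iters = 1500) are my own illustrative choices, not necessarily the exact values behind the output that follows:

n_samples = X.shape[0]

# rescale the features and prepend a column of ones for the intercept term
X_scaled = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
X_b = np.hstack((np.ones((n_samples, 1)), X_scaled))

params = np.zeros((X_b.shape[1], 1))            # initialize the parameters to zeros
n_iters = 1500
learning_rate = 0.01

initial_cost = compute_cost(X_b, y, params)
print("Initial cost is: ", initial_cost, "\n")

J_history, optimal_params = gradient_descent(X_b, y, params, learning_rate, n_iters)
print("Optimal parameters are: \n", optimal_params, "\n")
print("Final cost is: ", J_history[-1])
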

Initial cost is:  296.0734584980237

Optimal parameters are: 
[[22.53279993]
[-0.83980839]
[ 0.92612237]
[-0.17541988]
[ 0.72676226]
[-1.82369448]
[ 2.78447498]
[-0.05650494]
[-2.96695543]
[ 1.80785186]
[-1.1802415 ]
[-1.99990382]
[ 0.85595908]
[-3.69524414]]
Final cost is: [11.00713381]

There you have it! We have run the algorithm successfully, and we can clearly see that the cost decreased drastically from about 296 to 11. The gradient_descent function returned the optimal parameter values; consequently, we can now use them to predict new target values.

Class Implementation for Linear Regression

Finally, after implementing linear regression from scratch, we can rearrange the code we have written so far, add some more to it, make a few modifications, and turn it into a class implementation so that we have our very own linear regression module! There you go:
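
A condensed sketch of what such a class might look like is shown below; the method names, default hyperparameters, and the R² score method mirror sklearn's fit/predict/score interface and are illustrative choices rather than the only way to do it:

class LinearRegression:
    """Linear regression trained with batch gradient descent."""

    def __init__(self, n_iters=1500, learning_rate=0.01):
        self.n_iters = n_iters
        self.learning_rate = learning_rate
        self.params = None

    def _add_intercept(self, X):
        # prepend a column of ones so that the first parameter acts as the intercept
        return np.hstack((np.ones((X.shape[0], 1)), X))

    def fit(self, X, y):
        X = self._add_intercept(X)
        n_samples, n_features = X.shape
        self.params = np.zeros((n_features, 1))
        for _ in range(self.n_iters):
            gradient = (1 / n_samples) * X.T @ (X @ self.params - y)
            self.params = self.params - self.learning_rate * gradient
        return self

    def predict(self, X):
        return self._add_intercept(X) @ self.params

    def score(self, X, y):
        # coefficient of determination (R^2), the same metric sklearn's score() reports
        y_pred = self.predict(X)
        ss_residual = np.sum((y - y_pred) ** 2)
        ss_total = np.sum((y - np.mean(y)) ** 2)
        return 1 - ss_residual / ss_total
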

Do notice the similarities between our implementation and sklearn's own implementation of linear regression.

I have done it on purpose, of course, to show you that we can write a simplified version of a widely used module that works in a similar way to sklearn's implementation.

I (mostly) did it for fun, though!

Comparing Our Implementation with Sklearn’s Linear Regression

We have done a pretty good job with that implementation, haven't we? Our training accuracy is almost the same as sklearn's, and our test accuracy is not bad at all compared to sklearn's test accuracy.
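
For reference, a comparison along these lines could be run as in the sketch below; the 70/30 train/test split, the random_state, and the reuse of the scaled features are my own assumptions:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression as SklearnLinearRegression

# give both models the same standardized inputs
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, random_state=42)

our_model = LinearRegression(n_iters=1500, learning_rate=0.01).fit(X_train, y_train)
sklearn_model = SklearnLinearRegression().fit(X_train, y_train)

print("Our training score:     ", our_model.score(X_train, y_train))
print("Sklearn training score: ", sklearn_model.score(X_train, y_train))
print("Our test score:         ", our_model.score(X_test, y_test))
print("Sklearn test score:     ", sklearn_model.score(X_test, y_test))
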

Undoubtedly, this has been fun. I encourage you to code this all by yourself after you finish reading the article.

You can also check out my GitHub profile to read the code along with a Jupyter notebook, or simply use the code for your own implementation.

I’ll be back with more implementations and blog posts in the future.

Happy coding!

Questions? Comments? Contact me at leventbas92@gmail.com and through GitHub.
