
Linear Regression with Gradient Descent from Scratch in Numpy

Implement gradient descent for linear regression with pen & paper, and then in Python.

Image by Elias Sch. from Pixabay

A couple of days back I published an introductory article on gradient descent, covering some basic math and logic, and at the end of the post I challenged you to try to implement it with a simple linear regression.

Gradient Descent Demystified in 5 Minutes

I admit – that’s a lot to ask, especially if that article was your first exposure to gradient descent. That’s why today I want to implement it myself from scratch, with the help of some math first and Python second. After reading this article you’ll understand gradient descent fully and will be able to solve any linear regression problem with it.


Gradient Descent by Hand

I strongly advise you to read the article linked above. It will set the foundations on the topic, plus some math is already discussed there.

To start out, I’ll define my dataset – only three points that are in a linear relationship. I’ve chosen so few points only so the math stays short – needless to say, the math wouldn’t be any more complex for a larger dataset, just longer, and I don’t want to make a silly arithmetic mistake.

Then I’ll set coefficients beta 0 and beta 1 to some constant and define the cost function as Sum of Squared Residuals (SSR/SSE). Finally, I’ll set the learning rate to something small, let’s say 0.001:
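In code, that setup might look like this. The article’s exact dataset isn’t shown, so the three points below (a perfect linear relationship, y = 2x) are hypothetical stand-ins:

```python
import numpy as np

# Hypothetical stand-in dataset -- three points in a linear relationship (y = 2x)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# Initial coefficients set to some constant -- zero is a common choice
beta0, beta1 = 0.0, 0.0

# A small learning rate
learning_rate = 0.001
```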

By using the line equation it’s easy to calculate predictions of the model:

As I stated earlier, the cost function will be Sum of squared residuals:
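Using the same stand-in values as above, the predictions and the SSR cost can be sketched like this:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # hypothetical stand-in data (y = 2x)
y = np.array([2.0, 4.0, 6.0])
beta0, beta1 = 0.0, 0.0

# Line equation: y_hat = beta0 + beta1 * x
y_pred = beta0 + beta1 * x

# Cost function: Sum of Squared Residuals
ssr = np.sum((y - y_pred) ** 2)
```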

If you know some calculus, you can calculate partial derivatives of the cost function with respect to beta 0 and beta 1. If not, just take those equations and try to search the web for terms ‘multivariate differentiation’ and ‘chain rule’:
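For the SSR cost, the chain rule gives the partial derivatives as -2 times the residual (for beta 0) and -2 times x times the residual (for beta 1), summed over the dataset. With the stand-in values from above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # hypothetical stand-in data (y = 2x)
y = np.array([2.0, 4.0, 6.0])
beta0, beta1 = 0.0, 0.0

residuals = y - (beta0 + beta1 * x)

# Chain rule applied to SSR = sum((y - (b0 + b1*x))^2):
# d(SSR)/d(beta0) = sum(-2 * residual)
# d(SSR)/d(beta1) = sum(-2 * x * residual)
d_beta0 = np.sum(-2 * residuals)
d_beta1 = np.sum(-2 * x * residuals)
```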

The next thing to do is to calculate the cost with respect to beta 0 and beta 1. This boils down to basic arithmetic and is easily done by hand.

Cost for beta 0:

Cost for beta 1:

Once you’ve obtained those two numbers, you can calculate the step size for both coefficients by multiplying the calculated cost by the learning rate:

And the final step would be to calculate new beta 0 and beta 1 by subtracting respective step size from the old value:
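Both of those steps can be sketched in code, continuing from the hypothetical stand-in values used above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # hypothetical stand-in data (y = 2x)
y = np.array([2.0, 4.0, 6.0])
beta0, beta1 = 0.0, 0.0
learning_rate = 0.001

residuals = y - (beta0 + beta1 * x)
d_beta0 = np.sum(-2 * residuals)
d_beta1 = np.sum(-2 * x * residuals)

# Step size = calculated cost * learning rate
step0 = learning_rate * d_beta0
step1 = learning_rate * d_beta1

# New coefficient = old coefficient - step size
beta0 -= step0
beta1 -= step1
```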

Now you would take those new coefficient values and repeat the entire process, say, 10,000 times. For obvious reasons you shouldn’t do that by hand – but it’s good to know how the algorithm works underneath.

And that’s pretty much all there is to it: you simply repeat the same logic and keep updating the coefficients. Let’s see how you would approach gradient descent in Python.


Gradient Descent in Python

To start out with the implementation, let’s first define the cost function and use Sympy to take the derivatives:
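A sketch of that Sympy step might look like the following, where the cost is the squared residual for a single point (symbol names are my own choice):

```python
from sympy import symbols, diff

b0, b1, x, y = symbols('b0 b1 x y')

# Squared residual for a single data point
cost = (y - (b0 + b1 * x)) ** 2

# Partial derivatives of the cost with respect to each coefficient
d_b0 = diff(cost, b0)
d_b1 = diff(cost, b1)
```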

Don’t know how to use Sympy? Check out this article:

Taking Derivatives in Python

You can see the calculated derivatives are the same as the ones calculated previously by hand – Sympy has only multiplied things out to get rid of the brackets.

Now you can define both x and y as variables and plot them:
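A minimal sketch of that plotting step, again using the hypothetical stand-in dataset:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # hypothetical stand-in data (y = 2x)
y = np.array([2.0, 4.0, 6.0])

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
```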

You have everything needed – which means gradient descent is up next. Since I’ve already explained in the math section what goes on where, I won’t go into much detail here; I’ll only outline the process:

  1. Initialize beta 0 and beta 1 coefficients to some value
  2. Initialize learning rate and the desired number of epochs
  3. Make a for loop which will run n times, where n is the number of epochs
  4. Initialize the variables which will hold the errors for the current epoch
  5. Make prediction using the line equation
  6. Append the squared difference to the error array
  7. Calculate partial derivatives for the current row in the dataset for both coefficients
  8. Increase the cost of coefficients
  9. Recalculate the values of coefficients

It sounds like a lot of steps, and it is, but if you’ve read my previous article and followed along with the math, you can see that nothing complex happens here. Below is the code for the Python implementation:

Once that code cell executes, you can check the final values of your coefficients and use them to make predictions:

Now, with _y_preds_, you can add a regression line to the previously drawn plot:

Remember how you kept track of the epoch error? That lets you plot the error over time and see whether the algorithm was able to find parameter values that minimize it:

Looks like 10000 epochs were a bit too much, but nevertheless, the error was minimized after around 1000 epochs and stayed stationary for the rest of the run.

I’ve added a bunch of print statements to the gradient descent code cell, so each iteration is printed out (I only ran it for one epoch to verify the math works), and here’s what I got:

That’s only one iteration, but it’s definitive proof that the math works as stated earlier. You can now use the same logic on bigger datasets, just for the fun of it.


Final Words

Gradient descent might seem like an intimidating algorithm at first, but the logic behind it is fairly straightforward, and math isn’t as complex as you’ve probably thought before.

I hope you now see the full picture, in theory, math, and code. How I implemented gradient descent is just one approach, and there’s definitely room for improvement: you could, for example, store the coefficient values in a list, which is handy if you have more than one feature in your dataset. You now possess all the tools you need to explore further on your own.

Feel free to share your thoughts, and don’t hesitate to contact me if anything wasn’t 100% clear.



