Analytical Solution of Linear Regression

Yang Liu
Towards Data Science
4 min read · Oct 31, 2018


Introduction

We know that optimization methods such as gradient descent can be used to minimize the cost function of linear regression. But for linear regression there also exists an analytical solution, which means we can obtain the coefficients in a single calculation by using the right formula. In this post, we will look into the analytical solution of linear regression and its derivation.

Analytical Solution

We first give the formula of the analytical solution for linear regression. If you are not interested in the derivation, you can simply use this formula to compute your linear regression variables. The solution is:
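\theta = (X^T X)^{-1} X^T y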

All symbols in this formula are vectorized. If you are not familiar with linear algebra or vectorization, please refer to this blog. Here, X is an m by n matrix, which means we have m samples and n features. The symbol y is an m by 1 vector representing the target labels, and θ is an n by 1 vector containing the coefficients we need for each feature.

Derivations

We know the vectorized expression of the linear regression cost function (for more details, please refer to this blog) can be written as:
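J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)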

Since 1/(2m) is a constant, and multiplying or dividing the cost function by a non-zero constant does not affect the minimization result, we can omit this constant term. For convenience, our cost function becomes:
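J(\theta) = (X\theta - y)^T (X\theta - y)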

Distributing the transpose over the first factor, this can be rewritten as:
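J(\theta) = (\theta^T X^T - y^T)(X\theta - y)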

We expand it to obtain:
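J(\theta) = \theta^T X^T X \theta - \theta^T X^T y - y^T X \theta + y^T y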

Now we need some transformation on the second term. We know X is an m by n matrix and θ is an n by 1 vector, so Xθ has dimension m by 1 and its transpose has dimension 1 by m. Since y is m by 1, the second term has dimension 1 by 1; in other words, it is a scalar. The transpose of a scalar equals itself, so we take the transpose of the second term to get:
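(\theta^T X^T y)^T = y^T X \theta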

We substitute it back into our cost function to obtain:
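J(\theta) = \theta^T X^T X \theta - y^T X \theta - y^T X \theta + y^T y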

Furthermore, combining the two identical middle terms, we can write it as:
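J(\theta) = \theta^T X^T X \theta - 2 y^T X \theta + y^T y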

Now we need to take the derivative of the cost function with respect to θ. For convenience, the common matrix derivative formulas are listed here as reference:
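\frac{\partial (\theta^T A \theta)}{\partial \theta} = (A + A^T)\theta = 2A\theta \quad \text{(when A is symmetric)}

\frac{\partial (a^T \theta)}{\partial \theta} = \frac{\partial (\theta^T a)}{\partial \theta} = a

(Here A is a constant matrix, a is a constant vector, and the denominator-layout convention is used; these are the two identities needed for this derivation.)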

Using the above formulas, the derivative of our cost function with respect to θ is:
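\frac{\partial J(\theta)}{\partial \theta} = 2 X^T X \theta - 2 X^T y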

In order to solve for the variables, we set the above derivative equal to zero, that is:
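2 X^T X \theta - 2 X^T y = 0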

We can simplify it as:
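X^T X \theta = X^T y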

Thus we can compute θ as:
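\theta = (X^T X)^{-1} X^T y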

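To see the formula in action, here is a minimal NumPy sketch (not from the original post; the data and variable names are illustrative) that computes θ in a single step using the normal equation:

```python
import numpy as np

# Illustrative data: m samples, n features, plus a bias column of ones
m, n = 100, 3
rng = np.random.default_rng(0)
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])   # m by (n + 1) design matrix
true_theta = np.array([2.0, -1.0, 0.5, 3.0])                 # coefficients used to generate y
y = X @ true_theta + rng.normal(scale=0.1, size=m)           # length-m target vector with noise

# Analytical solution: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y

# Solving the linear system X^T X theta = X^T y is numerically
# more stable than forming the explicit inverse
theta_stable = np.linalg.solve(X.T @ X, X.T @ y)

print(theta)          # both should be close to [2.0, -1.0, 0.5, 3.0]
print(theta_stable)
```

Note that the one-step solution requires XᵀX to be invertible; when it is singular or the number of features is very large, gradient descent or a pseudo-inverse is usually preferred.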
Conclusion

In this blog, we gave the analytical solution for the variables of linear regression, and we went through the steps of deriving this result from the cost function in detail.
