
Simple Linear and Polynomial Regression

Use of Statsmodels, Polyfit, and Linear Regression with Polynomial Features

Image from Unsplash

Introduction

Regression is one of the most essential subjects in predictive analytics and business forecasting. It can be implemented in a linear fashion or with higher-order polynomials. There are instances where a model can be built with multiple linear regression, but many real-world cases involve non-linear dependencies between the dependent and independent variables, and that is where polynomial regression is needed. Regression using splines is also useful for mitigating the drawbacks of polynomial regression, such as unwanted wiggliness.

Implementation in Python

Using Statsmodels

For the implementation in Python, I will use the data from this link. The data relates fish length to age.

Image by Author

We would like to fit this data using both linear and polynomial regression. Using statsmodels, ordinary least squares (OLS) can be deployed to fit the data, and we can then have a look at the fitted result.
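The original post showed this step as a screenshot; below is a minimal sketch of the equivalent statsmodels code, assuming the dataset has been loaded into a pandas DataFrame. The file name "fish.csv" and the column names "age" and "length" are hypothetical placeholders for the linked dataset.

import pandas as pd
import statsmodels.api as sm

# Hypothetical file and column names; adjust to the linked dataset
df = pd.read_csv("fish.csv")

X = sm.add_constant(df["age"])         # design matrix with an intercept column
model = sm.OLS(df["length"], X).fit()  # ordinary least squares fit
print(model.summary())                 # reports R-squared, coefficients, intercept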

Image by Author

The R-squared value indicates how well the model fits the data. Best practice is to also plot the residuals to check for heteroskedasticity, but here we will limit the scope to the R-squared value alone. The R-squared value of 0.735 indicates a good but not very strong fit. The adjusted R-squared penalizes the addition of more independent variables. The coefficient of the independent variable as well as the intercept are also shown here.

Image by Author

In order to incorporate a higher-order term, i.e., polynomial regression, we need to implement the following block.
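One way to add the quadratic term in statsmodels is through its formula API; this is a sketch under the same hypothetical file and column names as above, not necessarily the exact code behind the original screenshot.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names; adjust to the linked dataset
df = pd.read_csv("fish.csv")

# Quadratic fit: length ~ intercept + age + age^2
poly_model = smf.ols("length ~ age + I(age ** 2)", data=df).fit()
print(poly_model.summary())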

Image by Author

The R-squared value for the polynomial regression is 0.801, which is better than its linear counterpart.

Image by Author

Using Polyfit

The same regression can be implemented using numpy’s polyfit function. The R-squared value in this case is 0.801 too.
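A sketch of the polyfit version, under the same hypothetical file and column names; note that polyfit returns only the coefficients, so R-squared has to be computed by hand from the residuals.

import numpy as np
import pandas as pd

# Hypothetical file and column names; adjust to the linked dataset
df = pd.read_csv("fish.csv")
x = df["age"].to_numpy()
y = df["length"].to_numpy()

coeffs = np.polyfit(x, y, 2)   # quadratic coefficients, highest degree first
y_hat = np.polyval(coeffs, x)  # fitted values

# Compute R-squared manually from the residuals
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(1 - ss_res / ss_tot)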

Image by Author

Using Linear Regression and Polynomial Features

Using sklearn’s LinearRegression and PolynomialFeatures classes, both linear and polynomial regression can be implemented.
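For the linear case, a minimal sketch with scikit-learn (same hypothetical file and column names as above) could look like this:

import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical file and column names; adjust to the linked dataset
df = pd.read_csv("fish.csv")
X = df[["age"]]   # scikit-learn expects a 2-D feature matrix
y = df["length"]

lin = LinearRegression().fit(X, y)
print(lin.score(X, y))  # score() returns R-squared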

Image by Author

The R-squared value in this case is 0.735, the same as with the previous approach (using statsmodels). For higher-order (non-linear) regression, the following block of code can be used.
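A sketch of the polynomial version, where PolynomialFeatures expands the single feature into age and age squared before the linear fit; the file and column names remain hypothetical.

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical file and column names; adjust to the linked dataset
df = pd.read_csv("fish.csv")
X = df[["age"]]
y = df["length"]

# Expand the single feature into [age, age^2]; LinearRegression adds the intercept
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
poly = LinearRegression().fit(X_poly, y)
print(poly.score(X_poly, y))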

Image by Author

The R-squared value is also 0.801 in this case.

Conclusion

We have tried three different approaches to implementing polynomial regression, and in all cases we ended up with the same R-squared value. All of these methods are suitable for regression analysis in Python. A simple timing test of each implementation shows that numpy’s polyfit is the fastest, so when dealing with big data, polyfit may be a better choice than the other implementations.
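The original timing test was not shown; here is one hedged way to set up such a comparison with timeit. The data below is synthetic quadratic data (not the fish dataset), and the exact numbers will vary by machine.

import timeit

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data, used only for timing
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 6.0, 1_000)
y = 20 + 30 * x - 2.5 * x**2 + rng.normal(0.0, 5.0, 1_000)
X2 = np.column_stack([x, x**2])

def fit_polyfit():
    np.polyfit(x, y, 2)

def fit_statsmodels():
    sm.OLS(y, sm.add_constant(X2)).fit()

def fit_sklearn():
    Xp = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x.reshape(-1, 1))
    LinearRegression().fit(Xp, y)

for name, fn in [("polyfit", fit_polyfit),
                 ("statsmodels", fit_statsmodels),
                 ("sklearn", fit_sklearn)]:
    print(name, round(timeit.timeit(fn, number=200), 3))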

