
Introduction
Regression is one of the most essential subjects in predictive analytics and business forecasting. It can be implemented both in a linear fashion and with higher-order polynomials. There are cases where a model can be built using multiple linear regression, but many real-world problems involve non-linear dependencies between the dependent and independent variables; this is where polynomial regression is needed. Regression using splines is also useful to mitigate the drawbacks of polynomial regression, such as unwanted wiggliness.
Implementation in Python
Using Statsmodels
For the implementation in Python, I will use the data from this link. The dataset relates fish length to fish age.
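A minimal sketch of the loading step, assuming the file has been saved locally as fish.csv with columns named age and length (the actual file name and column names may differ), could look like this:

```python
import pandas as pd

# Load the fish length-by-age dataset.
# "fish.csv" and the column names "age" and "length" are assumptions;
# adjust them to match the actual file from the link above.
df = pd.read_csv("fish.csv")
print(df.head())
```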

We would like to fit this data using both linear and polynomial regression. Using statsmodels, ordinary least squares (OLS) can be used to fit the data, after which we can have a look at the fitted result.
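A sketch of the linear fit, continuing from the loading snippet above, might look like this:

```python
import statsmodels.api as sm

# Assumed column names: "age" (predictor) and "length" (response).
X = sm.add_constant(df["age"])  # add an intercept term
y = df["length"]

linear_model = sm.OLS(y, X).fit()
print(linear_model.summary())   # reports R-squared, coefficients, intercept
```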

The R-squared value indicates how well the model fits the data. Best practice is to also plot the residuals and check for heteroskedasticity, but here we will limit the scope to the R-squared value. An R-squared of 0.735 indicates a good fit, but not a very strong one. The adjusted R-squared penalizes the addition of more independent variables. The coefficient of the independent variable, as well as the intercept, is shown in the summary.
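For reference, the adjusted R-squared can be reproduced by hand from the fitted model; this small sketch assumes the linear_model object from the previous snippet:

```python
# Adjusted R-squared penalizes extra predictors:
# R2_adj = 1 - (1 - R2) * (n - 1) / (n - p - 1),
# where n is the number of observations and p the number of predictors.
n = len(df)
p = 1  # one predictor (age) in the linear model
r2 = linear_model.rsquared
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(r2_adj)  # matches linear_model.rsquared_adj
```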

In order to incorporate higher-order terms, i.e., polynomial regression, we need to implement the following block.
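One way to do this with statsmodels, assuming a quadratic (degree-2) term is what produces the 0.801 result reported below, is to extend the design matrix manually:

```python
import numpy as np
import statsmodels.api as sm

# Build a design matrix containing age and age^2.
# The degree-2 choice is an assumption about the original code.
X_poly = sm.add_constant(np.column_stack((df["age"], df["age"] ** 2)))

poly_model = sm.OLS(df["length"], X_poly).fit()
print(poly_model.summary())
```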

The R-squared value for the polynomial regression is 0.801, which is better than that of its linear counterpart.

Using Polyfit
The same regression can be implemented using NumPy's polyfit function. The R-squared value in this case is also 0.801.
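A sketch using polyfit, again assuming a degree-2 fit, could look like this; note that polyfit does not report R-squared itself, so it is computed manually:

```python
import numpy as np

x = df["age"].to_numpy()
y = df["length"].to_numpy()

# Fit a degree-2 polynomial (same degree assumption as before).
coeffs = np.polyfit(x, y, deg=2)
y_hat = np.polyval(coeffs, x)

# R-squared = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(1 - ss_res / ss_tot)
```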

Using Linear Regression and Polynomial Features
Using sklearn's LinearRegression and PolynomialFeatures classes, both linear and polynomial regression can be implemented.
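For the linear case, a minimal sketch with sklearn might be:

```python
from sklearn.linear_model import LinearRegression

# scikit-learn expects a 2-D feature array, hence the double brackets.
X = df[["age"]]
y = df["length"]

lin_reg = LinearRegression().fit(X, y)
print(lin_reg.score(X, y))  # score() returns the R-squared value
```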

The R-squared value in this case is 0.735, which is the same as with the previous approach (statsmodels). For higher-order (polynomial) regression, the following block of code can be used.
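A sketch of the polynomial case, with the same degree-2 assumption as before:

```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Expand "age" into [1, age, age^2].
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(df[["age"]])

poly_reg = LinearRegression().fit(X_poly, df["length"])
print(poly_reg.score(X_poly, df["length"]))
```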

The R-squared value is also 0.801 in this case.
Conclusion
We have tried three different approaches to implementing polynomial regression, and in all cases we ended up with the same R-squared value. All of these methods are suitable for regression analysis in Python. A simple test of the duration of each execution shows that NumPy's polyfit is the fastest implementation. Therefore, when dealing with big data, polyfit may be a better choice than the other implementations.
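A rough version of that timing comparison, built on the snippets above with Python's timeit module (exact numbers will vary by machine and dataset), could look like this:

```python
import timeit

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = df["age"].to_numpy()
y = df["length"].to_numpy()
X_poly = PolynomialFeatures(degree=2).fit_transform(x.reshape(-1, 1))

# Time 1,000 fits of each implementation.
t_np = timeit.timeit(lambda: np.polyfit(x, y, deg=2), number=1000)
t_sm = timeit.timeit(lambda: sm.OLS(y, X_poly).fit(), number=1000)
t_sk = timeit.timeit(lambda: LinearRegression().fit(X_poly, y), number=1000)
print(f"polyfit: {t_np:.3f}s  statsmodels: {t_sm:.3f}s  sklearn: {t_sk:.3f}s")
```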