
Weighted Linear Regression

Overview, advantages, and disadvantages of weighted linear regression

Photo by Jason on Unsplash

Linear regression is one of the simplest and best-known supervised machine learning models. In linear regression, the response variable (dependent variable) is modeled as a linear function of the features (independent variables). Linear regression relies on several important assumptions which cannot be satisfied in some applications. In this article, we look into one of the main pitfalls of linear regression: heteroscedasticity.

Linear Regression Model

We start with the mathematical model of linear regression. Assume there are m observations and n features. The linear regression model is expressed as
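y = w^T x + e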

where y is the response variable, x is the (n+1) × 1 feature vector, w is the (n+1) × 1 vector containing the regression coefficients, and e represents the observation error. Note that the first element of the vector x is 1 to represent the intercept (or bias):
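x = [1, x_1, x_2, ..., x_n]^T,   w = [w_0, w_1, w_2, ..., w_n]^T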

The linear regression model can also be written in matrix form as
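y = Xw + e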

where X is the m × (n+1) feature matrix, y is the m × 1 response vector, and e is the m × 1 vector of observation errors. It can be shown that the linear regression coefficients are estimated as
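w = (X^T X)^{-1} X^T y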

Note that the first element of w represents the estimate of the intercept.

Assumptions

Linear regression is based on several important assumptions:

  1. Linearity: means that the dependent variable has a linear relationship with the independent variables.
  2. Normality: means that the observation errors are normally distributed.
  3. Independence: means that the observation errors are independent of each other.
  4. Homoscedasticity: means that the observation errors are not a function of the response variable and their variance is constant for all observations.
  5. Low multi-collinearity: means that the independent variables are not highly correlated with each other.

In many cases with real data, it would be difficult to satisfy all these assumptions. This does not necessarily mean you cannot use linear regression. However, if any of these assumptions is not met, optimal performance cannot be expected and the inference of the model coefficients could be inaccurate. In this article, our focus is on assumption 4.

Homoscedasticity

Linear regression assumes that the observation errors in e are independent and identically distributed (i.i.d.) normal random variables (assumptions 2, 3, and 4). This condition can be shown mathematically as
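C = E[e e^T] = σ^2 I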

where C is the covariance matrix of the observation errors, I is the identity matrix, and E represents the expected value. In other words, the covariance matrix of e is of the form
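C = diag(σ^2, σ^2, ..., σ^2)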

The diagonal elements of the covariance matrix represent the variance of each observation error, and they are all the same because the errors are identically distributed. The off-diagonal elements represent the covariance between two observation errors, and they are all zero because the errors are statistically independent. This condition is referred to as homoscedasticity.

Heteroscedasticity

In some applications, homoscedasticity is not guaranteed and observation errors are in fact not identically distributed (although we still assume they are independent). In this case, the covariance matrix of observation errors is represented as
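C = diag(σ_1^2, σ_2^2, ..., σ_m^2)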

where the diagonal elements are not identical and each observation has its own variance. Lack of homoscedasticity has several consequences for linear regression results. First, the performance of the model is no longer optimal. In other words, the model will not have the lowest mean square error (MSE). Second, the model coefficients and standard errors will be inaccurate, and hence any inference or hypothesis testing based on them will be invalid.

Detection

There are many ways to detect whether you are dealing with heteroscedastic or homoscedastic data. The easiest way to do this is to plot the residuals of the linear model versus the predicted values (fitted values) and look for any specific patterns in the residuals.

Residuals vs predicted values, left: homoscedastic, right: heteroscedastic data (image by author – source)

The residual plot of homoscedastic data shows no specific pattern, and the values are uniformly distributed around the horizontal axis. On the other hand, the residual plot of heteroscedastic data shows that the variance (vertical spread along the horizontal axis) of the residuals changes for different predicted values.

Weighted Linear Regression

Weighted linear regression is a generalization of linear regression in which the covariance matrix of the errors is incorporated in the model. Hence, it can be beneficial when we are dealing with heteroscedastic data. Here, we use the maximum likelihood estimation (MLE) method to derive the weighted linear regression solution. MLE is a method of estimating unknown parameters by maximizing a likelihood function of the model. The response variable y in the linear regression model is a multivariate normal random variable. Therefore, the MLE can be derived as
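w = argmax_w p(y | w) = argmax_w (2π)^{-m/2} |C|^{-1/2} exp( -(1/2) (y - Xw)^T C^{-1} (y - Xw) )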

Since the log function is non-decreasing, we can take the log of the likelihood function. We also remove any term that does not depend on w
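w = argmax_w ( -(1/2) (y - Xw)^T C^{-1} (y - Xw) )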

which is equivalent to
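w = argmin_w (y - Xw)^T C^{-1} (y - Xw)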

By expanding the terms inside the parentheses and removing constant terms
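w = argmin_w ( w^T X^T C^{-1} X w - 2 w^T X^T C^{-1} y )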

We estimate w by taking the derivative of the above term with respect to w and setting it to zero
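2 X^T C^{-1} X w - 2 X^T C^{-1} y = 0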

Solving it with respect to w gives us the solution of weighted linear regression
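w = (X^T C^{-1} X)^{-1} X^T C^{-1} y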

You can see that the solution of weighted linear regression is very similar to that of linear regression. The only difference is that weighted linear regression uses the covariance matrix of the errors, C, to find the regression coefficients. Since C is a diagonal matrix, its inverse is simply obtained by replacing the diagonal elements with their reciprocals
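C^{-1} = diag(1/σ_1^2, 1/σ_2^2, ..., 1/σ_m^2)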

This expression shows that weighted linear regression uses a different weight for each observation based on its variance. If an observation has a large error variance, it will have less impact (due to its low weight) on the final solution, and vice versa. Note that if all observations have the same variance, the above expression reduces to the solution of linear regression.

Outlier Robustness

Another advantage of weighted linear regression is its robustness against outliers. Weighted linear regression can assign less weight to outliers and hence reduce their impact on the estimate of the coefficients. Outliers can be detected by plotting the standardized residuals (also referred to as studentized residuals) versus the predicted values:

Standardized residuals vs predicted values to detect outliers (image by author – source)

Any observation with an absolute standardized residual larger than 3 is considered to be an outlier.

Unknown Covariance

The main disadvantage of weighted linear regression is that the covariance matrix of the observation errors is required to find the solution. In many applications, such information is not available in advance. In this case, the covariance matrix can be estimated. There are several ways to estimate the covariance matrix. One approach is provided here:

  • Solve linear regression without the covariance matrix (or solve weighted linear regression by setting C = I, which is the same as linear regression)
  • Calculate the residuals
  • Estimate the covariance matrix from the residuals
  • Solve weighted linear regression using the estimated covariance matrix

Python Example

In this section, we provide a Python code snippet to run weighted linear regression on heteroscedastic data and compare it with linear regression:
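A minimal sketch of such an experiment, using only NumPy, could look like the following (the variable names and the simple residual-based variance fit are illustrative choices rather than a prescribed implementation):

import numpy as np

# Generate synthetic heteroscedastic data: true intercept 2, true slope 5,
# with the error standard deviation growing with the feature value.
rng = np.random.default_rng(0)
m = 200
x = np.linspace(1, 10, m)
y = 2.0 + 5.0 * x + rng.normal(0.0, 0.5 * x)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(m), x])

# Step 1: ordinary linear regression, w = (X^T X)^{-1} X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Step 2: estimate the per-observation error variance from the residuals
# (here by fitting a line to the absolute residuals and squaring the fit).
residuals = y - X @ w_ols
coef = np.polyfit(x, np.abs(residuals), 1)
var_hat = np.polyval(coef, x) ** 2

# Step 3: weighted linear regression, w = (X^T C^{-1} X)^{-1} X^T C^{-1} y
C_inv = np.diag(1.0 / var_hat)
w_wls = np.linalg.solve(X.T @ C_inv @ X, X.T @ C_inv @ y)

print("Linear regression coefficients:", w_ols)    # [intercept, slope]
print("Weighted linear regression coefficients:", w_wls)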

In this code, we generate a set of synthetic data where the variance of the observation error is a function of the feature. The actual slope and intercept of the linear regression model are 5 and 2, respectively. We first use linear regression to find the residuals and estimate the covariance matrix. Then, we run weighted linear regression and find the coefficients.

Response variable vs feature variable (image by author)

The chart above shows that, in the presence of heteroscedasticity, weighted linear regression provides a more accurate estimate of the regression coefficients.

Conclusion

In this article, we provide a brief overview of weighted linear regression. Weighted linear regression should be used when the observation errors do not have a constant variance and violate the homoscedasticity requirement of linear regression. The major downside of weighted linear regression is its dependency on the covariance matrix of the observation errors.

References

Weighted Least Squares & Robust Regression (2021), Department of Statistics, PennState University.

S. Chatterjee, A. S. Hadi, Regression Analysis by Example, 5th Edition (2013), John Wiley & Sons.

S. Kay, Fundamentals of Statistical Processing, Volume I: Estimation Theory (1993), Prentice Hall PTR.

