
Linear Regression with OLS: Unbiased, Consistent, BLUE, Best (Efficient) Estimator

Understand OLS Linear Regression with a bit of math

Image by Author

The OLS estimator is known to be unbiased, consistent, and BLUE (the Best Linear Unbiased Estimator). But what do these properties mean, and why do they matter for a linear regression model? In this article, we will discuss these properties.


A typical linear regression model looks as follows. The response variable (i.e., Y) is explained as a linear combination of explanatory variables (e.g., the intercept, X1, X2, X3, …), and ε is the error term (i.e., a random variable) that captures the difference between the actual response value and the part explained by the linear combination.

Figure 1 (Image by author)
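For readers following along in text rather than images, the population model that Figure 1 depicts can be written roughly as follows (my rendering; the figure’s exact notation may differ):

```latex
% Population linear model, scalar and matrix form
% (assuming X contains a leading column of ones for the intercept)
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \varepsilon
\qquad\Longleftrightarrow\qquad
Y = X\beta + \varepsilon
```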

In order for OLS to work, we need to make sure a set of assumptions holds.

Assumption 1 - Linearity in Parameters: The model is linear in its parameters.

Figure 2 (Image by author)

Moreover, the OLS estimator is itself a linear function of Y: we can rewrite the closed-form solution as follows (by substituting Y from Figure 1 into Figure 3). This matrix algebra only works because the model is linear in its parameters, which is why the linearity assumption matters.

Figure 3 (Image by author)
Figure 4 (Image by author)
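To make this concrete, here is a minimal NumPy sketch of the standard closed-form solution β^ = (X′X)⁻¹X′Y, which is what Figure 3 shows (the simulated data and variable names are mine, not taken from the figures):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from a known population model: y = 2 + 3*x1 - 1*x2 + eps
n = 1_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.normal(size=n)
y = 2 + 3 * x1 - 1 * x2 + eps

# Design matrix X with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# Closed-form OLS solution: beta_hat = (X'X)^(-1) X'y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat)   # roughly [2, 3, -1]
```

In practice you would use np.linalg.lstsq (or a library such as statsmodels) rather than forming the inverse explicitly, since that is numerically more stable, but the formula above is the one the figures refer to.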

Assumption 2 - Random Sampling: The observed data represent i.i.d. (independent and identically distributed) random samples that follow the population model (see Figure 1). If the data are collected cross-sectionally, we need to make sure they are sampled randomly. The bottom line is that the observed data should be representative of the population.

Assumption 3 - No Perfect Collinearity: No explanatory variable can be expressed as a linear combination of the other explanatory variable(s). The reason is that the inverse matrix (in Figure 3) exists only if X has full column rank; if there is perfect collinearity, X′X is singular and the closed-form solution does not exist.
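As a quick toy illustration (my own example, not from the article’s figures), an exactly collinear column makes X′X singular, so the closed-form inverse is unusable:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 2 * x1 + 5                       # x2 is an exact linear function of x1 -> perfect collinearity
X = np.column_stack([np.ones(n), x1, x2])

print(np.linalg.matrix_rank(X))       # 2, not 3: X lacks full column rank
print(np.linalg.cond(X.T @ X))        # enormous condition number: X'X is (numerically) singular,
                                      # so the closed-form inverse from Figure 3 cannot be used
```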

Assumption 4 - Zero Conditional Mean: The expected value of the error term is zero conditional on all values of the explanatory variables (i.e., E[ε|X] = 0).

Assumption 5 - Homoscedasticity and No Autocorrelation: The error terms should have constant variance and be uncorrelated with one another. In other words, the diagonal values in the variance-covariance matrix of the error term should be a constant and the off-diagonal values should all be 0.
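In matrix notation, assumptions 4 and 5 together are usually written like this (a standard formulation consistent with the description above):

```latex
% Zero conditional mean, plus constant diagonal (homoscedasticity)
% and zero off-diagonal entries (no autocorrelation):
E[\varepsilon \mid X] = 0,
\qquad
\operatorname{Var}(\varepsilon \mid X) = \sigma^2 I_n =
\begin{pmatrix}
\sigma^2 & 0        & \cdots & 0 \\
0        & \sigma^2 & \cdots & 0 \\
\vdots   & \vdots   & \ddots & \vdots \\
0        & 0        & \cdots & \sigma^2
\end{pmatrix}
```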

Assumption 6 - Normality of Errors: The error term is normally distributed. This assumption is not required for the validity of the OLS method itself, but it gives us reliable standard errors of the estimates and allows us to make meaningful statistical inferences.


β vs β^ vs E(β^)

You might have seen some variations of β (e.g., β, β^, E(β^)) in statistics textbooks. Let’s discuss their definitions and differences.

β is a conceptual value: the true (and usually unknown) parameter value(s), i.e., the constants that explain the relationship between the explanatory variable(s) and the dependent variable in the population.

In most cases, we won’t be using population data because it is not available or too large to process. Therefore, we would use sample data (with a finite number of observations) to develop our linear regression model.

Under the assumption of Random Sampling, the observed sample data represent an i.i.d. random sample of size n, which follows the population model. Suppose we have multiple sets of sample data (by drawing samples from the population repeatedly) and run the model separately in each dataset.

In a given sample dataset, we would have an OLS estimator, β^, which can be solved with the closed-form solution (figure 3).

It is very likely that we would get a different set of estimates (i.e., β^) in different datasets. Therefore, β^ is a random variable. Based on the Central Limit Theorem, the sampling distribution of β^ has a mean, which converges to β as the sample size increases.

E(β^) is the expected value of this random variable, β^. In layman’s terms: if we run the linear model on many sets of samples, record the values of the estimates each time, and take their average, that average approximates the expected value, E(β^).
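The following toy simulation (mine, not from the article) mimics this thought experiment: draw many samples from the same population, compute β^ in each one, and average the estimates:

```python
import numpy as np

rng = np.random.default_rng(42)
beta_true = np.array([2.0, 3.0])       # true population parameters: intercept and slope
n, n_datasets = 200, 5_000

estimates = np.empty((n_datasets, 2))
for i in range(n_datasets):
    x = rng.normal(size=n)
    eps = rng.normal(size=n)
    y = beta_true[0] + beta_true[1] * x + eps
    X = np.column_stack([np.ones(n), x])
    estimates[i] = np.linalg.solve(X.T @ X, X.T @ y)   # beta_hat for this sample

print(estimates.mean(axis=0))          # close to [2, 3]: a Monte Carlo estimate of E(beta_hat)
print(estimates.std(axis=0))           # the spread of the sampling distribution
```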


OLS Estimator is Unbiased

Among the finite-sample properties, we say the OLS estimator is unbiased, meaning the expected value of the OLS estimator, E(β^), equals the true population parameter, β.

Unbiasedness does NOT imply that the OLS estimator we get from the observed data (i.e., one set of random samples) would equal the exact population parameter value because the linear model still can’t fully explain the relationship due to the irreducible error term ε.

Instead, the unbiasedness property implies that if we run the linear regression model repeatedly on different sets of random samples from the same population, then the expected value of the estimator would equal the true population parameter as proven below.

Figure 5 (Image by author)
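In outline, the argument behind Figure 5 goes roughly like this (my rendering of the standard proof; the figure’s notation may differ):

```latex
% Substitute Y = X\beta + \varepsilon into the closed-form solution:
\hat{\beta} = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'\varepsilon
% Take expectations conditional on X and use E[\varepsilon \mid X] = 0:
E[\hat{\beta} \mid X] = \beta + (X'X)^{-1}X'\,E[\varepsilon \mid X] = \beta
```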

Although the OLS estimates we get from the observed data do not equal the exact population parameter value, as long as the observed data are representative of the population and the linear model is correctly specified under the assumptions above, the coefficient estimates we get from the observed data should be very close to the true population parameter values.

On the other hand, if the observed data are NOT representative of the population, if the model suffers from measurement error, or if the linear model is NOT correctly specified due to common issues (e.g., omitted variables or endogeneity), then the coefficient estimates we get from the observed data will be biased.


OLS Estimator is Consistent

Among the asymptotic properties, we say the OLS estimator is consistent, meaning the OLS estimator converges to the true population parameter as the sample size grows and tends to infinity.

From Jeffrey Wooldridge’s textbook, Introductory Econometrics (C.3), we can show that the probability limit of the OLS estimator equals the true population parameter as the sample size gets larger, provided the assumptions hold.

Figure 6 (Image by author)
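For the simple-regression slope, the decomposition that the bullets below refer to is usually written like this (my rendering, under the assumption that Figure 6 shows the standard Wooldridge result):

```latex
\operatorname{plim}\,\hat{\beta}_1
  = \beta_1 + \frac{\operatorname{Cov}(X, \varepsilon)}{\operatorname{Var}(X)}
```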
  • When E[ε|X] = 0 holds, it implies Cov(X, ε) = 0, so the second term in Figure 6 equals 0. This shows that as the sample size gets larger, the OLS estimator converges to the true population parameter. Therefore, the OLS estimator is consistent.
  • If Cov(X, ε) ≠ 0, then we have an inconsistent estimator. The inconsistency does not go away as the sample size increases. At the same time, the OLS estimator is biased as well.
  • If Cov(X, ε) > 0, meaning X is positively correlated with the error term, then the asymptotic bias is upward.
  • If Cov(X, ε) < 0, meaning X is negatively correlated with the error term, then the asymptotic bias is downward. (A small simulation sketch after this list illustrates both the consistent and the inconsistent case.)
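Here is a minimal simulation (my own toy example, not from the article) that grows the sample size and compares an exogenous error with an error that is correlated with X:

```python
import numpy as np

rng = np.random.default_rng(7)
b0, b1 = 2.0, 3.0                      # true intercept and slope

def ols_slope(x, y):
    """Closed-form OLS on (1, x); returns the slope estimate."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.solve(X.T @ X, X.T @ y)[1]

for n in (50, 500, 5_000, 50_000):
    x = rng.normal(size=n)
    eps = rng.normal(size=n)           # exogenous error: Cov(x, eps) = 0
    u = 0.8 * x + rng.normal(size=n)   # endogenous error: Cov(x, u) > 0
    print(n,
          ols_slope(x, b0 + b1 * x + eps),  # converges to 3: consistent
          ols_slope(x, b0 + b1 * x + u))    # settles near 3.8: upward asymptotic bias that never goes away
```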

You might be wondering why we are interested in large-sample properties, such as consistency, when in practice we have finite samples.

The answer is if we can show that an estimator is consistent when the sample size gets larger, then we may be more confident and optimistic about the estimator in finite samples. On the other hand, if an estimator is inconsistent, we know that the estimator is biased in finite samples.


OLS Estimator is Efficient

To evaluate an estimator of a linear regression model, we assess its efficiency, which depends on both its bias and its variance.

  • An estimator that is unbiased but does not have the minimum variance is not the best.
  • An estimator that has the minimum variance but is biased is not the best.
  • An estimator that is unbiased and has the minimum variance is the best (efficient).
  • The OLS estimator is the best (efficient) estimator because OLS estimators have the least variance among all linear and unbiased estimators.
Figure 7 (Image by author)

We can prove the Gauss-Markov theorem with a bit of matrix algebra.

Figure 8 (Image by author)
Figure 9 (Image by author)
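In outline, the proof in Figures 8 and 9 proceeds roughly as follows (my paraphrase of the standard Gauss-Markov argument; the figures’ notation may differ):

```latex
% Any other linear estimator can be written as \tilde{\beta} = CY with C = (X'X)^{-1}X' + D.
% Requiring \tilde{\beta} to be unbiased forces DX = 0.
% Then, using Var(\varepsilon \mid X) = \sigma^2 I_n:
\operatorname{Var}(\tilde{\beta} \mid X)
  = \sigma^2 \left[(X'X)^{-1} + DD'\right]
  = \operatorname{Var}(\hat{\beta} \mid X) + \sigma^2 DD'
% Since \sigma^2 DD' is positive semidefinite, every other linear unbiased estimator
% has at least as much variance as the OLS estimator.
```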

Now we’ve shown that the variance of the OLS estimator is no larger than that of any other linear unbiased estimator. Therefore, OLS is the best (efficient) linear unbiased estimator.


Final Notes

  • An estimator is unbiased if the expected value of its sampling distribution equals the true population parameter value.
  • An estimator is consistent if, as the sample size increases and tends to infinity, the estimates converge to the true population parameter. In other words, consistency means that, as the sample size increases, the sampling distribution of the estimator becomes more concentrated around the population parameter value and its variance shrinks.
  • Under the OLS assumptions, the OLS estimator is BLUE (it has the least variance among all linear unbiased estimators). Therefore, it is the best (efficient) linear estimator.

Here are some related posts you can explore if you’re interested in Linear Regression and Causal Inference.

Thank you for reading!

If you enjoyed this article, please click the Clap icon. If you would like to see more articles from me and thousands of other writers on Medium, you can:

  • Subscribe to my newsletter to get an email notification whenever I post a new article.
  • Sign up for a membership to unlock full access to everything on Medium.
