
R-squared VS Adjusted R-squared – Simplified

A must-know concept for every machine learning enthusiast

Photo by Brett Jordan on Unsplash

The adjusted R-squared measure exists because the plain R-squared measure has a loophole. Yet adjusted R-squared is often misinterpreted: people apply the same intuition to it as to the ordinary R-squared measure, which is incorrect. Before we arrive at the mathematical expression for adjusted R-squared, we need to walk through a few terms and why they exist: SST (Sum of squares – Total), SSR (Sum of squares – Regression), SSE (Sum of squares – Error) and, finally, DOF (Degrees of Freedom). We will begin by discovering the issue with the R-squared measure and then explain how adjusted R-squared fixes it.

Let's start by considering a variable and trying to explain the variation associated with it with the help of some other variables.

(Image by author)

Taking a random sample of 15 males and plotting their weights:

  1. On the Y-axis alone (left)
  2. On the X-Y plane, with the height measure on the X-axis (right)
(Image by author)

Clearly, there is variation in the weights of the 15 males, and to capture it we first need to identify the central tendency (the mean):

*Calculations are omitted since the purpose here is to build intuition.

(Image by author)

We know that the total error measure is minimized at an optimum constant value, which is found by setting the first-order derivative of the RMSE (root mean square error) to zero.

*Note – There are many error measures, each with its own pros and cons; we are using RMSE here since it performs well in the regression domain (except in the presence of outliers).

In our example, the optimum constant value is nothing but the mean of the sample. This is also known as the "baseline" prediction model, since from here on any explanation of part of the total error using external variables is computed relative to this baseline (a constant prediction equal to the mean).
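As a quick sanity check of this claim, here is a minimal Python sketch (the weight values are made up for illustration, not the author's actual sample) showing that, among all constant predictions, the RMSE is smallest at the sample mean:

```python
import numpy as np

# Hypothetical weights (kg) for 15 males -- illustrative values only
weights = np.array([62, 68, 70, 71, 73, 74, 75, 76, 78, 79, 81, 83, 85, 88, 92])

def rmse(constant, y):
    """RMSE of predicting the same constant for every observation."""
    return np.sqrt(np.mean((y - constant) ** 2))

# Scan a grid of candidate constants; the minimum lands at the sample mean
candidates = np.linspace(weights.min(), weights.max(), 1001)
errors = [rmse(c, weights) for c in candidates]
best_constant = candidates[np.argmin(errors)]

print(best_constant, weights.mean())  # both are (numerically) the sample mean
```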

(Image by author)

Now the sum of squares of the errors relative to the constant baseline prediction (the mean of the sample) is calculated; this total is the SST:

(Image by author)

We see from the plot that there is a positive correlation between the weight and height measures of the males; this means that a part of the total error (SST) can be explained by this relationship. We capture this positive correlation with a straight line cutting through the points in the X-Y plane, chosen so that it leaves behind the smallest possible part of the total error unexplained, ensuring the best fit. How well this best fit performs is gauged with the R-squared measure:

(Image by author)

SST (Total Error) = SSE (Unexplained Part) + SSR (Explained Part)

R-squared measure = Explained Part of Total Error / Total Error

R-squared measure = SSR/SST = (SST - SSE)/SST

R-squared measure = 1 - SSE/SST
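To make the decomposition concrete, here is a small sketch that fits the best-fit line and computes SST, SSE, SSR and R-squared; the height and weight values are invented for illustration and are not the author's sample:

```python
import numpy as np

# Hypothetical heights (cm) and weights (kg) for 15 males -- illustrative values only
height = np.array([160, 162, 165, 167, 168, 170, 172, 173, 175, 176, 178, 180, 182, 185, 188])
weight = np.array([62, 64, 66, 69, 70, 72, 74, 75, 77, 78, 80, 83, 85, 88, 92])

# Best-fit straight line via ordinary least squares
slope, intercept = np.polyfit(height, weight, 1)
predicted = slope * height + intercept

sst = np.sum((weight - weight.mean()) ** 2)  # total error around the baseline (mean)
sse = np.sum((weight - predicted) ** 2)      # unexplained part left after the line
ssr = sst - sse                              # explained part

r_squared = 1 - sse / sst                    # equivalently ssr / sst
print(round(r_squared, 3))
```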

Now, theoretically, the higher the R-squared measure, the better the fit. But here comes the loophole that makes this measure deceptive. Time to discuss the "Degrees of Freedom" concept:

For sample statistics (the inferential domain), we know that given a sample of n data points, we are free to vary only (n-1) of them: once those (n-1) data points are fixed, the nth point is automatically determined because the sample mean is already fixed. In our example above, we have 15 data points in the sample, so we are free to vary (15 - 1) = 14 values. These are the total Degrees of Freedom available to our sample set.

The problem arises when we bring in additional external variables to further explain the residual unexplained error in our target variable (weight). Suppose that, along with height, we also incorporate bone density as an explanatory variable; what impact will it have on the R-squared value? Yes, you are right, it will increase. But what if we add some variable that makes no intuitive sense for explaining the variation in the weight measure?

The R-squared measure still increases, which is misleading. Let's see why this happens:

(Image by author)

Notice how, with one explanatory variable and only two data points in the sample, the R-squared measure has no choice but to be 1. Only with an additional data point does the data set gain one Degree of Freedom. Let's increase the explanatory variable count to two and visualize something new in 3-D space:

(Image by author)

With an additional explanatory variable and three data points, we again encounter the same problem: the R-squared measure has no choice but to be 1 (DOF = 0). Only when we add one more data point does the data set regain one Degree of Freedom.
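Before moving on to the fix, here is a minimal sketch confirming the earlier claim numerically: adding a column of pure noise to a least-squares fit never lowers R-squared. The data below are simulated, so the exact numbers will vary with the random seed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 15
height = rng.normal(175, 8, n)
weight = 0.9 * height - 85 + rng.normal(0, 4, n)  # weight genuinely depends on height
noise = rng.normal(0, 1, n)                        # a column with no real relationship

def r_squared(cols, y):
    """R-squared of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

print(r_squared([height], weight))         # R^2 with height only
print(r_squared([height, noise], weight))  # slightly higher, despite the noise column
```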

This issue needs to be addressed with a modification of the R-squared formula:

We know that, with a fixed number of data points in the sample, each additional explanatory variable reduces the Degrees of Freedom by one. Also, the total Degrees of Freedom available to a sample of n data points is (n-1).

So, the DOF split takes place in the following manner:

Total DOF = n-1

Explanatory variables = k

Leftover DOF = n-k-1
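For instance, with our sample of n = 15 males and k = 2 explanatory variables (height and bone density), the leftover DOF would be 15 - 2 - 1 = 12.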

This prompts us to define the adjusted R-squared measure to resolve the issues discussed earlier.

To reiterate: R-squared keeps increasing with additional explanatory variables without accounting for how much of the variation in the target variable they actually explain. Let's check out the formula for adjusted R-squared now:

Adjusted R-squared = 1 - SSE(adjusted)/SST(adjusted)

where SSE(adjusted) = SSE/(n-k-1) and SST(adjusted) = SST/(n-1)

Adjusted R-squared = 1 - (SSE/SST)(n-1)/(n-k-1)

SSE/SST can be written as (1 - R²), so simplifying, we get:

Adjusted R-squared = 1 - (1 - R²)(n-1)/(n-k-1)
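A minimal helper makes the penalty visible; the R-squared value, n and k below are made-up numbers purely for illustration:

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R^2 of 0.80 on a sample of 15 points, with different variable counts
print(adjusted_r_squared(0.80, n=15, k=1))  # ~0.785: mild penalty for one variable
print(adjusted_r_squared(0.80, n=15, k=5))  # ~0.689: same fit, five variables, bigger penalty
```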

Let's find out how the adjusted R-squared measure manages to counter the issue faced by R-squared (assuming a fixed number of data points, n):

  1. The denominator (n-k-1) accounts for the increase in explanatory variables: it decreases as the number of explanatory variables grows.
  2. The numerator (1 - R²) accounts for the explanatory power of the additional variables: if the variables are powerful, R² increases and the numerator decreases.
  3. It is the interplay of these two effects that gets reflected in the adjusted R-squared measure.

Point to remember:

Unlike R-squared, adjusted R-squared is not bounded between 0 and 1, and it should not be interpreted the same way as R-squared (it does not reflect what percentage of the error is being explained).

Adjusted R-squared acts as a trigger that tells you when the predictive model is losing its power to explain the target variable's variance and any further variables are not contributing as expected.

A sample report is attached below for comparison and intuition:

(Image by author)

Notice how, after the addition of the 5th explanatory variable, adjusted R-squared takes a dip whereas R-squared keeps on increasing. So you should stop at four explanatory variables and not fall prey to losing DOF to additional vague variables. Alright, time to end this blog here.
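The report above is the author's; as a rough stand-in, you can reproduce the same qualitative pattern with simulated data where only the first two variables carry real signal and the rest are junk (exact numbers depend on the random draw):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3 * x1 + 1.5 * x2 + rng.normal(size=n)     # only x1 and x2 matter
junk = [rng.normal(size=n) for _ in range(6)]  # variables with no real signal

def fit_r_squared(cols, y):
    """R-squared of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

cols = []
for k, x in enumerate([x1, x2] + junk, start=1):
    cols.append(x)
    r2 = fit_r_squared(cols, y)
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    print(f"k={k}  R^2={r2:.3f}  adjusted R^2={adj:.3f}")

# R^2 creeps upward with every added column, while adjusted R^2 flattens out
# and eventually dips once the junk variables start arriving.
```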

Conclusion:

I hope this exhaustive visual explanation has made the topic crystal clear and helped you build an intuition for how it works. The major takeaway is to never interpret adjusted R-squared like R-squared; treat it as a trigger that tells you when to stop adding explanatory variables. I have been covering similar topics and will keep trying my best to simplify each of them. Keep a watch out for upcoming blogs and visit my profile to check out my previous work.

Thanks!!!

