Comparing Robustness of MAE, MSE and RMSE

How the main regression metrics behave in the presence of outliers

Vinícius Trevisan
Towards Data Science


Original by Chris Liverani on Unsplash

If you deal with data, you probably already know that MSE is more sensitive to outliers than MAE. But did you ever test it? I did, and this article is about it.

Many regression models rely on distance metrics to determine the convergence to the best result. Even the definition of a “best” result needs to be explained quantitatively by some metric.

The metrics usually used are the Mean Absolute Error (MAE), the Mean Squared Error (MSE) and the Root Mean Squared Error (RMSE).

Image by author

In short, MAE evaluates the absolute distance of the observations (the entries of the dataset) to the predictions of a regression, averaged over all observations. We use the absolute value of the distances so that negative errors are accounted for properly. This is exactly the situation illustrated in the image above.

Another way to keep the results positive is to square the distances. This is what MSE does, and because of the nature of the power function, higher errors (or distances) weigh more in the metric than lower ones.

A drawback of MSE is that the unit of the metric is also squared: if the model predicts prices in US$, the MSE is reported in (US$)², which does not make sense. RMSE is then used to bring the error back to the original unit by taking the square root of the MSE, while keeping the property of penalizing higher errors.
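In symbols, writing y_i for the observations, ŷ_i for the predictions and n for the number of observations, the three metrics are (the standard definitions, restating the figure above):

\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}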

Robustness

Robustness can be defined as the capacity of a system or a model to remain stable, with only small changes (or none at all), when exposed to noise or exaggerated inputs.

So a robust system or metric must be less affected by outliers. In this scenario it is easy to conclude that MSE may be less robust than MAE, since squaring the errors gives outliers a much higher weight.

Let’s prove that.

Code preparation

The code for this study can be found on my GitHub, so feel free to jump to the next section.

First we define the code to calculate all three metrics. In this case we will not compare the observations with a regression line, but instead we will measure the distance of each observation to the mean of the set:
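A minimal sketch of such helpers, assuming NumPy arrays as input (the exact code is in the repository; the function names here are only illustrative):

import numpy as np

def mae(observations):
    # Mean absolute distance of each observation to the mean of its set
    center = np.mean(observations)
    return np.mean(np.abs(observations - center))

def mse(observations):
    # Mean squared distance of each observation to the mean of its set
    center = np.mean(observations)
    return np.mean((observations - center) ** 2)

def rmse(observations):
    # Square root of the MSE, back in the original unit
    return np.sqrt(mse(observations))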

Then we create several sets, each containing many random observations. The points will be sampled from a normal distribution with mean = 100 and standard deviation = 20.
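One way to build these sets with NumPy (the number and size of the sets below are illustrative assumptions, not the values used in the repository):

import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

n_sets = 1000    # number of independent sets of observations (illustrative)
set_size = 100   # observations per set (illustrative)

# Each row is one set, drawn from a normal distribution with mean 100 and standard deviation 20
data = rng.normal(loc=100, scale=20, size=(n_sets, set_size))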

Now for the main function, evaluate_metrics(). The goal here is to evaluate MAE, MSE and RMSE for each set of observations. Because the sets are created randomly, the metrics will differ slightly from set to set. We can plot these distributions in the absence of outliers and call them the "original" distributions.

Then we can sample random points from each set and multiply them by a large number, so that they surely become outliers. By repeating the process above on these now-noisy observations, we can plot another distribution curve and compare it to the original one.

The code of the function is below.
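A sketch of what evaluate_metrics() could look like, built on the helpers above; the plotting details (seaborn histograms with KDE curves) are assumptions about how the figures below were produced:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

def evaluate_metrics(data, num_outliers, amplitude_outliers):
    rng = np.random.default_rng(0)

    # Metrics of each original set, measured against the set's own mean
    original = {"MAE": [mae(s) for s in data],
                "MSE": [mse(s) for s in data],
                "RMSE": [rmse(s) for s in data]}

    # Inject outliers: multiply a few random points of each set by the amplitude
    noisy_data = data.copy()
    for row in noisy_data:
        idx = rng.choice(len(row), size=num_outliers, replace=False)
        row[idx] *= amplitude_outliers

    noisy = {"MAE": [mae(s) for s in noisy_data],
             "MSE": [mse(s) for s in noisy_data],
             "RMSE": [rmse(s) for s in noisy_data]}

    # One panel per metric, comparing the original and noisy distributions
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    for ax, name in zip(axes, ["MAE", "MSE", "RMSE"]):
        sns.histplot(original[name], kde=True, color="tab:blue", label="original", ax=ax)
        sns.histplot(noisy[name], kde=True, color="tab:orange", label="noisy", ax=ax)
        ax.set_title(name)
        ax.legend()
    plt.show()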

Experiment

By changing the number of outliers (num_outliers) and the amplitude of the scalar by which we multiply the original observation point (amplitude_outliers), it is possible to compare the robustness of the metrics in many different scenarios.

So, as a control group, we can set the function to have zero outliers

evaluate_metrics(data, num_outliers = 0, amplitude_outliers = 1)

and the result is that both the original and noisy distributions are identical, as expected:

Image by author

A few things are worth noticing, though. The mean of the MAE distribution is around 16, and the mean of the MSE distribution is around 400. The MSE values are expected to be higher than the MAE values by roughly a power of two, so nothing new under the sun here.

But taking the square root of the MSE to obtain the RMSE gives a mean around 20, which is higher than the MAE. This is because MSE and RMSE amplify the higher errors more than the lower ones.
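This gap is exactly what the normal distribution predicts: for observations drawn from N(µ, σ²), the expected absolute deviation from the mean is σ·√(2/π) ≈ 0.80σ, while the root mean squared deviation is exactly σ. With σ = 20, these give approximately 16 and 20, matching the means of the distributions above.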

Also, the RMSE and MSE curves have essentially the same shape, which is also expected, since the square root mostly rescales the values rather than reshaping the distribution. (Note: to compare them, focus on the curve and ignore the histogram bars.)

Now we can start the comparison with noisy data. In the first test we can add only two outliers with amplitude = 2:

evaluate_metrics(data, num_outliers = 2, amplitude_outliers = 2)

As expected, the outliers will increase the mean error and cause the noisy distributions to shift right:

Image by author

When comparing any noisy distribution with its original counterpart, it is possible to notice that the noisy ones are now deformed. They are not expected to keep the same shape as the original ones, since the outliers randomly distort these distributions, which are no longer completely normal (Gaussian).

Also, it is clear that the noisy MSE and RMSE distributions shifted more than the MAE ones. This is further proof that they are less robust to outliers.

By adding more outliers we can get distributions further apart:

evaluate_metrics(data, num_outliers = 10, amplitude_outliers = 2)
Image by author

The amplitude also plays an important role, so we can go back to having two outliers but with a higher amplitude:

evaluate_metrics(data, num_outliers = 2, amplitude_outliers = 10)

In this case, since MSE and RMSE are far more affected by high-intensity outliers, the separation is even larger for them:

Image by author

To conclude: now that the effect outliers can have on squared errors such as MSE or RMSE is clear, it is worth saying that in applications free of noise these metrics can do more good than harm, since they minimize the largest errors, even if that means accepting more frequent, smaller ones.

Check the GitHub repository for this article:

