Learn the most common metrics you can use to evaluate your regression models – in this article we will explore four metrics, including their implications and assumptions.

Regression problems are among the most common problems to solve with Data Science and Machine Learning. When you want to predict a target that can take a (theoretically) infinite number of values, you are dealing with a regression problem – some examples are:
- Predicting the income of some person based on their education level, years of experience, etc.;
- Predicting the value of a house based on its characteristics;
- Predicting the return of a stock portfolio based on its composition.
After you develop a regression model, there are a lot of metrics you can choose to evaluate your model – each with its own set of characteristics.
Each metric is a different representation of the model's errors, and they can be (mainly) used for two purposes:
- As the final definition of a project's success – the metric that is communicated to the stakeholders.
- As a Cost Function (the function you want to minimize) in the optimization algorithm you might be using – see the sketch right after this list.
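As a quick illustration of the second purpose, here is a minimal sketch of plain gradient descent minimizing MSE for a one-parameter linear model. Everything in it – the data, the learning rate, the iteration count – is made up for illustration:

```python
import numpy as np

# Made-up 1-D data: the target is roughly 3 * x plus noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 3 * x + rng.normal(0, 1, size=100)

w = 0.0    # single weight of the model y_hat = w * x
lr = 0.01  # learning rate

for _ in range(200):
    y_hat = w * x
    # Gradient of MSE = mean((y - w*x)^2) with respect to w.
    grad = -2 * np.mean((y - y_hat) * x)
    w -= lr * grad

print(w)  # should land close to the true slope of 3
```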
Metrics are crucial to understand how your model is performing. Let’s dive deeper into some of them!
Mean Squared Error (MSE)
One of the most common metrics for regression algorithms is the Mean Squared Error (MSE) – MSE is widely used as a cost function by several algorithms, such as Linear Regression or Neural Networks.
Basically, the Mean Squared Error consists of the following formula:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

where $y_i$ is the real value, $\hat{y}_i$ the predicted value, and $n$ the number of examples.
Imagine you have a real value of 200 and you predict, for that example, a value of 100. The contribution of this single example to the overall error of the algorithm will be:

$$(200 - 100)^2 = 10{,}000$$
Our example would contribute 10,000 units to our error! As soon as you have calculated the error for each example in your sample, you just average them out and obtain the Mean Squared Error (MSE).
Some details about MSE:
- Due to the application of the square in the formula, the MSE penalizes large errors.
- As a downside, it's not interpretable on the scale of the target – particularly a problem if you want to communicate your error.
The MSE is used as a way to evaluate algorithms and is also, by default, the cost function in a lot of implementations that solve regression problems.
A concept similar to MSE is the RMSE (Root Mean Squared Error) – the only difference is that you apply a square root to the MSE. This gives you an error more in line with the magnitude of the target.
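To make this concrete, here is a minimal sketch of computing MSE and RMSE with NumPy, cross-checked against scikit-learn's mean_squared_error – the y_true and y_pred arrays are made-up values, just for illustration:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up real values and predictions, just for illustration.
y_true = np.array([200.0, 120.0, 310.0])
y_pred = np.array([100.0, 130.0, 300.0])

# MSE: average of the squared differences.
mse = np.mean((y_true - y_pred) ** 2)

# RMSE: square root of the MSE, back on the target's scale.
rmse = np.sqrt(mse)

# Cross-check against scikit-learn's implementation.
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
print(mse, rmse)
```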
Alternatively, there is another metric that shows us the error in the magnitude of the target – let's get to know it!
You can learn more about MSE using the following resources:
- R For Data Science Udemy course – Metrics Section
- MSE Wikipedia Page
- Sklearn MSE Implementation
Mean Absolute Error (MAE)
The mean absolute error is an error metric that shows us the error of our predictions using the same scale as the target:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$$
The main difference between the MSE and the Mean Absolute Error (MAE) is the transformation we apply to the difference between the predicted and the real values of the target. While in MSE we apply the squared function, in MAE we apply the absolute value.
The practical implication is that we now evaluate our error on the scale of our target. For example, if we have a target measured in dollars, we can say that "on average, we are missing the target by x (the MAE value) dollars". We cannot do this with MSE.
For the example we used in MSE, if we predict a target to be 100 and the real value is 200, the contribution of that example to the overall error is 100, as shown below:

$$|200 - 100| = 100$$
We then take a simple average of these values across all the examples and obtain the full MAE.
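Here is the same idea as a minimal sketch in code, again with made-up y_true and y_pred values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up real values and predictions, just for illustration.
y_true = np.array([200.0, 120.0, 310.0])
y_pred = np.array([100.0, 130.0, 300.0])

# MAE: average of the absolute differences, on the target's scale.
mae = np.mean(np.abs(y_true - y_pred))

# Cross-check against scikit-learn's implementation.
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
print(mae)  # e.g. "on average, we miss the target by ~40 dollars"
```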
Some important details about MAE:
- Does not penalize large errors the way MSE does.
- You can use it as a cost function when you don’t want outliers to play a big role in your optimization. Be wary that using MAE can sometimes lead to convergence problems (particularly in complex models).
- It produces an error term that is in the magnitude of the target – something that is good for interpretability.
MAE is extremely valuable for communicating the expected error to stakeholders. While MSE is normally used as a cost function, MAE has an advantage in terms of explainability and its relationship with the "real world". This characteristic makes it easier to assess whether your algorithm is producing an acceptable error for the business question you are trying to solve.
You can learn more about MAE using the following resources:
- R For Data Science Udemy course – Metrics Section
- MAE Wikipedia Page
- Sklearn MAE Implementation
Mean Absolute Percentage Error (MAPE)
While MAE gives you a value whose acceptability you can discuss with the stakeholders, the metric itself never hints at how much "error" is acceptable.
For example, if you have a Mean Absolute Error of 10 dollars, is that too much? Or is that acceptable? It really depends on the scale of your target!
Although what counts as an acceptable error is a question for the scoping of the project, it is good to have an idea of how much the error deviates from the target in percentage terms.
The Mean Absolute Percentage Error (MAPE) gives you the error term as a percentage – a nice view of how much, on average, your predictions deviate from the target.
MAPE is commonly used in time series problems due to its nature. It is an excellent metric to communicate for these problems, as you can clearly state that, "on average", your forecast will deviate by x%.
You can also use MAPE to evaluate other continuous-variable models, although that depends on the expected values of your target – MAPE does not deal well with values near zero.
The formula for MAPE is pretty simple:

$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%$$
If we have a predicted value of 100 and the real value is 150, the absolute percentage error is:

$$\left|\frac{150 - 100}{150}\right| \approx 0.33 = 33\%$$
This value gives us a good metric to understand the relationship between our error and the scale of the target.
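A minimal sketch of the computation, with made-up values – note the guard against real values of 0, which the characteristics below discuss (recent scikit-learn versions also provide a mean_absolute_percentage_error function, which returns a fraction rather than a percentage):

```python
import numpy as np

# Made-up real values and predictions, just for illustration.
y_true = np.array([150.0, 120.0, 300.0])
y_pred = np.array([100.0, 130.0, 290.0])

# MAPE is undefined when a real value is 0 (division by zero).
assert np.all(y_true != 0)

# MAPE: average of |error| / |real value|, reported as a percentage.
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
print(f"{mape:.1f}%")  # "on average, the forecast deviates by this much"
```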
Some characteristics:
- It cannot deal with examples where the real value is 0 (the division is undefined). When there are examples like this in your sample, other metrics such as MASE are recommended.
- MAPE also has trouble dealing with values near zero. If such values are expected in your target, you should choose another metric.
You can learn more about MAPE using the following resources:
- R For Data Science Udemy course – Metrics Section
- MAPE Wikipedia Page
- Sklearn MAPE Implementation
R-Squared
In fancy terms, R-Squared is the proportion of variance explained by your model.
This seems complicated but is, in reality, quite easy to understand. Let's start by looking at the R-Squared formula:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
The top part of the fraction looks really familiar! That's because it is the Sum of Squared Errors (SSRes, from Sum of Squares of the Residuals) – the worse your algorithm, the higher this value will be.
Let's imagine we have two examples, with predictions consisting of the array [3.5, 3] while the real target consists of the values [4, 3]. The SSRes for these two examples will be:

$$SS_{res} = (4 - 3.5)^2 + (3 - 3)^2 = 0.25$$
What about the denominator of the formula? There we calculate the Total Sum of Squares – in practice, we compare against the error we would have if we just predicted the mean of the target, the most naive model we can think of.
For the example above, our "naive" model would predict 3.5 for every example – the average of 4 and 3, the real values of the target. We can calculate the Total Sum of Squares (SSTot) using the following rationale:

$$SS_{tot} = (4 - 3.5)^2 + (3 - 3.5)^2 = 0.5$$
Now that we have both SSRes and SSTot, we can plug them into the R-Squared formula:

$$R^2 = 1 - \frac{0.25}{0.5} = 0.5$$
Our R-Squared is a value between 0 and 1 (although, in some degenerate cases, you may get negative values) – where 1 is a perfect model and 0 a model that does no better than always predicting the mean.
Notice that the lower the SSRes (the Sum of Squared Residuals), the higher the R-Squared value. This means that the lower the error of your predictions, the better your model is when compared with a model that just predicts the mean.
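Here is the whole walkthrough as a minimal sketch, cross-checked against scikit-learn's r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

# The two-example walkthrough from the text.
y_true = np.array([4.0, 3.0])
y_pred = np.array([3.5, 3.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # (4-3.5)^2 + (3-3)^2 = 0.25
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # (4-3.5)^2 + (3-3.5)^2 = 0.5
r2 = 1 - ss_res / ss_tot                        # 1 - 0.25/0.5 = 0.5

# Cross-check against scikit-learn's implementation.
assert np.isclose(r2, r2_score(y_true, y_pred))
print(r2)
```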
You can learn more about R-Squared using the following resources:
- R For Data Science Udemy course – Metrics Section
- R-Squared Wikipedia Page
- Khan Academy R-Squared article
That's it! These metrics are some of the most common for evaluating Data Science and Machine Learning regression models. You will probably only check a few of them during the development process, but it never hurts to understand the implications of evaluating models with different metrics!
Is there any other metric that you commonly use to evaluate your regression models? Write it down in the comments below, I would love to hear your opinion!
I've set up a Udemy course on learning Data Science – the course is suitable for beginners and I would love to have you around.