Introduction
In the field of Machine Learning, no single method is guaranteed to be optimal across different contexts and problems.
A method that works best on one particular dataset or problem may perform poorly on a different one. Selecting the most appropriate method is one of the most challenging parts of Machine Learning model development.
In today’s article we are going to discuss a few concepts that you should always keep in mind when evaluating and selecting Machine Learning models in the regression setting.
How to evaluate model performance
In order to evaluate a particular model, we need a quantitative measure that tells us how the model’s predictions compare against the actual data.
This measure will also allow us to compare different models, so that we can select the one that performs best based on the specific project criteria.
Which measure is appropriate depends on the type of model and problem we want to evaluate.
Mean Squared Error
In the regression setting, one of the most commonly used measures is the Mean Squared Error (MSE), which is defined by the equation given below.

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2$$

where
- n is the number of observations
- y_i is the actual value of the _i_th observation
- f(x_i) is the predicted value for the _i_th observation
A lower MSE indicates stronger model performance, meaning that the predicted and actual values are closer together, while a higher MSE indicates weaker performance.
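As a minimal sketch of the formula above (the NumPy import and the toy values are illustrative assumptions, not part of the original article):

```python
import numpy as np

y_actual = np.array([3.0, -0.5, 2.0, 7.0])     # actual values y_i
y_predicted = np.array([2.5, 0.0, 2.1, 7.8])   # model predictions f(x_i)

# Average of the squared differences between actual and predicted values
mse = np.mean((y_actual - y_predicted) ** 2)
print(mse)  # 0.2875
```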
Which observations to take into account
When computing the MSE over the training observations, we usually refer to it as the training MSE. However, we are mostly interested in evaluating the model’s ability to make predictions on unseen data (i.e. test data containing observations not seen during the model training phase).
Therefore, when evaluating regression models we need to look at the test MSE, which essentially tells us how well the model will perform on future data. Even though it may still be useful to measure the training MSE, we should choose the model that gives the lowest test MSE (assuming this is the only metric we really care about).
But if we only take the test MSE into account, why bother measuring the training MSE at all? The answer is fairly simple: if a particular model has a small training MSE but a significantly larger test MSE, the model is probably overfitting.
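Here is a minimal sketch of how both quantities can be computed, assuming scikit-learn and a synthetic dataset (neither is part of the original discussion):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data, purely for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Keep a portion of the observations aside as unseen test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))

print(f"Training MSE: {train_mse:.2f}")
print(f"Test MSE:     {test_mse:.2f}")
```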
Overfitting occurs when the model follows the training observations too closely, to the point where it even learns the noise in the data, and as a result fails to generalise to new, unseen data points.
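The sketch below illustrates this effect, again assuming scikit-learn and a synthetic noisy-sine dataset (both are illustrative assumptions): as the polynomial degree increases, the training MSE keeps shrinking, while the test MSE eventually grows again.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy sine curve: a simple synthetic problem where flexible models can overfit
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  training MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

The high-degree model fits the noise in the training data almost perfectly, producing exactly the kind of gap between training and test MSE described above.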
Final Thoughts
Model assessment is one of the most important yet challenging stages of the Machine Learning model development process. In today’s short guide we discussed the Mean Squared Error, which is among the most commonly used measures for evaluating models in the regression setting.
Additionally, we covered the difference between training and test MSE, the role each of them plays in evaluating model performance, and how a big gap between these two values can indicate overfitting.
Note that the MSE is sometimes also referred to as the average squared error, since it averages the squared prediction errors over all observations.