
Why model validation is important
A model tuned without a validation set is likely to overfit the test data. This happens when we work with only two sets of data, a training set and a test set, because the test set then ends up being used to choose between models.

To mitigate overfitting the test data, you need a validation step. In the rest of the article, we will discuss different validation approaches, from single cross-validation (the least robust) to different variants of k-fold cross-validation (more robust).
Single Cross-Validation
This is the simplest validation approach you can perform. It uses three sets of data: the training data, the validation data, and the test data. A typical split is 80% for training, 10% for validation, and 10% for testing.
If you have N candidate models (N is 4 in the example from the image below) that you want to evaluate, those models are trained on the training data and evaluated on the validation data. This evaluation step is where hyperparameter tuning happens. Once the best model is found, it is run on the test data to get an idea of how well it will perform in the real world. Overall, we have N training processes, N validation processes, and only 1 test process.
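Here is a minimal sketch of this split with scikit-learn. The synthetic dataset, the logistic-regression candidates, and the 80/10/10 proportions are illustrative assumptions, not a prescribed setup:

```python
# Single cross-validation sketch: train / validation / test split (assumed 80/10/10).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)

# First split off 10% for the final test set, then 1/9 of the rest (~10% overall) for validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=1/9, random_state=42)

# N = 4 candidate models; here, the same estimator with different hyperparameters (an assumption).
candidates = {f"C={c}": LogisticRegression(C=c, max_iter=1000) for c in (0.01, 0.1, 1.0, 10.0)}

# Train each candidate on the training set and score it on the validation set.
val_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_scores[name] = model.score(X_val, y_val)

best_name = max(val_scores, key=val_scores.get)
print("Validation scores:", val_scores)

# Only the best candidate touches the test set, and only once.
print("Test accuracy of", best_name, ":", candidates[best_name].score(X_test, y_test))
```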

The problem with this approach is that, after the split, the data in the validation set is never used to train the model. Also, because the training/validation split is random, you may pick a split that just happens to give a very poor result, or a great one. We are now going to see more robust techniques that alleviate these problems.
Different Variants of K-folds Cross-Validation
For each candidate model, these approaches repeatedly train and validate on different subsets of the training data. This is much more robust and does not waste data, because each observation is used for both training and validation. With N candidate models and K folds, you will run N×K training and N×K validation processes (one per fold for each model), but still only 1 test process.
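As a rough sketch of this bookkeeping, the snippet below cross-validates a few candidate models with scikit-learn's cross_val_score; the dataset, the decision-tree candidates, and the choice of K = 5 are assumptions made for illustration:

```python
# N x K bookkeeping: 3 candidate models x 5 folds = 15 training/validation runs.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_train, y_train = make_classification(n_samples=500, random_state=0)

candidates = [DecisionTreeClassifier(max_depth=d, random_state=0) for d in (2, 5, 10)]
k = 5

for model in candidates:
    # cross_val_score trains and validates the model once per fold (k fits here).
    scores = cross_val_score(model, X_train, y_train, cv=k)
    print(f"max_depth={model.max_depth}: mean accuracy = {scores.mean():.3f}")

print("Total training runs:", len(candidates) * k)  # N x K
```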
Leave One Out Cross-Validation
Assume your training data has N observations. With this approach, you start by holding out the first observation, train your model on the remaining N-1 observations, and evaluate it on the held-out observation. Then you hold out the second observation, train the model on the N-1 remaining observations, and evaluate it on the held-out (2nd) observation. The same process is repeated for the third observation, and so on. At the end of the process, each observation has been held out exactly once, and you get N evaluation values. The final evaluation metric is the average of these N values.

The benefit of this technique is that it ensures every observation is used for both training and evaluation. However, it can be very time-consuming and computationally expensive, because a single candidate model is already trained N times. If you have M candidate models, each one must be trained and validated on all of these splits, which multiplies the computation time by M.
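A minimal Leave One Out sketch with scikit-learn's LeaveOneOut splitter might look like this; the iris dataset and the logistic-regression model are illustrative assumptions:

```python
# Leave One Out cross-validation: one fit per observation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)

loo = LeaveOneOut()
model = LogisticRegression(max_iter=1000)

# With N observations, this runs N training processes for this single model.
scores = cross_val_score(model, X, y, cv=loo)
print("Number of fits:", len(scores))    # equals N
print("LOOCV accuracy:", scores.mean())  # average of the N per-observation scores
```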
K-Folds Cross-Validation
This approach is similar to Leave One Out Cross-Validation, but greatly reduces computation time without losing much in terms of model performance. Instead of holding out one observation at a time, it splits the training data into K equal-sized subsets, known as folds.

In this example, the training data is split into 5 folds of the same size, so we are performing a 5-fold cross-validation. At each split, the model is trained on 4/5 of the data and validated on the remaining 1/5:
- In split 1, folds 1 to 4 are used to train the model and fold 5 is used to validate it.
- In split 2, folds 1 to 3 and fold 5 are used to train the model and fold 4 is used to validate it.
- The same process continues through split 5, and this is done for all the candidate models.
Each observation in the data is used for validation exactly once, and the overall performance of a single model is obtained by averaging the performance across all the splits. Once all the models have been trained, the best one is the one with the highest performance value (or the lowest error value). The best model is then run on the test data to get an estimate of its performance on unseen data.
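Here is a sketch of such a 5-fold loop with scikit-learn's KFold; the synthetic data and the random-forest model are illustrative assumptions:

```python
# 5-fold cross-validation with an explicit KFold loop.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(random_state=0)

fold_scores = []
for fold, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    # Train on 4/5 of the data, validate on the remaining 1/5.
    model.fit(X[train_idx], y[train_idx])
    score = model.score(X[val_idx], y[val_idx])
    fold_scores.append(score)
    print(f"Split {fold}: validation accuracy = {score:.3f}")

# The model's overall performance is the average across the 5 splits.
print("Mean accuracy:", np.mean(fold_scores))
```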
K-fold cross-validation improves on Leave One Out cross-validation, but neither of them takes the classes/labels into account while splitting the training data into folds. This can be a drawback, since some labels might not be represented in some folds.
Stratified K-Folds Cross-Validation
This approach takes the labels/classes of the training data into account while splitting it into folds, and ensures that each fold contains a representative sample of every class. Assume that the original data is categorized into "malaria" and "not malaria": each fold will then contain approximately the same mix of "malaria" and "not malaria" as exists in the entire data.
For example, if the training data has 100 observations, with 80 "malaria" and 20 "not malaria", and you apply a 10-fold cross-validation, every fold will have approximately 8 "malaria" and 2 "not malaria" observations.
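A quick sketch with scikit-learn's StratifiedKFold can check this behaviour; the 100-observation toy data below is an illustrative assumption:

```python
# Stratified 10-fold split on an imbalanced label vector (80 "malaria", 20 "not malaria").
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(100, 3)  # 100 observations, 3 dummy features
y = np.array(["malaria"] * 80 + ["not malaria"] * 20)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

for fold, (_, val_idx) in enumerate(skf.split(X, y), start=1):
    labels, counts = np.unique(y[val_idx], return_counts=True)
    # Each validation fold keeps roughly the 80/20 class mix: ~8 "malaria" and ~2 "not malaria".
    print(f"Fold {fold}:", dict(zip(labels, counts)))
```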
End of article
I hope you enjoyed your journey through this article. If you have any questions or remarks, I will be glad to discuss them further. For further reading, do not hesitate to consult the following link:
https://scikit-learn.org/stable/modules/cross_validation.html
Bye for now 🏃🏾