
Imagine we have a training set and a test set and want to compare two different machine learning models. How do we decide which model performs better? We train each model on the training set and evaluate it on the test set. Now suppose model A appears to perform better than model B on the test set simply because of how the data happened to be split, while model B is in fact the superior model. That is where Cross-Validation comes in.
Contents
- Cross-Validation
- K-fold Cross-Validation
- Monte Carlo Cross-Validation
- Differences between the two methods
- Examples in R
- Final thoughts
Cross-Validation
Cross-Validation (which we will refer to as CV from here on) is a technique used to test a model's ability to predict unseen data, i.e., data not used to train the model. CV is especially useful when we have limited data and cannot set aside a large test set. There are many different ways to perform CV. In general, CV splits the training data into k blocks; in each iteration, the model is trained on k-1 blocks and validated on the remaining block. Running multiple iterations of CV reduces variability, and we evaluate the model using the average error over all the iterations.
The model with better CV performance is generally preferred. Similarly, we can use CV to tune model hyperparameters.
K-fold Cross-Validation
Steps:
- Split the training data into k equal parts (folds)
- Fit the model on k-1 parts and calculate the test error of the fitted model on the k-th part
- Repeat k times, using each fold as the test set exactly once (usually k = 5 to 20); see the sketch below
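A minimal sketch of these steps in base R, using simulated data and a simple logistic regression as stand-ins (not the digits example that comes later), might look like this:

```r
set.seed(1)

# Simulated stand-in data: 200 observations, one feature, binary response
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x))
dat <- data.frame(x = x, y = y)

K <- 10
# Randomly assign every observation to one of K folds
folds <- sample(rep(1:K, length.out = n))

cv_errors <- numeric(K)
for (k in 1:K) {
  train <- dat[folds != k, ]
  test  <- dat[folds == k, ]
  fit   <- glm(y ~ x, data = train, family = binomial)
  pred  <- as.integer(predict(fit, test, type = "response") > 0.5)
  cv_errors[k] <- mean(pred != test$y)  # misclassification rate on fold k
}

mean(cv_errors)  # K-fold CV estimate of the test error
```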

Monte Carlo Cross-Validation
Also known as repeated random subsampling CV
Steps:
- Split the training data randomly into a training and a test portion (for example 70–30%, 62.5–37.5%, or even 86.3–13.7%). The split proportion is chosen in advance; what changes from iteration to iteration is which observations land in each portion.
- Fit the model on the training portion for that iteration and calculate the test error of the fitted model on the test portion
- Repeat for many iterations (say 100, 500, or even 1,000) and take the average of the test errors (see the sketch after the note below)
Note: the same observation can appear in the test set in several iterations, or never at all.
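A matching sketch for Monte Carlo CV on the same kind of simulated data, assuming a fixed 70–30 split:

```r
set.seed(2)

# Same simulated stand-in data as in the K-fold sketch
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x))
dat <- data.frame(x = x, y = y)

B <- 500              # number of random splits
train_frac <- 0.7     # assumed 70-30 split; any fixed proportion works

mc_errors <- numeric(B)
for (b in 1:B) {
  idx  <- sample(n, floor(train_frac * n))  # new random split every iteration
  fit  <- glm(y ~ x, data = dat[idx, ], family = binomial)
  pred <- as.integer(predict(fit, dat[-idx, ], type = "response") > 0.5)
  mc_errors[b] <- mean(pred != dat$y[-idx])
}

mean(mc_errors)  # Monte Carlo CV estimate of the test error
```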

Differences between the two methods
- In K-fold CV, every observation is used for validation exactly once; in Monte Carlo CV, an observation may land in the test set several times or never at all.
- K-fold CV runs a fixed number of iterations (one per fold), whereas Monte Carlo CV can be repeated for as many random splits as we like.
- Monte Carlo CV tends to have lower variance but higher bias than K-fold CV (more on this in the final thoughts).

Examples in R
Let us illustrate the difference between the two Cross-Validation techniques using the handwritten digits dataset. Instead of choosing between different models, we will use CV to tune the hyperparameter k in a KNN (K Nearest Neighbors) model.
For this example, we subset the handwritten digits data to contain only the digits 3 and 8, then apply KNN to differentiate between the two. We will use CV to choose the best value of k from 1, 3, 5, 7, 9, and 11.
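As a rough sketch of that subsetting step (the object name `digits` and the column name `label` are assumptions here; the actual loading code is in the linked repository):

```r
# Assumed: `digits` is a data frame with a column `label` holding the digit
# (0-9) and the remaining columns holding the pixel features
digits38 <- subset(digits, label %in% c(3, 8))
digits38$label <- factor(digits38$label)  # keep only the levels "3" and "8"
```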
K Nearest Neighbors (KNN), Crash course version
KNN classification assigns an object to the most common class among the k nearest neighbors of that object.
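A quick toy illustration with the `knn()` function from the `class` package (two made-up Gaussian clusters, not the digits data):

```r
library(class)

set.seed(3)
# Two classes of 50 points each, described by two numeric features
train_x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                 matrix(rnorm(100, mean = 2), ncol = 2))
train_y <- factor(rep(c("A", "B"), each = 50))

# A new point gets the majority class among its k = 5 nearest training points
new_x <- matrix(c(1.8, 2.1), ncol = 2)
knn(train = train_x, test = new_x, cl = train_y, k = 5)
```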
Choose k using Monte Carlo CV
Here CV runs for B = 1000 iterations, randomly selecting n1 observations for the training set and computing the test error on the remaining observations. After the iterations, we have 1000 test errors for each value of k in our trial set. We assign the mean test error to each value of k and gauge its effectiveness based on that mean.
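A hedged sketch of this loop, reusing the assumed `digits38` data frame from above and taking a 70–30 split as the (assumed) choice of n1:

```r
library(class)

set.seed(4)
ks <- c(1, 3, 5, 7, 9, 11)           # candidate values of k for KNN
B  <- 1000                           # number of Monte Carlo iterations
n1 <- floor(0.7 * nrow(digits38))    # assumed training-set size per split

feature_cols <- setdiff(names(digits38), "label")
mc_err <- matrix(NA, nrow = B, ncol = length(ks),
                 dimnames = list(NULL, paste0("k=", ks)))

for (b in 1:B) {
  idx <- sample(nrow(digits38), n1)  # new random train/test split
  for (j in seq_along(ks)) {
    pred <- knn(train = digits38[idx, feature_cols],
                test  = digits38[-idx, feature_cols],
                cl    = digits38$label[idx],
                k     = ks[j])
    mc_err[b, j] <- mean(pred != digits38$label[-idx])
  }
}

colMeans(mc_err)  # mean Monte Carlo test error for each candidate k
```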
The box plots below show the variation and range of the test errors of the KNN models for different k values. The dark blue line in each box plot marks the model's mean error, which is what we focus on.

The grid of plots below shows the running average test errors over the B = 1000 CV iterations. The errors for each model converge to a particular value, but for lower values of B we see large fluctuations in the running average.
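The running averages can be computed straight from the `mc_err` matrix in the sketch above:

```r
# Cumulative (running) average of the test error for each candidate k
running_avg <- apply(mc_err, 2, function(e) cumsum(e) / seq_along(e))
matplot(running_avg, type = "l", lty = 1,
        xlab = "Iteration (B)", ylab = "Running average test error")
```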

Choose k using K-fold CV
For the K-fold CV, we use 10 folds (k would normally denote the number of folds too, but there are far too many k's in ML already). With 10 folds, every observation appears in the test set once and in the training set nine times for each candidate value of k we try.
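The corresponding sketch with 10 folds, again using the assumed `digits38` data frame:

```r
library(class)

set.seed(5)
K  <- 10                              # number of folds
ks <- c(1, 3, 5, 7, 9, 11)            # candidate values of k for KNN
folds <- sample(rep(1:K, length.out = nrow(digits38)))

feature_cols <- setdiff(names(digits38), "label")
kfold_err <- matrix(NA, nrow = K, ncol = length(ks),
                    dimnames = list(NULL, paste0("k=", ks)))

for (f in 1:K) {
  for (j in seq_along(ks)) {
    pred <- knn(train = digits38[folds != f, feature_cols],
                test  = digits38[folds == f, feature_cols],
                cl    = digits38$label[folds != f],
                k     = ks[j])
    kfold_err[f, j] <- mean(pred != digits38$label[folds == f])
  }
}

colMeans(kfold_err)  # mean 10-fold test error for each candidate k
```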
Average Test Error for both CVs
Finally, we average the test errors from the Monte Carlo and K-fold CVs and compare them to choose the hyperparameter k. It looks like a KNN model with k = 3 gives the best results in this example.
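With the `mc_err` and `kfold_err` matrices from the sketches above, the comparison boils down to:

```r
# Average the two CV estimates and pick the k with the smallest error
avg_err <- (colMeans(mc_err) + colMeans(kfold_err)) / 2
ks[which.min(avg_err)]  # the winning candidate k
```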

Final thoughts
Which CV technique to use depends on the situation. The Monte Carlo method can give you more confidence in your results and is more repeatable because its variance is low, but it has a higher bias than K-fold CV. This dilemma is common in machine learning and is known as the bias-variance tradeoff. In most cases, K-fold CV is good enough and computationally less expensive.
For the full code and data, follow the link below.