
How to Mitigate Overfitting with K-Fold Cross-Validation

Addressing the problem of overfitting – Part 1

Addressing Overfitting

Photo by Joshua Sukoff on Unsplash

Overfitting happens almost every time we build machine learning models, especially with tree-based models such as decision trees. It is not possible to avoid overfitting completely, but we can apply some standard techniques to mitigate it. There are several such techniques, too many to cover in a single post, so I’ve decided to discuss them one by one. This is Part 1, in which we discuss how to mitigate overfitting with k-fold cross-validation. This part also lays the foundation for discussing the other techniques.

Our ultimate goal should be to build a model that performs well on unseen data (test data) as well as on training data. When a model overfits, it gives a much higher performance score (e.g. 100% accuracy) on the training data and a lower score on unseen data (test data). In other words, it fails to generalize to new, unseen data. When this happens, the model memorizes the noise in the training data instead of learning the important patterns in it. It is analogous to a student who prepares for the final exam by memorizing the answers to the practice exams instead of applying the knowledge gained from them; that student will not score well on the final exam!

Training without k-fold cross-validation

We’ll build a decision tree classification model on a dataset called "heart_disease.csv" without doing k-fold cross-validation. Then, we look at the train and test accuracy scores together with the confusion matrix. From those outputs, we can decide whether the model is overfitting or not.

(Image by author)
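The original code appears only as an image above, so here is a minimal sketch of the setup it describes. The file name heart_disease.csv comes from the article, but the label column name target and the 80/20 split size are assumptions, since the dataset itself isn’t shown here:

```python
# A minimal sketch (not the author's exact code): train a decision tree
# on heart_disease.csv without cross-validation.
# Assumptions: the label column is named 'target'; the split is 80/20.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv('heart_disease.csv')
X = df.drop(columns='target')
y = df['target']

# A single random train/test split (random_state=0, as in the article)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

print('Train accuracy:', accuracy_score(y_train, tree.predict(X_train)))
print('Test accuracy:', accuracy_score(y_test, tree.predict(X_test)))
print(confusion_matrix(y_test, tree.predict(X_test)))
```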

Here, our model is clearly overfitting. How do we know? It performs extremely well on the training set (100% train accuracy!) but poorly on the test data (only 71% accuracy).

Training with k-fold cross-validation

Now, we’ll train the same decision tree model, but this time, with k-fold cross-validation.

(Image by author)
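Again, the code is shown as an image; a minimal sketch of the cross-validated version, reusing the assumed X and y from the previous snippet, might look like this:

```python
# A minimal sketch: evaluate the same decision tree with 10-fold
# cross-validation instead of a single train/test split.
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(random_state=0)

# cv=10 splits the data into 10 folds; each fold serves once as the
# held-out set, producing 10 accuracy scores that are then averaged.
scores = cross_val_score(tree, X, y, cv=10, scoring='accuracy')
print('Fold accuracies:', scores)
print('Mean CV accuracy:', scores.mean())
```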

This time, our model may still be overfitting. However, the accuracy (more precisely, the test accuracy) has increased by 6%.

The insights

K-fold cross-validation does not mitigate overfitting by itself. However, it helps us detect overfitting and shows that we have plenty of options (room to increase the model’s accuracy) to mitigate it. When we combine k-fold cross-validation with a hyperparameter tuning technique such as Grid Search, we can definitely mitigate overfitting. For tree-based models such as decision trees, there are also special techniques that mitigate overfitting, for example pre-pruning, post-pruning, and creating ensembles. If you want to learn about them in detail, read my article:
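As a rough sketch of how k-fold cross-validation can be combined with Grid Search (the parameter grid below is purely illustrative, not the article’s settings, and X and y are the assumed features and labels from the earlier snippets):

```python
# A minimal sketch: combine 10-fold cross-validation with Grid Search
# to tune pre-pruning hyperparameters of the decision tree.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    'max_depth': [2, 3, 4, 5, None],      # illustrative values
    'min_samples_leaf': [1, 5, 10, 20],   # illustrative values
}

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=10,
    scoring='accuracy')
grid.fit(X, y)

print('Best parameters:', grid.best_params_)
print('Best mean CV accuracy:', grid.best_score_)
```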

4 Useful techniques that can mitigate overfitting in decision trees

The reason the accuracy score increased by 6% after applying k-fold cross-validation is that the procedure splits the dataset into 10 different folds (specified as cv=10) and averages the 10 resulting accuracy scores. In this way, the model sees different instances (data points) at each training stage, so it has more varied data from which to learn patterns and generalizes better to new, unseen data (test data). That’s why we see an increase in the accuracy score.
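To make that averaging explicit, here is a minimal manual KFold loop, equivalent in spirit to cross_val_score with cv=10 (X and y are again the assumed pandas objects from the earlier sketches):

```python
# A minimal sketch: iterate over 10 folds by hand so the averaging
# described above is explicit.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_scores = []

for train_idx, test_idx in kf.split(X):
    X_tr, X_te = X.iloc[train_idx], X.iloc[test_idx]
    y_tr, y_te = y.iloc[train_idx], y.iloc[test_idx]
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    fold_scores.append(accuracy_score(y_te, model.predict(X_te)))

print('Per-fold accuracies:', np.round(fold_scores, 3))
print('Mean accuracy:', np.mean(fold_scores))
```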

In the first case, in which we built our model without k-fold cross-validation, the model saw only a single random train/test split (specified as random_state=0).

If you don’t understand the last couple of paragraphs, don’t worry! You can read the following article in which I’ve explained almost everything related to k-fold cross-validation:

k-fold cross-validation explained in plain English

Update (2021–09–24): Part 2 is now available! [How to Mitigate Overfitting with Regularization]


This is the end of today’s post. My readers can sign up for a membership through the following link to get full access to every story I write, and I will receive a portion of your membership fee.

Sign-up link: https://rukshanpramoditha.medium.com/membership

Thank you so much for your continuous support! See you in the next story. Happy learning to everyone!

Special credit goes to Joshua Sukoff on Unsplash, who provided the nice cover image for this post (I made some modifications to the image: added some text and removed some parts).

Rukshan Pramoditha 2021–09–21

