Addressing Overfitting

Today, we’re continuing from Part 1 of the "Addressing the problem of overfitting" article series. Regularization is another useful technique for mitigating overfitting in machine learning models. This time, the emphasis will be on the intuition behind regularization rather than its mathematical formulation. In this way, you can get a clear idea of the effect of applying regularization to machine learning models.
In general, the term "regularization" means limiting or controlling something. In the context of machine learning, regularization deals with model complexity: it limits the model’s complexity, or limits the learning process of the model during the training phase. Generally, we prefer simple, accurate models because complex models are more likely to overfit. By limiting model complexity, regularization keeps models as simple as possible while those models still make accurate predictions.
There are two ways to apply regularization to machine learning models:
- By adding another term to the loss function that we’re trying to minimize. The objective function then consists of two parts: the loss function and the regularization term:
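  In symbols, the regularized objective has the following general form (a sketch; the exact loss and penalty depend on the algorithm):

  Objective(w) = Loss(w) + λ · R(w)

  where w denotes the model’s weights, λ ≥ 0 controls the strength of regularization, and R(w) is the penalty term, e.g. R(w) = ‖w‖² for L2 regularization. The larger λ is, the more the model is pushed towards small weights and a simpler fit.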

- By stopping the learning process early during the training phase. A classic example of this method is stopping the growth of a decision tree at an early stage.
We’ll discuss each method with examples by writing Python code. By the end, you’ll have hands-on experience applying regularization to the models discussed here.
Method 1: Adding a regularization term to the loss function
Here, we’ll build a logistic regression model on a dataset called "heart_disease.csv". First, we build the model without a regularization term (penalty=None) and see the output.
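Here is a minimal sketch of such a model. The file name heart_disease.csv comes from this article, but the target column name ('target') and the split settings are assumptions you may need to adjust for your copy of the data. Note that scikit-learn 1.2+ accepts penalty=None, while older versions used the string 'none'.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the data; the 'target' column name is an assumption
df = pd.read_csv('heart_disease.csv')
X = df.drop(columns=['target'])
y = df['target']

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# No regularization term: penalty=None (use penalty='none' on
# scikit-learn versions older than 1.2)
log_reg = LogisticRegression(penalty=None, max_iter=1000)
log_reg.fit(X_train, y_train)

print('Train accuracy:', accuracy_score(y_train, log_reg.predict(X_train)))
print('Test accuracy:', accuracy_score(y_test, log_reg.predict(X_test)))
print(confusion_matrix(y_test, log_reg.predict(X_test)))
```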

The model is already good. Let’s see whether we can further improve the test accuracy by adding a regularization term. Here, we add L2 regularization (penalty='l2') to the model.
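A sketch of the regularized version, continuing from the same train/test split as above. C is the inverse of the regularization strength (a smaller C means a stronger penalty); C=1.0 is scikit-learn’s default, not a tuned value:

```python
# L2 regularization: penalty='l2' adds a squared-weights penalty to the loss
log_reg_l2 = LogisticRegression(penalty='l2', C=1.0, max_iter=1000)
log_reg_l2.fit(X_train, y_train)

print('Test accuracy:', accuracy_score(y_test, log_reg_l2.predict(X_test)))
print(confusion_matrix(y_test, log_reg_l2.predict(X_test)))
```

Since C is a hyperparameter, it is also worth tuning rather than leaving at its default.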

After adding the regularization term, the test accuracy increased by 3%. The number of false positives was also reduced. Therefore, the model generalizes better to new, unseen data (the test dataset).
Method 2: Early stopping the learning process
Here, we’ll build a decision tree classification model on the same dataset. First, we’ll build the full decision tree without limiting its growth (i.e. without applying regularization or stopping the learning process early) and see the output.
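A minimal sketch of the fully grown tree, reusing the train/test split from the logistic regression example (random_state is an assumption added for reproducibility):

```python
from sklearn.tree import DecisionTreeClassifier

# All growth-related hyperparameters left at their defaults, so the
# tree keeps splitting until its leaves are (almost) pure
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

print('Train accuracy:', accuracy_score(y_train, tree.predict(X_train)))
print('Test accuracy:', accuracy_score(y_test, tree.predict(X_test)))
```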

We got very high training accuracy and low test accuracy, so the model is clearly overfitting. This is expected because we allowed the tree to grow fully. Now, we’ll limit the growth of the tree. This is done by setting appropriate values for the max_depth, min_samples_leaf and min_samples_split hyperparameters (in the model above, these were left at their defaults). In machine learning, limiting the growth of a tree is technically called pruning.
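A sketch of the pruned tree. The hyperparameter values below are illustrative assumptions, not this article’s tuned values; in practice you would find them via hyperparameter tuning (see the note below):

```python
# Pre-pruning: limit the tree's growth during training
pruned_tree = DecisionTreeClassifier(
    max_depth=3,            # illustrative value: cap the depth of the tree
    min_samples_leaf=5,     # illustrative value: min samples per leaf
    min_samples_split=10,   # illustrative value: min samples to split a node
    random_state=42)
pruned_tree.fit(X_train, y_train)

print('Train accuracy:', accuracy_score(y_train, pruned_tree.predict(X_train)))
print('Test accuracy:', accuracy_score(y_test, pruned_tree.predict(X_test)))
```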

The model has improved significantly (a 16% increase in test accuracy!).
Note: Finding the optimal values for a set of hyperparameters is technically called hyperparameter tuning or hyperparameter optimization. If you’re not familiar with this procedure, you can learn it by reading the following two posts:
k-fold cross-validation explained in plain English
4 Useful techniques that can mitigate overfitting in decision trees
Conclusion
Overfitting often happens during model building. Regularization is a useful technique for mitigating it. Today, we’ve discussed two regularization methods with examples. Note that the early stopping method can also be applied to algorithms such as logistic regression and linear regression by specifying a lower value for the maximum number of iterations.
The terms "regularization" and "generalization" must not be confused. Regularization is applied to achieve generalization. A regularized model will generalize well for unseen data. It learns the important patterns in the data and makes accurate predictions on unseen data rather than memorizing the noise in the dataset.
Update (2021–09–27): Part 3 is now available! [How to Mitigate Overfitting with Dimensionality Reduction]
This is the end of today’s post. My readers can sign up for a membership through the following link to get full access to every story I write, and I will receive a portion of your membership fee.
Sign-up link: https://rukshanpramoditha.medium.com/membership
Thank you so much for your continuous support! See you in the next story. Happy learning to everyone!
Special credit goes to Kayaan Udachia on Unsplash, who provided the nice cover image for this post (I made some modifications to the image: added some text and removed some parts).
Rukshan Pramoditha 2021–09–24