Ensemble Learning Techniques

An ensemble can make things simple

Charu Makhijani
Towards Data Science


Image Source: pixabay

If you are reading this post about ensemble learning, I hope you are already familiar with at least a few machine learning models and their implementation. When we train an ML model on any dataset, we all face our error companions, bias and variance. Most of the time we either do not get satisfactory accuracy or the model overfits the data. We then adjust the data, create or drop features, and retrain the model, and this cycle can repeat several times. Ensembles come to the rescue here. In the next few sections I will explain what ensemble learning is, how it relates to bias and variance, and what the main ensemble learning techniques are.

What is Ensemble Learning?

Ensemble learning is a process in which multiple base models (most often referred to as “weak learners”) are trained and combined to solve the same problem. It is based on the idea that a weak learner alone performs a task poorly, but when combined with other weak learners it forms a strong learner, and the resulting ensemble model produces more accurate results.

This is the reason that ensemble learning methods are most often trusted in many online competitions.

Ensemble learning is a technique that combines multiple machine learning algorithms to produce one optimal predictive model with reduced variance (using bagging), reduced bias (using boosting), and improved predictions (using stacking).

Ensemble Learning's Relation to Bias and Variance

The prediction error of an ML model is the sum of three components:

1. Bias Error — Bias is the difference between the model’s average prediction and the actual result. High bias means the model is underfitting, hence we have to use a more complex model.

2. Variance Error — Variance is the model’s sensitivity to small fluctuations in the training data. High variance means the model is overfitting, hence we have to either get more training data (if the training set is small) or use a less complex model (if the data is too simple for the current model).

3. Noise, often termed the irreducible error because no choice of model can remove it.
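
For squared-error loss, these three terms combine into the well-known decomposition: Expected Error = Bias² + Variance + Irreducible Error.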

Managing bias and variance in a balanced way is at the core of ensemble learning. This is achieved by combining multiple models (a mix of simple and complex ones) in a way that neither underfits nor overfits the data.

Simple Ensemble Learning Methods

Voting- and averaging-based ensemble methods are the simplest forms of ensemble learning. Voting is used for classification problems and averaging for regression problems.

1. Averaging

As the name suggests, in this technique we take the average of all the base models’ predictions.

For example, suppose we are predicting a house price and 3 base models predict 450000, 500000, and 550000. With averaging we take the mean, (450000 + 500000 + 550000) / 3 = 500000, which is the final prediction.

Let’s see this in the code:
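
The original code is embedded as a notebook gist; here is a minimal sketch of the averaging ensemble instead, assuming the California housing dataset and KNN, Lasso, and SVR base regressors (dataset, pipelines, and hyperparameters are illustrative, so the exact error values may differ from those quoted below):

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import Lasso
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Illustrative regression dataset and train/test split
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Three base regressors (scaling helps KNN and SVR)
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor()),
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.01)),
    "SVR": make_pipeline(StandardScaler(), SVR()),
}

# Train each base model and collect its test-set predictions
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)

# Averaging ensemble: the final prediction is the mean of the base predictions
avg_pred = np.mean(list(predictions.values()), axis=0)

print("Average Ensembler Mean Absolute Error:", mean_absolute_error(y_test, avg_pred))
for name, pred in predictions.items():
    print(f"{name} Mean Absolute Error:", mean_absolute_error(y_test, pred))
```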

Now, to check whether ensembling (the average prediction here) does a better job than the base models, we compare the mean absolute error of the base models with that of the final model.

Average Ensembler Mean Absolute Error: 0.48709255488962744
KNN Mean Absolute Error: 0.5220880505643672
Lasso Mean Absolute Error: 0.7568088178180192
SVR Mean Absolute Error: 0.5015218832952784

The mean absolute error of the ensembled model is lower than that of each individual model.

2. Max Voting Classifier

Max voting is very much the same as averaging; the only difference is that it is used for classification problems. In this technique, the predictions from multiple models (often referred to as votes) are collected, and the prediction made by the majority of the models is taken as the final prediction.

For example, if the house-price predictions (votes) from multiple models are 500000, 450000, 600000, 450000, 650000, 450000, and 600000, then with max voting the final prediction is 450000, since it receives the most votes (three out of seven).
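
The original gist is not reproduced here; below is a minimal sketch of hand-rolled max voting, assuming KNN, Logistic Regression, and SVC base classifiers on scikit-learn's breast-cancer dataset (the dataset is an assumption, so the accuracies quoted below, which come from the original setup, will not match exactly):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Illustrative classification dataset and train/test split
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

classifiers = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVC": make_pipeline(StandardScaler(), SVC()),
}

# Collect each classifier's predictions (the "votes")
votes = []
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    votes.append(pred)
    print(f"{name} Accuracy:", round(accuracy_score(y_test, pred) * 100, 1))

# The final prediction for each sample is the class chosen by most models
votes = np.array(votes)  # shape: (n_models, n_samples)
majority_vote = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print("Max Voted Ensembler Accuracy:", round(accuracy_score(y_test, majority_vote) * 100, 1))
```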

Max Voted Ensembler Accuracy: 72.0
KNN Accuracy: 67.2
Logistic Regression Accuracy: 74.4
SVC Accuracy: 70.39999999999999

The sklearn library has a class for max voting called VotingClassifier, to which you can pass the list of classifiers; it will pick the majority-voted prediction. Let's see the code:
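
A minimal sketch with VotingClassifier is shown below, continuing from the max-voting sketch above (it reuses the same illustrative train/test split and classifier pipelines):

```python
from sklearn.ensemble import VotingClassifier

# Hard voting picks the majority class label across the base classifiers
sklearn_voting = VotingClassifier(
    estimators=[
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("svc", make_pipeline(StandardScaler(), SVC())),
    ],
    voting="hard",
)
sklearn_voting.fit(X_train, y_train)
pred = sklearn_voting.predict(X_test)
print("Sklearn Max Voting Classifier Accuracy:", round(accuracy_score(y_test, pred) * 100, 1))
```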

Sklearn Max Voting Classifier Accuracy: 72.0

3. Weighted Averaging

Weighted averaging is an extension of averaging. In averaging, all base models are given equal importance; in this technique, a base model with higher predictive power is given more importance (a higher weight) than the other base models.

The model weights are decimal numbers that sum to 1.

For example, suppose 3 models predict house prices as 450000, 600000, and 650000, and their weights are 25%, 50%, and 25%. Then the final prediction will be:

0.25*450000 + 0.50*600000 + 0.25*650000 = 575000
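
A minimal sketch of weighted averaging, continuing from the averaging sketch above; the weights here are illustrative (the original post may weight the models differently):

```python
# Reuse the base-model predictions from the averaging sketch above.
# Weights are illustrative and must sum to 1; the stronger model gets more weight.
weights = {"KNN": 0.3, "Lasso": 0.2, "SVR": 0.5}

weighted_pred = sum(weights[name] * predictions[name] for name in predictions)

print("Weightage Average Ensembler Mean Absolute Error:",
      mean_absolute_error(y_test, weighted_pred))
```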

Weightage Average Ensembler Mean Absolute Error: 0.46186097145642674
KNN Mean Absolute Error: 0.5220880505643672
Lasso Mean Absolute Error: 0.7568088178180192
SVR Mean Absolute Error: 0.5015218832952784

Conclusion

As we can see in the examples above, even simple ensemble techniques can reduce the error and make a noticeable difference in the final predictions. In the next post, I will cover advanced ensemble techniques such as Bagging, Boosting, and Stacking.

To access the complete code for the Simple Ensemble Techniques, please check this GitHub link.

Thanks for the read. If you like the story please like, share, and follow for more such content. As always, please reach out for any questions/comments/feedback.

Github: https://github.com/charumakhijani
LinkedIn:
https://www.linkedin.com/in/charu-makhijani-23b18318/
