
A Mathematical Explanation of AdaBoost in 5 Minutes

A thorough explanation of AdaBoost with an example

Image Created by Author

Table of Contents

  1. Introduction
  2. What Makes AdaBoost Different
  3. An Example of How AdaBoost Works
  4. How a New Point is Assessed

Introduction

AdaBoost, short for Adaptive Boosting, is a widely used Machine Learning classification algorithm. It is an ensemble algorithm that combines many weak learners (shallow decision trees) and turns them into one strong learner. In other words, it uses boosting to develop an enhanced predictor.

If these words are confusing to you, don’t worry. In this article, we’ll go through a simple example to show how AdaBoost works and the math behind it.

What Makes AdaBoost Different

AdaBoost is similar to Random Forests in the sense that the predictions are taken from many decision trees. However, there are three main differences that make AdaBoost unique:

  1. First, AdaBoost creates a forest of stumps rather than full trees. A stump is a tree made of only one node (a single split) and two leaves.
  2. Second, the stumps that are created are not equally weighted in the final decision (final prediction). Stumps that create more error will have less say in the final decision.
  3. Lastly, the order in which the stumps are made is important, because each stump aims to reduce the errors that the previous stump(s) made.
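
To make these three differences concrete, here is a minimal sketch using scikit-learn’s AdaBoostClassifier, whose default base learner is a depth-1 decision tree, i.e. a stump. The synthetic dataset and hyperparameter values are arbitrary and purely illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Toy data: 3 features and a binary output, standing in for (x1, x2, x3) and Y.
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

# By default the base learner is a decision tree of depth 1, so this fits a forest of stumps.
model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

print(model.estimator_weights_)  # each stump's "amount of say" in the final vote
print(model.predict(X[:3]))      # predictions decided by the weighted vote of the stumps

The stumps are stored in the order they were fit, which reflects the sequential nature of boosting described above.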

An Example of How AdaBoost Works

Let’s look at an example now. Suppose we have a sample dataset of six observations, each with three features (x1, x2, x3) and an output (Y). Note that T = True and F = False.

Step 1: Assign a sample weight for each sample

For the first round, every sample is assigned the same weight. With six samples in this example, each sample weight is therefore 1/6.
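
Written as a formula, with N training samples the initial weight of each sample i is:

w_i = \frac{1}{N} = \frac{1}{6}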

Step 2: Calculate the Gini Impurity for each variable

The next step is to calculate the Gini Impurity for each variable. This is done to determine which variable to use to create the first stump. The formula to calculate the Gini Impurity of each node is as follows:
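
For a binary outcome like Y, the standard form is:

\text{Gini} = 1 - (p_T)^2 - (p_F)^2

where p_T and p_F are the fractions of samples in that node with Y = T and Y = F, respectively.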

Once you calculate the Gini Impurity of each node, the total Gini Impurity for each variable is the weighted average of the impurities of each node.
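
For a stump with two leaves, this weighted average is:

\text{Total Gini} = \frac{n_1}{n}\,\text{Gini}_1 + \frac{n_2}{n}\,\text{Gini}_2

where n_1 and n_2 are the number of samples that land in each leaf and n is the total number of samples.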

To show an example, let’s calculate the Gini Impurity of x2.

To do so, first consolidate the samples into a table that counts how many fall into each combination of x2 (T or F) and Y (T or F).

Next, we can calculate the Gini Impurity of each leaf node for x2.

Once the Gini Impurity is calculated for each leaf node, the Total Gini Impurity can be calculated by taking the weighted average of the two individual impurities.
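
To make the arithmetic concrete, suppose for illustration that the x2 = T leaf holds four samples (3 with Y = T, 1 with Y = F) and the x2 = F leaf holds two samples (both with Y = F), a split consistent with the numbers quoted in this example. Then:

\text{Gini}_{x_2 = T} = 1 - \left(\tfrac{3}{4}\right)^2 - \left(\tfrac{1}{4}\right)^2 = 0.375 \qquad \text{Gini}_{x_2 = F} = 1 - 0^2 - 1^2 = 0

\text{Total Gini} = \tfrac{4}{6}(0.375) + \tfrac{2}{6}(0) = 0.25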

Thus, the Gini Impurity for x2 = 0.25.

If you do that for each variable, you’ll get that x2 has the lowest Gini Impurity, so x2 will be used to create the first stump.

Step 3: Calculate the Amount of Say for the stump that was created

Next, we’ll use the total error to calculate the ‘Amount of say’ that this stump gets.

Total Error is equal to the sum of the weights of the incorrectly classified samples. Since one of the samples was incorrectly classified for x2, the total error is equal to 1/6.

Once you know the total error, you can then calculate the amount of say:
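
In its standard form, this is:

\text{Amount of Say} = \frac{1}{2}\ln\left(\frac{1 - \text{Total Error}}{\text{Total Error}}\right)

A stump with very little error gets a large positive amount of say, while a stump that is no better than guessing (a total error of 0.5) gets an amount of say of zero.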

Thus, for this stump:
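
Plugging in the total error of 1/6:

\text{Amount of Say} = \frac{1}{2}\ln\left(\frac{1 - \tfrac{1}{6}}{\tfrac{1}{6}}\right) = \frac{1}{2}\ln(5) \approx 0.80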

Step 4: Calculate the new sample weights for the next stump

Next, we’re going to increase the sample weights of samples that were incorrectly classified and decrease the sample weights of samples that were correctly classified using the following equations:
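
In their standard form, with “weight” denoting a sample’s current weight:

\text{new weight} = \text{weight} \times e^{\text{Amount of Say}} \quad \text{(incorrectly classified samples)}

\text{new weight} = \text{weight} \times e^{-\text{Amount of Say}} \quad \text{(correctly classified samples)}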

With the equations above, we can calculate the new sample weights. Since the updated sample weights summed to 0.84, we normalize them by dividing each weight by 0.84 so that the new sample weights sum to 1.
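
A minimal sketch of this update in Python (NumPy assumed; the function name and arguments are illustrative rather than part of any library):

import numpy as np

def update_sample_weights(weights, amount_of_say, misclassified):
    """Increase the weights of misclassified samples, decrease the rest, then normalize."""
    weights = np.asarray(weights, dtype=float)
    signs = np.where(np.asarray(misclassified, dtype=bool), 1.0, -1.0)
    new_weights = weights * np.exp(signs * amount_of_say)
    return new_weights / new_weights.sum()  # divide by the total so the weights sum to 1

Calling it with six equal weights of 1/6, the stump’s amount of say, and a mask that marks the single misclassified sample returns the normalized weights used in the next step.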

Step 5: Create a bootstrapped dataset, where each sample’s chance of being chosen is based on its new sample weight

In this step, we randomly choose 6 samples with replacement from the dataset, with the probability of picking each sample proportional to its new sample weight.

Notice how the one that was incorrectly classified has a weight that’s more than double the others. This means that it is more likely to be selected multiple times, and thus, the next stump will focus more on classifying the misclassified sample correctly. This is the power of AdaBoost!

Once the new bootstrapped dataset is created, the samples are given equal weights again, and the process is repeated.
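
A minimal sketch of this weighted resampling (NumPy assumed; samples stands in for a list of the six rows, and the helper name is illustrative):

import numpy as np

def weighted_bootstrap(samples, weights, rng=None):
    """Draw len(samples) rows with replacement; heavier samples are more likely to be picked."""
    if rng is None:
        rng = np.random.default_rng()
    weights = np.asarray(weights, dtype=float)
    idx = rng.choice(len(samples), size=len(samples), replace=True, p=weights / weights.sum())
    return [samples[i] for i in idx]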

Step 6: Repeat the process n times

Lastly, this process is repeated until n stumps have been created, each with its own amount of say. Once this is done, the model is complete and new points can be classified.

How a New Point is Assessed

New points are classified by running them through all of the stumps and seeing how each stump classifies them. Then, the amount of say is summed for each class, and the class with the higher total amount of say is the classification of the new point.
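
Written compactly, with the two classes encoded as −1 and +1, this voting rule is the standard AdaBoost decision function:

H(x) = \text{sign}\left(\sum_{t=1}^{n} \alpha_t \, h_t(x)\right)

where h_t(x) is the prediction of stump t and α_t is its amount of say.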

Thanks for Reading!

By the end of this, you should understand how an AdaBoost model is created and you should know the math behind it as well.

Terence Shin

Founder of ShinTwin | Let’s connect on LinkedIn | Project Portfolio is here.

