
Implementing Naive Bayes From Scratch

Coding the inner workings with just Python and NumPy


Photo by Mike Hindle on Unsplash

I don’t like "black boxes". I have a deep personal urge to know how things work. I want to touch it. I want to tinker with it. I want to code it myself, even if a plug-and-play solution already exists. And this is exactly what we are going to do in this article.

In the following sections, we will implement the Naive Bayes Classifier from scratch in a step-by-step fashion using just Python and NumPy.

But before we start coding, let’s talk briefly about the theoretical background and assumptions underlying the Naive Bayes Classifier.


Naive Bayes Quick Theory

The underlying principle of the Naive Bayes Classifier is Bayes’ theorem – hence the name. In our case, we can state Bayes’ theorem as the following:
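P(Class|Data) = [P(Data|Class) * P(Class)] / P(Data)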

Our overall goal is to predict the conditional probability of a class given the data. This probability is also called the posterior belief. So how do we calculate the posterior?

First of all, we need to determine the likelihood of data belonging to a certain class distribution P(Data|Class). Then we need to multiply that by the prior P(Class). In order to calculate the prior, we need to count the number of samples (rows) for a specific class and divide that by the number of total samples in the dataset.

Note: We can omit the denominator to simplify our calculation, since P(Data) acts as a normalizing constant. However, the result will then no longer be a proper probability between zero and one.

So what is so naive about Naive Bayes, you might ask?

One crucial assumption Naive Bayes makes is the independence of features: the occurrence of one event doesn’t affect the occurrence of another. All interactions and correlations among the features are therefore simply ignored. Thanks to this simplifying premise, we can apply the multiplication rule when calculating the probability of a certain class given multiple features.
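In other words, for a sample with features x₁, …, xₙ, the likelihood factorizes into a simple product:

P(Data|Class) = P(x₁|Class) * P(x₂|Class) * … * P(xₙ|Class)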

And this is basically all we need to know to implement the Naive Bayes Classifier from scratch.



General Overview

Now that we have briefly covered the theoretical background, we can think about the different steps we need to implement. This gives us a high-level overview that we can use as a blueprint.

  1. Fit: Calculate the summary statistics and the prior for each class in the (training) dataset
  2. Predict: Calculate the probability of every class for each sample in the (test) dataset. To do so, get the probability of the data under each class’s (Gaussian) distribution and combine it with the prior.

In the following, we will implement just a single class. The skeleton code, which we will complete step-by-step in the next section, can be seen below.
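A minimal version of that skeleton could look like this; the helper names _gaussian_density and _get_class_probability are simply one way to organize the code, and we will fill in the method bodies step-by-step:

```python
import numpy as np


class NaiveBayes:
    """A Gaussian Naive Bayes Classifier using just Python and NumPy."""

    def fit(self, X, y):
        # compute the summary statistics and the prior for each class
        pass

    def predict(self, X):
        # return the most probable class for every sample in X
        pass

    def _gaussian_density(self, x, mean, variance):
        # probability of a sample under a class' Gaussian distribution
        pass

    def _get_class_probability(self, x):
        # posterior belief of every class for a single sample
        pass
```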


Implementation From Scratch

Fitting the data

As stated in the general overview, we need to calculate the summary statistics for each class (and feature) as well as the prior.

First of all, we need to gather some basic information about the dataset and create three zero-matrices to store the mean, the variance, and the prior for each class.

Next, we iterate over all the classes, compute the statistics and update our zero-matrices accordingly.
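Putting both steps together, a sketch of the fit method could look like this (the attribute names self.mean, self.variance, and self.priors are just placeholders):

```python
# inside the NaiveBayes class
def fit(self, X, y):
    n_samples, n_features = X.shape
    self.classes = np.unique(y)
    n_classes = len(self.classes)

    # zero-matrices to store the summary statistics and the priors
    self.mean = np.zeros((n_classes, n_features))
    self.variance = np.zeros((n_classes, n_features))
    self.priors = np.zeros(n_classes)

    for idx, cls in enumerate(self.classes):
        X_cls = X[y == cls]
        # per-feature mean and variance of the current class
        self.mean[idx, :] = X_cls.mean(axis=0)
        self.variance[idx, :] = X_cls.var(axis=0)
        # prior = number of class samples / total number of samples
        self.priors[idx] = X_cls.shape[0] / n_samples
```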

For example, assume we have two unique classes (0, 1) and two features in our dataset. The matrix storing the mean values will therefore have two rows and two columns (2×2): one row for each class and one column for each feature.

The prior is just a single vector (1×2), containing the number of samples of each class divided by the total number of samples.

An example of a summary statistic [Image by Author]

Making a prediction

Now, the slightly more complicated part…

In order to make a prediction, we need to get the probability of the data belonging to a certain class or, more specifically, coming from that class’s distribution.

To make our life easier, we assume that the data’s underlying distribution is Gaussian. We create a class method that returns this probability for a new sample.

Gaussian function. μ = mean; σ² = variance; σ = standard deviation.

Our method receives a single sample and calculates the probability. However, as we can tell from the parameters, we need to provide the mean and variance as well.
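One possible implementation of this helper simply plugs the sample, the mean, and the variance into the Gaussian density:

```python
# inside the NaiveBayes class
def _gaussian_density(self, x, mean, variance):
    # Gaussian density:
    # f(x) = exp(-(x - mean)^2 / (2 * variance)) / sqrt(2 * pi * variance)
    numerator = np.exp(-((x - mean) ** 2) / (2 * variance))
    denominator = np.sqrt(2 * np.pi * variance)
    return numerator / denominator
```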

Therefore, we create another class method. The method iterates over all classes, collects the summary statistics, the prior, and calculates the new posterior belief for a single sample.

Note that we apply a log-transformation, which simplifies the calculation by letting us add log-probabilities instead of multiplying many small numbers. We also return the class with the highest posterior belief.
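Under those assumptions, a sketch of that method might look like this:

```python
# inside the NaiveBayes class
def _get_class_probability(self, x):
    posteriors = []

    for idx, cls in enumerate(self.classes):
        # collect the summary statistics and the prior of the current class
        mean = self.mean[idx]
        variance = self.variance[idx]
        prior = np.log(self.priors[idx])

        # log-transform turns the product of feature probabilities into a sum
        likelihood = np.sum(np.log(self._gaussian_density(x, mean, variance)))
        posteriors.append(prior + likelihood)

    # return the class with the highest posterior belief
    return self.classes[np.argmax(posteriors)]
```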

Finally, we can tie it all together with the predict method.
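With the helper in place, a minimal predict method just applies it to every sample:

```python
# inside the NaiveBayes class
def predict(self, X):
    # most probable class for each sample in the (test) dataset
    return np.array([self._get_class_probability(x) for x in X])
```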


Testing our Classifier

Now that we have finished our Naive Bayes Classifier, there is just one thing left to do – we need to test it.

We will use the iris dataset, which consists of 150 samples with 4 different features (Sepal Length, Sepal Width, Petal Length, Petal Width). Our goal is to predict the correct class among 3 different types of irises.

An overview of the iris dataset by plotting the first two features [Image by Author]
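A minimal test setup might look like the following, using scikit-learn only to load the data and create a train/test split; the test_size and random_state values are arbitrary choices, so the exact score depends on the split:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# load and split the iris dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# train our classifier and predict on the test set
nb = NaiveBayes()
nb.fit(X_train, y_train)
predictions = nb.predict(X_test)

accuracy = np.mean(predictions == y_test)
print(f"Accuracy: {accuracy:.4f}")
```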

Running the code above, we load and prepare the iris dataset and train our classifier. When predicting on the test data, we achieve ~96.6% accuracy.

The confusion matrix below tells us that our classifier made a single mistake, falsely classifying one sample of class one as class two.

Confusion matrix (number of predictions) [Image by Author]

Conclusion

In this article, we implemented a Naive Bayes Classifier from scratch using just Python and NumPy. We learned about the theoretical background and had the opportunity to apply Bayes’ theorem in a practical manner.

If you don’t like "black boxes" and want to fully understand an algorithm, implementing it from scratch is one of the best ways to gain deep, intimate knowledge of its inner workings.

You can find the full code here on my GitHub.

ML Algorithms From Scratch

Thank you for reading! Make sure to stay connected & follow me here on Medium, Kaggle, or just say ‘Hi’ on LinkedIn.



