
(Gaussian) Naive Bayes

An introduction to the (Gaussian) Naive Bayes model with theory and Python implementation

Illustration of decision boundaries from a (Gaussian) Naive Bayes model. Image by author.

Contents

This post is a part of a series of posts that I will be making. You can read a more detailed version of this post on my personal blog by clicking here. Underneath you can see an overview of the series.

  1. Introduction to Machine Learning
  2. Regression
  3. Classification

Setup and objective

We’ve looked at quadratic discriminant analysis (QDA), which assumes class-specific covariance matrices, and linear discriminant analysis (LDA), which assumes a covariance matrix shared among the classes. Now we’ll look at (Gaussian) Naive Bayes, which makes yet another assumption about the covariance structure.

If you haven’t read my post on QDA, I highly recommend it, as the derivation for Naive Bayes is essentially the same.

Naive Bayes makes the assumption that the features are independent given the class. We therefore still have class-specific covariance matrices (as in QDA), but the independence assumption forces those covariance matrices to be diagonal.
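Concretely (using my own notation for the per-class, per-feature variances), a diagonal class-specific covariance matrix is equivalent to the class-conditional density factorizing over the D features:

\Sigma_k = \operatorname{diag}(\sigma_{k,1}^2, \ldots, \sigma_{k,D}^2)
\quad\Longleftrightarrow\quad
p(\mathbf{x} \mid C_k) = \prod_{d=1}^{D} \mathcal{N}(x_d \mid \mu_{k,d}, \sigma_{k,d}^2)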

So, given a training dataset of N input variables x with corresponding target variables t, (Gaussian) Naive Bayes assumes that the class-conditional densities are normally distributed
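In symbols (a sketch in my notation, with C_k denoting the k-th class):

p(\mathbf{x} \mid C_k) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)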

where μ is the class-specific mean vector, and Σ is the class-specific covariance matrix. Using Bayes’ theorem, we can now calculate the class posterior
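With the same notation, the posterior is:

p(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\, p(C_k)}{\sum_{j} p(\mathbf{x} \mid C_j)\, p(C_j)}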

We will then classify x into class
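that is, the class with the largest posterior probability:

\hat{k} = \arg\max_{k} \; p(C_k \mid \mathbf{x})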

Derivation and training

The derivation of the class-specific priors, means, and covariance matrices follows the same steps as for QDA. You can find the derivation in my earlier post on QDA here.

The only difference is that we have to set everything but the diagonal to 0 in the class-specific covariance matrices. We therefore get the following
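Writing N_k for the number of training points in class k (my notation), the resulting estimates are:

\pi_k = \frac{N_k}{N}, \qquad
\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n \in C_k} \mathbf{x}_n, \qquad
\boldsymbol{\Sigma}_k = \operatorname{diag}\!\left( \frac{1}{N_k} \sum_{n \in C_k} (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\top} \right)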

where diag means that we set every value not on the diagonal equal to 0.

Python implementation

The code underneath is a simple implementation of the (Gaussian) Naive Bayes model we just went over.
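Here is a minimal NumPy sketch of that model (the class name, method names, and variable names are my own, not necessarily those of the original code): it estimates the per-class priors, means, and per-feature variances, and classifies each point by the largest log-posterior.

import numpy as np

class GaussianNaiveBayes:
    """Gaussian Naive Bayes: class-specific means and diagonal covariances."""

    def fit(self, X, t):
        # X: (N, D) array of inputs, t: (N,) array of integer class labels
        self.classes = np.unique(t)
        K, D = len(self.classes), X.shape[1]
        self.priors = np.zeros(K)        # pi_k = N_k / N
        self.means = np.zeros((K, D))    # mu_k
        self.vars = np.zeros((K, D))     # diagonal entries of Sigma_k
        for k, c in enumerate(self.classes):
            X_c = X[t == c]
            self.priors[k] = len(X_c) / len(X)
            self.means[k] = X_c.mean(axis=0)
            self.vars[k] = X_c.var(axis=0)
        return self

    def predict(self, X):
        # Feature independence lets us sum per-feature log densities per class;
        # we then pick the class with the largest log-posterior.
        log_posteriors = []
        for k in range(len(self.classes)):
            log_likelihood = -0.5 * np.sum(
                np.log(2 * np.pi * self.vars[k])
                + (X - self.means[k]) ** 2 / self.vars[k],
                axis=1,
            )
            log_posteriors.append(np.log(self.priors[k]) + log_likelihood)
        return self.classes[np.argmax(np.stack(log_posteriors), axis=0)]

On a toy dataset this should agree with scikit-learn’s sklearn.naive_bayes.GaussianNB (up to its small variance-smoothing term), which is a convenient way to sanity-check the sketch.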

Underneath is a chart with the data points (color coded to match their respective classes), the class distributions that our (Gaussian) Naive Bayes model finds, and the decision boundaries generated by the respective class distributions.

Charts of the data points with their respective classes color coded, the class distributions found by our (Gaussian) Naive Bayes model, and the resulting decision boundary from the class distributions. Image by author.

Note that while the decision boundary is not linear as in the case of LDA, the class distributions are axis-aligned Gaussian distributions (circular when the per-feature variances happen to be equal), since the diagonal covariance matrices rule out any correlation, and therefore any tilt, between the features.

Summary

  • Naive Bayes is a generative model.
  • (Gaussian) Naive Bayes assumes that each class-conditional density is Gaussian.
  • The difference between QDA and (Gaussian) Naive Bayes is that Naive Bayes assumes independence of the features, which means the covariance matrices are diagonal matrices.
  • Remember that LDA has a shared covariance matrix, whereas Naive Bayes has class-specific (diagonal) covariance matrices.
