An introduction to the (Gaussian) Naive Bayes model with theory and Python implementation

Contents
This post is a part of a series of posts that I will be making. You can read a more detailed version of this post on my personal blog by clicking here. Underneath you can see an overview of the series.
- Introduction to Machine Learning
- (a) What is machine learning?
- (b) Model selection in machine learning
- (c) The curse of dimensionality
- (d) What is Bayesian inference?
- Regression
- (a) How linear regression actually works
- (b) How to improve your linear regression with basis functions and regularization
- Classification
- (a) Overview of Classifiers
- (b) Quadratic Discriminant Analysis (QDA)
- (c) Linear Discriminant Analysis (LDA)
- (d) (Gaussian) Naive Bayes
- (e) Multiclass Logistic Regression using Gradient Descent
Setup and objective
We’ve looked at quadratic discriminant analysis (QDA), which assumes class-specific covariance matrices, and linear discriminant analysis (LDA), which assumes a shared covariance matrix among the classes. Now we’ll look at (Gaussian) Naive Bayes, which makes yet another assumption about the covariance structure.
If you haven’t read my post on QDA, I highly recommend it, as the derivation for Naive Bayes is the same.
Naive Bayes makes the assumption that the features are independent. We are therefore still assuming class-specific covariance matrices (as in QDA), but because the features are assumed to be independent, those covariance matrices are diagonal.
So, given a training dataset of N input variables x with corresponding target variables t, (Gaussian) Naive Bayes assumes that the class-conditional densities are normally distributed

$$p(\mathbf{x} \mid t = k) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

where μ is the class-specific mean vector, and Σ is the class-specific (diagonal) covariance matrix. Using Bayes’ theorem, we can now calculate the class posterior

$$p(t = k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid t = k)\, p(t = k)}{\sum_{j} p(\mathbf{x} \mid t = j)\, p(t = j)}$$

We will then classify x into class

$$\hat{t} = \arg\max_{k}\; p(t = k \mid \mathbf{x})$$

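To see the independence assumption in action, note that a Gaussian with a diagonal covariance matrix factorises into a product of univariate Gaussians, one per feature. Written out (the per-feature variance notation σ²ₖ,d for the d-th diagonal entry of Σₖ is mine, not from the original post):

$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) = \prod_{d=1}^{D} \mathcal{N}\!\left(x_d \mid \mu_{k,d}, \sigma_{k,d}^{2}\right), \qquad \boldsymbol{\Sigma}_k = \mathrm{diag}\!\left(\sigma_{k,1}^{2}, \ldots, \sigma_{k,D}^{2}\right)$$

so each class only needs one mean and one variance per feature.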
Derivation and training
The derivation follows the same steps as the derivation of the class-specific priors, means, and covariance matrices for QDA. You can find the derivation in my earlier post on QDA here.
The only difference is that we have to set everything but the diagonal to 0 in the class-specific covariance matrices. We therefore get the following

$$\boldsymbol{\Sigma}_k = \mathrm{diag}\!\left( \frac{1}{N_k} \sum_{n \in \mathcal{C}_k} (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^\top \right)$$

where diag means that we set every value not on the diagonal equal to 0, Cₖ is the set of training points belonging to class k, and Nₖ is the number of such points.
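Equivalently, since only the diagonal survives, each diagonal entry is just the maximum-likelihood variance of a single feature within class k (again using my σ²ₖ,d notation for the d-th diagonal entry):

$$\sigma_{k,d}^{2} = \frac{1}{N_k} \sum_{n \in \mathcal{C}_k} \left(x_{n,d} - \mu_{k,d}\right)^{2}$$

In practice, training therefore reduces to computing per-class, per-feature means and variances, together with the class priors πₖ = Nₖ/N.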
Python implementation
The code underneath is a simple implementation of (Gaussian) Naive Bayes that we just went over.
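A minimal NumPy sketch of the idea is shown below. The class name GaussianNaiveBayes, its fit/predict interface, and the toy data in the usage example are my own choices for illustration, not necessarily those used in the original code.

```python
import numpy as np

class GaussianNaiveBayes:
    """Gaussian Naive Bayes: class-specific means and diagonal covariances."""

    def fit(self, X, t):
        # X: (N, D) array of inputs, t: (N,) array of integer class labels.
        self.classes = np.unique(t)
        n_classes, n_features = len(self.classes), X.shape[1]
        self.priors = np.zeros(n_classes)
        self.means = np.zeros((n_classes, n_features))
        self.vars = np.zeros((n_classes, n_features))
        for i, k in enumerate(self.classes):
            X_k = X[t == k]
            self.priors[i] = len(X_k) / len(X)   # pi_k = N_k / N
            self.means[i] = X_k.mean(axis=0)     # mu_k
            self.vars[i] = X_k.var(axis=0)       # diagonal entries of Sigma_k
        return self

    def predict(self, X):
        # Log-posterior (up to an additive constant) for each class, using the
        # factorised product of univariate Gaussian densities.
        log_posteriors = []
        for i in range(len(self.classes)):
            log_likelihood = -0.5 * np.sum(
                np.log(2.0 * np.pi * self.vars[i])
                + (X - self.means[i]) ** 2 / self.vars[i],
                axis=1,
            )
            log_posteriors.append(np.log(self.priors[i]) + log_likelihood)
        return self.classes[np.argmax(np.stack(log_posteriors, axis=1), axis=1)]

# Example usage on synthetic toy data (two 2-D Gaussian blobs):
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], [1.0, 0.5], size=(50, 2)),
               rng.normal([3, 3], [0.5, 1.0], size=(50, 2))])
t = np.array([0] * 50 + [1] * 50)
model = GaussianNaiveBayes().fit(X, t)
print(model.predict(X[:5]))
```

Working in log-probabilities avoids numerical underflow when the product of many per-feature densities becomes very small.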
Underneath is a chart with the data points (color coded to match their respective classes), the class distributions that our (Gaussian) Naive Bayes model finds, and the decision boundaries generated by the respective class distributions.

Note that while the decision boundary is not linear, as it would be in the case of LDA, the class distributions are axis-aligned Gaussian distributions (their contours have axes parallel to the feature axes), since the covariance matrices are diagonal.
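A chart like this can be reproduced by evaluating the fitted model on a dense grid. A rough sketch, assuming matplotlib and the GaussianNaiveBayes model and toy X, t from the example above:

```python
import matplotlib.pyplot as plt

# Evaluate the fitted model on a dense grid to draw the decision regions.
xx, yy = np.meshgrid(
    np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
    np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300),
)
grid = np.column_stack([xx.ravel(), yy.ravel()])
zz = model.predict(grid).reshape(xx.shape)

plt.contourf(xx, yy, zz, alpha=0.2)        # predicted class in each region
plt.scatter(X[:, 0], X[:, 1], c=t, s=15)   # data points, coloured by class
plt.show()
```

The filled contours show the predicted class in each region of the plane, and the scatter shows the training points coloured by their true class.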
Summary
- Naive Bayes is a generative model.
- (Gaussian) Naive Bayes assumes that each class follows a Gaussian distribution.
- The difference between QDA and (Gaussian) Naive Bayes is that Naive Bayes assumes independence of the features, which means the covariance matrices are diagonal matrices.
- Remember that LDA has a shared covariance matrix, whereas Naive Bayes has class-specific covariance matrices.