
Linear Discriminant Analysis (LDA) Can Be So Easy

An Interactive Visualisation for You to Experiment With

Image created by Arus Nazaryan using Midjourney. Prompt: "Drone footage of two flocks of sheep, bright blue and deep red, divided by a fence which separates the flocks in the middle, clean, realistic sheep, on green grass, photorealistic"

Classification is a central topic in machine learning. However, it can be challenging to understand how the different algorithms work. In this article, we will make linear discriminant analysis come alive with an interactive plot that you can experiment with. Get ready to dive into the world of data classification!

Interactive plot 👇🏽 Click to add or remove data points, drag to move them, and change the population parameters to generate new data samples.

If you are applying or studying classification methods, you might have come across various methods such as Naive Bayes, K-nearest neighbours (KNN), quadratic discriminant analysis (QDA) and linear discriminant analysis (LDA). However, it is not always intuitive to understand what the different algorithms are doing. LDA is one of the first approaches to learn and consider, and this article demonstrates how the technique works.

Let’s start from the beginning. All classification methods are approaches to answering the question: Which class does this observation belong to?

In the plot above, there are two independent variables: x_1 on the horizontal axis and x_2 on the vertical axis. Think of the independent variables as scores in two subjects, e.g. physics and literature (ranging from 0–20). The dependent variable y is the class, represented in the plot as red or blue. Think of it as a binary variable we want to predict, such as whether an applicant is admitted to university ("yes" or "no"). A set of given observations is displayed as circles. Each is characterised by an x_1 value, an x_2 value and its class.

Now, if a new data point is added, to which class shall it be assigned?

LDA allows us to draw a boundary that divides the space into two parts (or more, if there are more than two classes, but two in our case). For example, in figure 1 below I marked a new data point at (x_1 = 4, x_2 = 14) with a cross. As it falls on the "red side" of the space, this observation would be assigned to the red class.

Figure 1: The new observation at the cross (x_1=4, x_2=14) is assigned to the red class.
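
To see the same classification step in code, here is a minimal sketch using scikit-learn. The training points below are made-up stand-ins for the observations in the plot, not the actual plotted data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical observations: columns are the scores x_1 and x_2 (0-20).
X = np.array([[3, 15], [5, 13], [4, 16],             # "red" class
              [12, 6], [14, 8], [13, 5], [15, 7]])   # "blue" class
y = np.array(["red"] * 3 + ["blue"] * 4)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

# Classify the new observation at (x_1 = 4, x_2 = 14).
print(lda.predict([[4, 14]]))        # -> ['red']
print(lda.predict_proba([[4, 14]]))  # posterior probability per class
```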

How it works – the math behind it

So given the LDA boundary, we can make classifications. But how can we draw the boundary line using the LDA algorithm?

The line runs along those points where the probabilities of the red class and the blue class are 50% each. Moving to one side, red becomes more probable; moving to the other, blue does. We need to come up with a way to calculate how probable it is for the new observation to belong to each of the classes.

The probability that the new observation belongs to the red class can be written this way: P(Y = red | X = (x_1 = 4, x_2 = 14)). To make it easier to read, instead of x_1 = 4, x_2 = 14, I will from here on just write x, a vector containing the two values.

According to Bayes’ theorem, we can express the conditional probability as P(Y = red | X = x) = P(Y = red) * P(X = x | Y = red) / P(X = x). So to calculate the probability that the new observation is "red" given its x values, we need to know three other probabilities.
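
As a tiny worked example with hypothetical numbers (not read off the plot), this is all the formula does:

```python
# Hypothetical values, chosen only to illustrate Bayes' theorem:
p_red = 0.3            # prior P(Y = red)
lik_red = 0.02         # likelihood P(X = x | Y = red)
p_x = 0.008            # evidence P(X = x)

p_red_given_x = p_red * lik_red / p_x   # Bayes' theorem
print(p_red_given_x)                    # -> 0.75
```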

Let’s go step by step. P(Y = red) is called the "prior probability" of the red class. In the plot, it can be calculated simply as the share of "red" observations among the total number of observations (27.27% in figure 1).

Calculating the marginal probability of x, P(X = x), is more difficult. We don’t have any observation with x_1 = 4 and x_2 = 14 in the initial dataset, so what is its probability? Zero? Not quite.

P(X = x) can be understood as the sum of the joint probabilities P(Y = red, X = x) and P(Y = blue, X = x), by the law of total probability:

P(X = x) = P(Y = red, X = x) + P(Y = blue, X = x)

Each of the joint probabilities can in turn be expressed as a prior times a conditional probability:

P(Y = red, X = x) = P(Y = red) * P(X = x | Y = red)
P(Y = blue, X = x) = P(Y = blue) * P(X = x | Y = blue)

You can see that the priors P(red) and P(blue) enter again, and we already know how to determine them. What is left for us to do is find a way to calculate the conditional probabilities P(X = x | Y = red) and P(X = x | Y = blue).
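
Continuing the hypothetical toy numbers from above, here is how the two joint probabilities combine into P(X = x) and then into the posterior:

```python
# Hypothetical toy numbers, consistent with the earlier sketch:
p_red, p_blue = 0.3, 0.7             # priors
lik_red, lik_blue = 0.02, 1 / 350    # likelihoods P(X = x | Y = .)

joint_red = p_red * lik_red          # P(Y = red,  X = x) = 0.006
joint_blue = p_blue * lik_blue       # P(Y = blue, X = x) = 0.002
p_x = joint_red + joint_blue         # P(X = x) = 0.008

print(joint_red / p_x)               # posterior P(Y = red | X = x) = 0.75
```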

Even if we never saw a red data point at position x in our original data set, we can still assign it a probability if we regard x as drawn from a population of a certain form. Usually, when using real-world data, the population is not directly observable, so we don’t know the true form of the population distribution. What LDA does is assume that the x values are normally distributed along the x_1 and x_2 axes. With real-world data, this assumption can be more or less reasonable. As we are dealing with a generated data set here, though, we don’t have to worry about this problem: the data originate from populations which follow a normal (i.e. Gaussian) distribution, characterised by two parameters per independent variable, ~N(mean, variance).

Using the formula for the multivariate normal distribution, we can describe the distribution of the data belonging to class "red" as

P(X = x | Y = red) = 1 / ((2π)^(d/2) |Σ|^(1/2)) * exp(−(1/2) (x − µ_red)ᵀ Σ⁻¹ (x − µ_red)), with d = 2 dimensions in our case,

where µ_red is the mean vector of the red class and Σ denotes the covariance matrix. LDA uses only one common covariance matrix for all classes, meaning it assumes that all classes share the same variance(x_1), variance(x_2) and covariance(x_1, x_2).
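
As a sketch of this step, the class-conditional densities and the resulting posterior can be evaluated with scipy. The means, covariance matrix and priors below are hypothetical values chosen for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical population parameters for illustration:
mu_red = np.array([4.0, 15.0])
mu_blue = np.array([13.0, 6.0])
sigma = np.array([[4.0, 1.0],    # shared covariance matrix Σ: same
                  [1.0, 4.0]])   # variances and covariance for both classes
p_red, p_blue = 0.3, 0.7         # priors

x = np.array([4.0, 14.0])        # the new observation
lik_red = multivariate_normal.pdf(x, mean=mu_red, cov=sigma)
lik_blue = multivariate_normal.pdf(x, mean=mu_blue, cov=sigma)

# Posterior via Bayes' theorem and the law of total probability:
posterior_red = p_red * lik_red / (p_red * lik_red + p_blue * lik_blue)
print(posterior_red)             # close to 1: x lies near the red mean
```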

Now that we’ve got the formula for P(X = x | Y = red), we can plug it into the equation above to get P(Y = red | X = x). This gives us a monstrous equation, which I’ll skip here. Luckily, it gets simpler from here. Our goal now is to find the dividing line that separates the red and blue zones, i.e. we want to find those points where the probabilities of being class red or blue are equal:

P(Y = red | X = x) = P(Y = blue | X = x)

Performing some transformations, it can be shown that this condition is equivalent to the equality of the two classes’ discriminant functions:

δ_red(x) = δ_blue(x), where δ_k(x) = xᵀ Σ⁻¹ µ_k − (1/2) µ_kᵀ Σ⁻¹ µ_k + log P(Y = k)

Solving this for x_2, we get a line of the form

x_2 = β_0 + β_1 * x_1

where β_0 and β_1 are parameters that depend on the means µ_red and µ_blue, the common covariance matrix Σ, and the prior probabilities of red and blue, P("red") and P("blue").
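
These coefficients can be computed directly from the parameters. Below is a sketch of that calculation with the same hypothetical population parameters as before; it follows from setting the two discriminant functions equal and solving for x_2:

```python
import numpy as np

# The same hypothetical population parameters as above:
mu_red = np.array([4.0, 15.0])
mu_blue = np.array([13.0, 6.0])
sigma = np.array([[4.0, 1.0],
                  [1.0, 4.0]])
p_red, p_blue = 0.3, 0.7

sigma_inv = np.linalg.inv(sigma)
w = sigma_inv @ (mu_red - mu_blue)   # normal vector of the boundary
c = 0.5 * (mu_red @ sigma_inv @ mu_red
           - mu_blue @ sigma_inv @ mu_blue) + np.log(p_blue / p_red)

# Boundary: w[0]*x_1 + w[1]*x_2 = c  ->  x_2 = beta_0 + beta_1 * x_1
beta_1 = -w[0] / w[1]
beta_0 = c / w[1]
print(beta_0, beta_1)                # intercept and slope of the dividing line
```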


Exploring the plot

Now that you know how it works, you can explore the plot using different parameters.

When you click "create new sample", the plot displays two boundaries. The Bayes boundary is calculated using the "real" population parameters. The estimated boundary, in contrast, is based on parameter estimates computed from the sample data.
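
You can reproduce this comparison in code: generate a sample from known (hypothetical) population parameters, fit LDA to it, and read the estimated line off the fitted model. A sketch with scikit-learn:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical population parameters (same as in the earlier sketches):
mu_red = np.array([4.0, 15.0])
mu_blue = np.array([13.0, 6.0])
sigma = np.array([[4.0, 1.0],
                  [1.0, 4.0]])

rng = np.random.default_rng(0)
n_red, n_blue = 30, 70   # sample sizes matching priors of 0.3 / 0.7
X = np.vstack([rng.multivariate_normal(mu_red, sigma, n_red),
               rng.multivariate_normal(mu_blue, sigma, n_blue)])
y = np.array(["red"] * n_red + ["blue"] * n_blue)

lda = LinearDiscriminantAnalysis().fit(X, y)

# Estimated boundary (where coef_ @ x + intercept_ = 0), rewritten as
# x_2 = b_0 + b_1 * x_1 for comparison with the Bayes line from above:
b_1 = -lda.coef_[0][0] / lda.coef_[0][1]
b_0 = -lda.intercept_[0] / lda.coef_[0][1]
print(b_0, b_1)
```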

Some guiding questions for exploration:

  • For which parameter settings does the estimated boundary deviate strongly from the Bayes boundary? (see figure 2, left image)
  • How sensitive is the boundary to outliers? (see figure 2, center image)
  • Which parameters give results you did not expect?
Figure 2. Left: The estimated boundary deviates strongly from the Bayes boundary. Center: The boundary adapts to an outlier. Right: The boundary adapts well to the data point moving around.

Final remarks

LDA is an important tool to know in the realm of Machine Learning classification methods. However, it is one thing to understand the theory behind LDA through equations and formulas, and another to gain a practical understanding through hands-on exploration. The interactive tool presented here offers a unique opportunity to experiment and understand LDA in a more intuitive way.

Which other statistical methods would you like to see interactively visualised? Let me know in the comments!

Don’t forget to follow me to stay in the loop for further articles!


All images, unless otherwise noted, were created by the author using Observable.

If you want to read up on LDA, have a look into the great book "An Introduction to Statistical Learning" by James, Witten, Hastie and Tibshirani, which you can download for free on their website: https://www.statlearning.com/

