
Aleatoric and Epistemic Uncertainty in Deep Learning

How they differ from each other and how to handle them with TensorFlow Probability

Source: Pexels

Machine learning, including deep learning, is inseparably connected to uncertainty, which shows up in various facets. In this post, we will discuss the difference between aleatoric and epistemic uncertainty, the two categories of uncertainty, especially in the setting of deep learning. We will also give an example with TensorFlow Probability to visualize the two types of uncertainty. Please check the notebook for more details of the code.

Aleatoric vs epistemic uncertainty

One way to tell epistemic uncertainty from aleatoric uncertainty is to check whether the uncertainty can be reduced by acquiring more knowledge. Indeed, the word "epistemic" comes from the Greek word "επιστήμη" (episteme), which can be roughly translated as knowledge.

An example involving both kinds of uncertainty is determining whether a person is positive for Covid. More clinical trials and medical experiments would reduce the epistemic uncertainty, leading to a better understanding of the virus and a more reliable diagnosis; yet we can never be 100% sure about our conclusion, because of the aleatoric uncertainty caused by inherent randomness.

In the context of deep learning, we would rather regard the classic neural network as a probabilistic model: given an input x, the final layer of the network outputs a probability distribution, either over the set of classes in the case of classification or over the predicted value in the case of regression. In this setting, epistemic uncertainty is usually interpreted as uncertainty about the parameters of the neural network and can be reduced if we have a dataset carrying more of the required information. Aleatoric uncertainty, on the other hand, appears in the probabilistic prediction itself, obtained by maximum likelihood inference.

Many researchers have focused on quantifying the two kinds of uncertainty in the context of machine learning. One idea is to measure the total uncertainty and the aleatoric uncertainty, and to take the epistemic uncertainty as the difference between the two [1].

Modeling uncertainty with TFP

In this section, we will use TensorFlow Probability (TFP), a Python library built on TensorFlow that makes it easy to combine probabilistic models and deep learning on modern hardware, to model uncertainty on a simple synthetic dataset. We will see in this example that we can reduce the epistemic uncertainty but not the aleatoric uncertainty.
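All snippets below are minimal sketches in the spirit of the notebook rather than its exact code; they assume TensorFlow 2.x and a recent TensorFlow Probability release, imported with the usual aliases.

# pip install tensorflow tensorflow-probability
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions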

The figure below shows the dataset used in the code. Here, the orange dots represent only 10% of the full data. In the following, we will train the same models on both the "full dataset" and the "partial dataset" to see whether adding more data helps reduce each kind of uncertainty.

Image by author: data used in the TFP model
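The exact synthetic dataset is generated in the notebook; as a purely hypothetical stand-in, one could use a noisy linear trend and keep a random 10% of the points as the "partial dataset".

# hypothetical stand-in for the synthetic data (illustration only)
n = 500
x = np.linspace(-1., 1., n).reshape(-1, 1).astype(np.float32)
y = 1.5 * x + 0.3 * np.random.randn(n, 1).astype(np.float32)

# the "partial dataset": a random 10% of the points (the orange dots)
idx = np.random.choice(n, size=n // 10, replace=False)
x_small, y_small = x[idx], y[idx]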

Aleatoric uncertainty

We consider the aleatoric uncertainty with the following code.
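(The snippet is a minimal sketch in the style of the standard TFP regression example; see the notebook for the exact version.)

negloglik = lambda y, rv_y: -rv_y.log_prob(y)  # negative log-likelihood loss

model = tf.keras.Sequential([
    # a Dense layer with two outputs: one for the mean, one for the scale
    tf.keras.layers.Dense(1 + 1),
    # turn those two outputs into a Normal distribution over y
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=500, verbose=False)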

The model contains only two layers: one classic Dense layer whose two output neurons are used as the mean and standard deviation of the distribution returned by the following DistributionLambda layer.

We train the model on both the whole dataset and the dataset with fewer points (the orange dots in the plot above) and obtain the following result. We notice at once that the models trained on the two datasets give very similar results: adding more data does not reduce the aleatoric uncertainty.

Image by author: Aleatoric uncertainty
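To produce such a plot, one can query the predicted distribution directly: its mean gives the regression line and its standard deviation the width of the aleatoric band. A sketch, assuming a test grid x_test:

x_test = np.linspace(-1., 1., 200).reshape(-1, 1).astype(np.float32)
yhat = model(x_test)             # a Normal distribution for every test point
mean = yhat.mean().numpy()       # predicted mean: the regression line
stddev = yhat.stddev().numpy()   # predicted scale: the aleatoric uncertainty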

Epistemic uncertainty

What would happen to the epistemic uncertainty? Let us have a look at the following code.
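(Again a sketch rather than the notebook's exact code; the posterior and prior functions follow the standard TFP example, and the fixed scale of the final Normal leaves all the remaining uncertainty in the weights.)

def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
    # trainable mean-field Normal posterior over kernel and bias
    n = kernel_size + bias_size
    c = np.log(np.expm1(1.))
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :n],
                       scale=1e-5 + tf.nn.softplus(c + t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])

def prior_trainable(kernel_size, bias_size=0, dtype=None):
    # prior over kernel and bias: Normal with trainable location, unit scale
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t, scale=1.),
            reinterpreted_batch_ndims=1)),
    ])

model = tf.keras.Sequential([
    tfp.layers.DenseVariational(1, posterior_mean_field, prior_trainable,
                                kl_weight=1 / x.shape[0]),
    # fixed scale: the only uncertainty left is in the weights (epistemic)
    tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1.)),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=500, verbose=False)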

Different from what we did before, the model here contains a DenseVariational layer. This layer uses variational inference to fit an approximate posterior distribution over both the kernel matrix and the bias terms, which are otherwise used as in a Dense layer. Variational inference is an alternative to Markov chain Monte Carlo (MCMC) for approximating posterior distributions in machine learning.

As before, we train the model on both the whole dataset and the dataset with fewer points. We notice in the plot below that the standard deviation of the predictions is smaller when we train the model on the bigger dataset. In other words, this example shows that more data can reduce the epistemic uncertainty.

Image by author: Epistemic uncertainty
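The spread in the plot comes from the DenseVariational layer: each forward pass resamples the weights from the learned posterior, so repeating the prediction yields a family of regression lines whose dispersion visualizes the epistemic uncertainty.

# every call samples new weights from the posterior,
# so the predicted lines differ from call to call
yhats = [model(x_test).mean().numpy() for _ in range(100)]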

Aleatoric and epistemic uncertainty

For those who are interested, we can also put the two kinds of uncertainty into one single model with the following code.
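(Again a sketch, reusing the posterior and prior functions defined above; the DenseVariational layer now outputs both a mean and a scale, so the weights carry the epistemic part and the learned scale the aleatoric part.)

model = tf.keras.Sequential([
    tfp.layers.DenseVariational(1 + 1, posterior_mean_field, prior_trainable,
                                kl_weight=1 / x.shape[0]),
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.01 * t[..., 1:]))),
])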

Conclusion

In this post, we discussed the difference between aleatoric and epistemic uncertainty and illustrated it with a simple TFP example. One last remark I would like to make before ending this post: aleatoric and epistemic uncertainty should not be seen as absolute notions. Changing the context changes the sources of uncertainty, and we can sometimes hardly tell the difference between them.

Reference

[1] DeVries, T., & Taylor, G. W. (2018). Learning Confidence for Out-of-Distribution Detection in Neural Networks. arXiv preprint arXiv:1802.04865.

