
Modeling uncertainty in neural networks with TensorFlow Probability

Part 1: An introduction

Hands-on Tutorials

This series is a brief introduction to modeling uncertainty using the TensorFlow Probability library. I wrote it as supplementary material to my PyData Global 2021 talk on uncertainty estimation in neural networks.

Articles in the series:

  • Part 1: An Introduction
  • Part 2: Aleatoric uncertainty
  • Part 3: Epistemic uncertainty
  • Part 4: Going fully probabilistic
Image by Jaymantri https://www.pexels.com/photo/blue-body-of-water-5412/

Why model uncertainty?

Let me reverse this question – why not model uncertainty?

Imagine you’re building a medical diagnosis system. Let’s say your system can predict one of three classes: (A) malignant tumor, (B) benign tumor, (C) no tumor. Your system predicts malignant tumor for your patient Jessie. The prediction has a softmax score of 0.9. How sure are you that your prediction is correct?

A softmax distribution for our contrived diagnosis system. The system returned a high softmax value for class (A) – malignant tumor.

Would you share this diagnosis with Jessie?

Before you answer this question, let’s add weight uncertainty estimation to our model. Another name for this type of uncertainty is epistemic uncertainty, and we’ll talk more about it in the upcoming episodes of this series. Let’s look at our model’s outputs once again.

Left: a softmax distribution for our contrived diagnosis system. Right: a softmax distribution plus 95% confidence intervals over 10,000 predictions from a probabilistic version of our model.

What has just happened? We’ve seen the plot on the left before, but what do we see on the right? In the plot on the right, we added 95% confidence intervals over softmax scores obtained by drawing 10,000 samples from a probabilistic version of our model.

Would a result like this even be possible? As counterintuitive as it may seem, the short answer is yes. We’ll see examples of real-world systems behaving in a similar way in the episode on epistemic uncertainty.

Going back to our original question – why (not) model uncertainty? We’ll try to give an informed answer to this question in the last episode of this series. In the meantime, I am very curious to hear your thoughts. Please feel free to share them in the comment section below.

Now, let’s have a look at a toolkit that will help us answer our questions regarding uncertainty estimation – TensorFlow Probability.

TensorFlow Probability API reference page. Photo by yours truly.

TensorFlow Probability

TensorFlow Probability (TFP) is a probabilistic programming library and part of the broader TensorFlow ecosystem. It’s not part of the core TensorFlow library, so you need to install and import it separately. Installation guidelines can be found in the documentation.

TFP is a comprehensive library with more than 15 different sub-modules. In this series we’ll focus on two of them:

  • tfp.distributions
  • tfp.layers

Distributions

As its name suggests, tfp.distributions provides us with distribution objects. Currently (Nov 2021), you can find over 100 different distributions there.

Let’s import the libraries and start with something simple. We’ll initialize a normal distribution with a mean of 100 and a standard deviation of 15:

As you can see in the code output above, our distribution object has batch_shape and event_shape attributes. Batch shape tells us how many distribution objects there are in a batch. For instance, we might want three univariate Gaussians in one batch to parametrize three independent output layers in our network. Event shape conveys a different type of information: you can think of it as the distribution’s dimensionality. Let’s see an example:

As you can see, we now have a distribution with a batch_shape of 2. Both batch and event shapes may also be higher-dimensional, i.e. they may be matrices or tensors.

Now, let’s convert this batch of two normal distributions into a single two-dimensional distribution:

In the code above, we wrapped our normal_batched in an instance of the special tfd.Independent class. This class turns the number of batch dimensions specified by the reinterpreted_batch_ndims parameter into event dimensions. The batch shape of normal_2d is now empty, and the distribution has an event shape of 2.

You might have also noticed that normal_batched.mean() and normal_2d.mean() returned virtually identical arrays. Their meaning is different, though. In the first case, we got two independent means – one for each distribution in the batch. In the second case, we got a single mean vector with two components – one for each dimension of our 2D distribution.

To understand this difference better, let’s look at a couple of basic methods that each TFP distribution offers and try to apply them to our example.

Each TFP distribution object has three basic methods:

  • .sample()
  • .prob()
  • .log_prob()

.sample() allows you to sample from a distribution, .prob() returns the distribution’s density (the value of the probability density function – PDF – not a probability!) for a given input, and .log_prob() returns the log of the PDF at your input.

In the code above, we drew three samples from normal_batched and three samples from normal_2d. Although the sample arrays have the same shape in both cases, their meanings are different.

We should see this more clearly when we evaluate the PDF of both distributions. Think for a while: what shapes would you expect from evaluating the PDF of both distributions using a sample of shape (3, 2)? Would they be the same? Different? Why? 🤔

Let’s see a very simple example. We’ll take two values – 100 and 200 – and evaluate the PDFs at these points using our two distributions.

As you can see, normal_batched.prob(sample) returned two values, while normal_2d.prob(sample) returned just one. Moreover, notice that normal_batched returned the same value twice! Do you know why? If so, share your answer in the comments below.

The batched distribution returned two values because it in fact contains two separate distributions (just enclosed within a single batch). Our second distribution, on the other hand, is a single distribution with two dimensions, and we need two numbers to describe a single point in 2D space.

With this remark, we close this week’s episode. In the next episode, we’ll talk about aleatoric uncertainty and see how to model it using the tfp.layers module.

Image by cottonbro https://www.pexels.com/photo/a-white-line-on-the-asphalt-road-5319516/

Summary

Congrats on getting that far! 🎉

In this episode of the Modeling uncertainty in neural networks with TensorFlow Probability series, we’ve seen an example of how modeling uncertainty can provide us with additional information about our model’s performance.

We’ve experimented with basic yet powerful tools from the TFP toolbox. We explored the basics of the distributions sub-module and saw how to transform a batched distribution into a multidimensional distribution using tfd.Independent. Finally, we explored the basic distribution methods .sample() and .prob().

In the next part, we’ll focus on aleatoric uncertainty. We’ll see the tfp.layers sub-module in action and understand the power of the .log_prob() method.

Thank you for reading and see you in Part 2!




