
Uncertainty Visualization Made Easy With Hypothetical Outcome Plots

Use animations to present the uncertainty vividly.

Photo by Riho Kroll on Unsplash

For data scientists, effectively communicating the uncertainty of our analysis to stakeholders is crucial for reliable decision-making.

I’ve been working on uncertainty quantification for some time. From the numerous presentations I’ve made to various audiences, I learned that although the usual uncertainty visualization techniques, such as box plots, violin plots, and confidence bands, are compact and precise in displaying uncertainty, they may only resonate with trained statisticians.

Broader audiences, including stakeholders and domain experts who don’t necessarily have a strong background in statistics, often find those plots complicated and confusing, which makes them poor choices for getting the idea across.

Over the past few years, through trial and error, I’ve learned a trick to address this communication issue: instead of using static visualizations of uncertainty, I make the presentation more vivid by using animations. More specifically, I create animations that cycle through a number of different plots, each of which shows one possible scenario drawn from the outcome distribution.

It’s a simple idea, yet a powerful one. My audiences can now "experience" first-hand the uncertainty, which would otherwise be hard and abstract to interpret.

This animated way of presenting uncertainty has a formal name: hypothetical outcome plots, first proposed by Jessica Hullman and co-workers at the Midwest Uncertainty (MU) Collective.

In this article, I’d like to share with you three examples to showcase this visualization approach. Those case studies are derived (simplified) from my previous projects:

1. Regression Analysis, where hypothetical outcome plots help the audience see alternative trend curves supported by the noisy training data;

2. Projectile Motion Problem, where hypothetical outcome plots help the audience sense the variation in shooting range induced by uncertain shooting conditions;

3. Battery Prognostic Analysis, where hypothetical outcome plots help the audience understand the battery failure risk at different cycle numbers.

If you want to know how those animations are created in Python, please check out the companion Jupyter Notebooks I’ve created.

OK, let’s get started!


1. Regression Analysis

In this case study, our goal is to construct a statistical model to approximate a function f(x) given the noisy training data (shown below). Specifically, we use a supervised machine learning technique called Gaussian Process (GP) to train the desired statistical model.

Fig. 1 The noisy training data. (Image by Author)

Naturally, fitting a curve to noisy data involves uncertainty, which is commonly shown in the form of a confidence band. In the figure below, a credible interval derived from the GP model’s posterior distribution is shown to visualize the uncertainty.

Fig. 2 Confidence band of the GP prediction. (Image by Author)

Although the confidence band is popular in academia, this static uncertainty visualization can mislead a broad audience in at least two ways:

  1. people tend to perceive the above credible interval as the maximum/minimum bounds of the model prediction;
  2. the correlation between predictions at different x locations is not visible. As a result, people may think that the displayed band is the result of simply moving the regression curve up and down.

To address those two issues, we could use hypothetical outcome plots instead. Simply put, we can create an animation to cycle through a number of random draws of possible GP regression curves.

This animation allows the audience to experience the GP prediction uncertainty first-hand. Instead of showing the confusing credible band, hypothetical outcome plots vividly illustrate the alternative regression curves supported by the given training data.
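To make the idea of "random draws of possible GP regression curves" concrete, here is a minimal sketch in plain NumPy. It is not the code from the companion notebooks: the squared-exponential kernel, its hyperparameters, the noise level, and the toy sine data are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def sample_gp_curves(x_train, y_train, x_test, n_draws, noise_std=0.2, seed=0):
    """Return (posterior mean, draws); draws has shape (n_draws, len(x_test))."""
    rng = np.random.default_rng(seed)
    k_tt = rbf_kernel(x_train, x_train) + noise_std**2 * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss = rbf_kernel(x_test, x_test)

    # Standard GP regression equations, solved via a Cholesky factorization.
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    cov = k_ss - v.T @ v + 1e-9 * np.eye(len(x_test))  # tiny jitter for numerical safety

    # Each draw is a whole curve sampled jointly over x_test, so the
    # correlation between predictions at different x locations is preserved.
    draws = rng.multivariate_normal(mean, cov, size=n_draws)
    return mean, draws

# Toy data (assumed): noisy observations of sin(x).
data_rng = np.random.default_rng(42)
x_train = np.linspace(0.0, 5.0, 12)
y_train = np.sin(x_train) + 0.2 * data_rng.standard_normal(x_train.size)
x_test = np.linspace(0.0, 5.0, 50)

mean, draws = sample_gp_curves(x_train, y_train, x_test, n_draws=20)
```

Each row of `draws` would become one frame of the animation. Because the rows are sampled jointly, the curves bend and wiggle rather than shifting up and down as a rigid copy of the mean.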

2. Projectile Motion Problem

In this case study, our goal is to visualize the uncertainty of the shooting range of a projectile.

To give some context, the following sketch illustrates the physics of the problem, where the shooting range R is determined by _v_₀, the initial velocity; θ, the shooting angle; and g = 9.8 m/s², the gravitational acceleration. The physical model considered here is a simple one, but it is good enough for demonstration purposes.

Fig. 4 Sketch of the projectile motion problem. (Image by Author)

In reality, we may not be sure about the values of _v_₀ and θ. As a result, the output R is also uncertain. To properly quantify the uncertainty of the shooting range R, we need to perform an uncertainty propagation analysis, which involves assigning probability distributions to the uncertain inputs (_v_₀ and θ) and conducting Monte Carlo simulations to estimate the probability distribution of the output (R). For more details, feel free to check my previous post:

Using Monte Carlo to quantify the model prediction error
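As a rough illustration of the uncertainty propagation step, the sketch below samples the two inputs and pushes each sample through the textbook range formula for level ground without air resistance, R = v₀² sin(2θ) / g. That this is the exact model behind the sketch, and the normal input distributions with their means and standard deviations, are assumptions made for this example.

```python
import math
import random

G = 9.8  # gravitational acceleration in m/s^2, as stated in the article

def shooting_range(v0, theta_deg):
    """Range on level ground without drag: R = v0^2 * sin(2*theta) / g."""
    return v0**2 * math.sin(2 * math.radians(theta_deg)) / G

def simulate_ranges(n_samples, seed=0):
    """Propagate input uncertainty to R by plain Monte Carlo sampling."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(n_samples):
        v0 = rng.gauss(30.0, 2.0)     # assumed: v0 ~ Normal(30 m/s, 2 m/s)
        theta = rng.gauss(45.0, 3.0)  # assumed: theta ~ Normal(45 deg, 3 deg)
        ranges.append(shooting_range(v0, theta))
    return ranges

ranges = simulate_ranges(10_000)
mean_range = sum(ranges) / len(ranges)
print(f"mean shooting range: {mean_range:.1f} m")
```

The list `ranges` is an empirical approximation of the output distribution of R; a histogram of it is the usual static summary.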

Commonly, the results of the above uncertainty analysis are visualized via a histogram, as shown below. It tells you how likely the shooting range R is to fall within a specific interval. It’s all good, except it’s super dull. Can we at least make the visualization cool enough to match the fancy uncertainty analysis?

Fig. 5 Histogram of the shooting range. (Image by Author)

Sure we can. How? By using hypothetical outcome plots:

Using animations helps us clearly see the possible projectile trajectories induced by uncertain _v_₀ and θ. This is a much more intuitive, effective, and fun way to show the uncertainty analysis results.
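One way such an animation could be assembled is sketched below: compute one drag-free trajectory per sampled (v₀, θ) pair and show a single trajectory per frame with Matplotlib's `FuncAnimation`. The function names, the 400 ms frame interval, and the axis labels are illustrative choices, not taken from the companion notebooks.

```python
import math

G = 9.8  # gravitational acceleration in m/s^2

def trajectory(v0, theta_deg, n_points=50):
    """Return x and y coordinates of one drag-free flight from launch to landing."""
    theta = math.radians(theta_deg)
    flight_time = 2 * v0 * math.sin(theta) / G
    times = [flight_time * i / (n_points - 1) for i in range(n_points)]
    xs = [v0 * math.cos(theta) * t for t in times]
    ys = [v0 * math.sin(theta) * t - 0.5 * G * t**2 for t in times]
    return xs, ys

def animate_outcomes(samples, interval_ms=400):
    """Build an animation with one sampled trajectory per frame (needs matplotlib)."""
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    paths = [trajectory(v0, theta) for v0, theta in samples]
    fig, ax = plt.subplots()
    ax.set_xlim(0, 1.05 * max(xs[-1] for xs, _ in paths))
    ax.set_ylim(0, 1.05 * max(max(ys) for _, ys in paths))
    ax.set_xlabel("distance (m)")
    ax.set_ylabel("height (m)")
    (line,) = ax.plot([], [])

    def update(frame):
        # Replace the previous outcome with the next hypothetical one.
        xs, ys = paths[frame]
        line.set_data(xs, ys)
        return (line,)

    return FuncAnimation(fig, update, frames=len(paths), interval=interval_ms, blit=True)
```

Calling `animate_outcomes([(30.0, 45.0), (28.5, 47.0), (31.2, 43.5)])` returns the animation object. Keep a reference to it so it is not garbage-collected, then display it in a notebook or save it, for example with `anim.save("hops.gif", writer="pillow")` if Pillow is installed.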

3. Prognostic Analysis

In this case study, we focus on visualizing the uncertainty of the degradation evolution of a battery. In particular, we will see how hypothetical outcome plots can highlight the risk of battery failure after a certain usage time.

This case study has its roots in predictive maintenance, where a primary goal is to forecast the remaining useful life (RUL) of a product in service. In this case, we look at a battery whose capacity degrades as it goes through many charging-discharging cycles.

A popular way to deliver this prognostic analysis is by combining physical models with historically measured degradation data: the measured data is first used to calibrate (via Bayesian statistics) the unknown parameters of the physical model. The calibrated parameters are then plugged into the model to predict the degradation evolution and estimate when battery failure happens, i.e., when its capacity drops below a given threshold.

In general, the degradation prediction will be uncertain once we consider the posterior distribution of the physical model parameter. As a result, it is of particular interest to calculate the battery failure risk after a certain number of cycles.

Hypothetical outcome plots can easily make the abstract concept of risk much more concrete. Here is an animation that shows possible degradation paths a battery may experience after 100 cycles. Each degradation path is calculated based on a specific parameter sample drawn from the parameter posterior distribution.

As demonstrated in the animation, only a few of the simulated paths drop below the failure threshold after 100 cycles. Therefore, we can conclude that the associated failure risk is rather low.

Here is another animation with possible degradation paths after 110 cycles.

It is obvious that the failure risk is much higher now since there are more simulated paths that drop below the failure threshold after 110 cycles. By using simulated hypothetical outcome plots, the failure risk can be clearly highlighted and audiences can conveniently compare the risk magnitude between different cases.
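The risk that the animations convey can also be put into a number: it is the fraction of simulated paths that sit below the failure threshold at the cycle of interest. The sketch below shows that calculation with a deliberately simple stand-in. The linear capacity-fade model, the 80% threshold, and the "posterior" samples are all made up for illustration and are not the physical model or data used in the article.

```python
import random

FAILURE_THRESHOLD = 0.8  # hypothetical: failure when capacity < 80% of nominal

def capacity(cycle, fade_rate):
    """Hypothetical linear capacity-fade model (fraction of nominal capacity)."""
    return 1.0 - fade_rate * cycle

def failure_risk(n_cycles, fade_rates):
    """Fraction of sampled degradation paths below the threshold at n_cycles."""
    failed = sum(capacity(n_cycles, rate) < FAILURE_THRESHOLD for rate in fade_rates)
    return failed / len(fade_rates)

# Stand-in for samples from the parameter posterior (made-up numbers).
rng = random.Random(0)
fade_rates = [rng.gauss(0.0018, 0.00015) for _ in range(5_000)]

risk_100 = failure_risk(100, fade_rates)
risk_110 = failure_risk(110, fade_rates)
print(f"risk after 100 cycles: {risk_100:.2f}, after 110 cycles: {risk_110:.2f}")
```

With a real calibrated model, `fade_rates` would be replaced by the posterior samples of the model parameters and `capacity` by the physical degradation model; the counting step stays the same.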

4. Takeaway

  • Hypothetical outcome plots are a great tool to communicate uncertainties to general audiences and analysts alike.
  • They work by creating an animation, where each frame simulates one possible scenario drawn from the outcome distribution.
  • Hypothetical outcome plots make uncertainty visualization intuitive, concrete, and fun.

One thing to keep in mind when creating your own hypothetical outcome plots: we need to make sure that the simulations we show represent the true outcome distribution. This issue is less obvious when we can animate a large number of samples, but it becomes more prominent when the number of animated samples is small, which is often the case because the animation length is limited. With only a few frames, you run the risk of sampling bias.
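One possible way to reduce that risk, offered here as a suggestion rather than something taken from the companion notebooks, is to pick the animated frames at evenly spaced quantiles of the full Monte Carlo sample and then shuffle their order. The helper name and the toy outcome distribution below are hypothetical.

```python
import random
import statistics

def representative_frames(outcomes, n_frames, seed=0):
    """Pick n_frames outcomes at evenly spaced quantiles, then shuffle their order."""
    ordered = sorted(outcomes)
    picks = [ordered[int((i + 0.5) * len(ordered) / n_frames)] for i in range(n_frames)]
    random.Random(seed).shuffle(picks)  # avoid playing the frames in sorted order
    return picks

# Toy stand-in for a large set of simulated scalar outcomes (made-up numbers).
rng = random.Random(1)
outcomes = [rng.gauss(90.0, 12.0) for _ in range(10_000)]

frames = representative_frames(outcomes, 20)
print(f"full-sample mean: {statistics.mean(outcomes):.1f}")
print(f"animated-frames mean: {statistics.mean(frames):.1f}")
```

When each outcome is a curve rather than a number, the same idea can be applied by sorting on a scalar summary of the curve, such as the shooting range or the capacity at the cycle of interest. Comparing a few summary statistics of the selected frames against the full sample is a cheap sanity check before presenting.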

For the Python code that generates those animations, please check out the companion Jupyter Notebooks I’ve created.

About the Author

I’m a Ph.D. researcher working on uncertainty quantification and reliability analysis for aerospace applications. Statistics and data science form the core of my daily work. I love sharing what I’ve learned in the fascinating world of statistics. Check my previous posts to find out more and connect with me on Medium and LinkedIn.

