Prediction is very difficult, especially if it’s about the future. – Niels Bohr
As a data scientist working in the engineering field, I frequently collaborate with research engineers on the physics side, with the task of developing quantitative models to predict physical phenomena of interest. However, getting a reliable prediction is never easy, mostly because I can rarely be sure about the correct values of the model hyperparameters. Those hyperparameters, which carry specific physical interpretations, are usually calibrated via noisy experiments. As a result, there is no way to be sure of their exact values due to the randomness inherent in the calibration process. Simply put, those hyperparameters are uncertain.
I am usually left with two options: first, I can just pick the nominal values (or most probable values) of the hyperparameters and make predictions based on them, hoping that the results are accurate or at least close to the truth; second, I can accept the fact that those hyperparameters are uncertain and use proper uncertainty propagation techniques to propagate the uncertainties from the model hyperparameters to the prediction error/uncertainty of the model.
In practice, the second option is usually preferred. In that scenario, the prediction output will not be a single value. Instead, we will get its full probability distribution. This outcome is more informative, as it offers a measure of confidence associated with our prediction, thus achieving our goal of quantifying the prediction error.

In this article, we will introduce a useful technique to realize uncertainty propagation and quantify the prediction error. The rest of the article is structured as follows:
- Motivation: what is forward uncertainty propagation?
- Solution: sampling-based Monte Carlo method
- Illustration: a simple case study
- Summary: challenges and responses
So let’s get started!
1. Forward uncertainty propagation
As described previously, a modeler often needs to battle the issue of uncertain hyperparameters, which can easily compromise the reliability of the model prediction. Against this background, forward uncertainty propagation strategies have been proposed to fully account for the input parameter uncertainties and quantify the induced prediction uncertainty.
Formally, the primary goal of the forward uncertainty quantification is to assess the variation of the output under the influence of various input uncertainties.
Forward here means that uncertainty information flows from the inputs, through the physical model under investigation, towards the output. Depending on the analyst’s goal, the focus of forward uncertainty propagation could be on simply estimating the basic statistics (e.g., mean, variance, etc.) of the output, on the probability of the output exceeding a certain threshold (i.e., risk analysis), or on the entire probability density function (PDF) of the output.
Forward uncertainty propagation belongs to a larger study field of uncertainty management in computational science and engineering, where propagation constitutes a major step towards the management goal. Readers can find more details of uncertainty management in my previously published blog: Managing uncertainty in computational science and engineering.
In practice, several methods are designed to deliver the desired forward uncertainty propagation, including the probabilistic analytical approach, the perturbation approach, the spectral approach, etc. Generally, these approaches involve heavy mathematical derivations and their implementations are far from straightforward. In addition, they may only be valid for specific problems and may only be accurate when the uncertainty levels of the hyperparameters are small. Consequently, these approaches are not yet widely adopted in industrial practice and will not be discussed in this article. Instead, we will focus on a much more popular approach: Monte Carlo simulation.
2. Monte Carlo simulation
Intuitively understandable, easy to implement, requiring no modification to the model (or non-intrusive, if you prefer the technical jargon), highly parallelizable, and making minimal assumptions about the model and its associated hyperparameters. Sounds like the approach every practitioner would love, right? Indeed, it is exactly these features that make Monte Carlo simulation the most popular approach for realizing uncertainty propagation.
Technically speaking, Monte Carlo simulation involves drawing random samples from input probability distributions and calculating the corresponding response of each sample using one’s model. The statistics of the output can then be inferred based on the obtained ensemble of results.
The workflow of the Monte Carlo simulation can be summarized as follows:
- Draw random samples from the distribution of the model hyperparameters. Here, each sample corresponds to a specific combination of the model hyperparameters.
- For each combination of the model hyperparameters, insert them into the model, and use this newly configured model to make predictions. At the end of this step, we will obtain an ensemble of predictions.
- Based on this ensemble of predictions, we can easily estimate various statistics of interest, from simple ones like mean and variance to the full probability density function.
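The three steps above can be sketched in a few lines of Python. Here we use a hypothetical one-parameter model y = kx² with an uncertain coefficient k, just to make the workflow concrete:

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: draw random samples of the uncertain hyperparameter.
# Here, k ~ Normal(mean=2.0, std=0.1).
k_samples = rng.normal(2.0, 0.1, 10_000)

# Step 2: evaluate the model once per sample to build an ensemble of predictions.
def model(k, x=3.0):
    return k * x**2  # hypothetical model: y = k·x²

y_samples = model(k_samples)

# Step 3: estimate statistics of interest from the ensemble.
y_mean, y_std = y_samples.mean(), y_samples.std()
```

Any statistic of the output, including its full histogram, can be computed from `y_samples` in the same way.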
As you may have noticed, to kick off a Monte Carlo simulation, we need to know the probability distributions of the uncertain model hyperparameters. This is important, as how the inputs are statistically modeled will directly affect the output uncertainty. In practice, Bayesian statistics is usually employed to derive the target probability distribution from existing data (e.g., experimental and computational). In other situations, those probability distributions may simply be assigned by domain experts based on their experience and judgment. In that case, epistemic uncertainty may be introduced into the statistical modeling of the uncertain hyperparameters.
3. Case study: Cannon shooting
Now, let’s see how Monte Carlo simulation is employed in practice. Here we consider a projectile motion problem and we are interested in calculating the shooting range of a cannon. In real life, this analysis could be used to derive the probability of successfully destroying an enemy target.
Our physical model is a simple one, where the shooting range R is determined by the initial velocity v₀, the shooting angle θ, as well as the gravitational acceleration g, i.e., R = v₀²sin(2θ)/g, as shown in Fig. 2 (if you are interested in how to derive this equation, here is the link). It’s a simple model since the effect of air resistance is not considered. Nevertheless, it is good enough for our demonstration purposes.

Now, suppose that we are not so sure about the values of v₀ and θ, but we do know that g=9.8m/s². Therefore, in our current case study, v₀ and θ are treated as the uncertain model hyperparameters. To describe their uncertainty, we choose a normal distribution for v₀ and a uniform distribution for θ:

For the initial velocity v₀, its mean value is 150m/s, with a standard deviation of 5m/s. For the shooting angle θ, its uncertainty range is from 40° to 50°. Due to the uncertainty embedded in the model hyperparameters (v₀ and θ), our prediction of the shooting range R will also be uncertain. Our goal now is to quantify exactly how uncertain our R prediction will be and to derive its full probability distribution.
Time to apply the Monte Carlo procedure. Following the steps outlined in the previous section, we first draw 10,000 random samples of v₀ and θ from their respective distributions (Fig. 3). For that, the random number generator from NumPy can be utilized. Subsequently, for each combination of v₀ and θ, we calculate the corresponding R value.
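This sampling-and-evaluation step can be sketched with NumPy as follows (a sketch assuming the standard no-drag range formula R = v₀²sin(2θ)/g; the variable names are illustrative):

```python
import numpy as np

g = 9.8                                    # gravitational acceleration, m/s²
rng = np.random.default_rng(0)

# Draw 10,000 samples from the input distributions
v0 = rng.normal(150.0, 5.0, 10_000)        # initial velocity, m/s
theta = rng.uniform(np.radians(40.0), np.radians(50.0), 10_000)  # angle, rad

# Evaluate the model for each sampled (v0, theta) combination
R = v0**2 * np.sin(2.0 * theta) / g        # shooting range, m

# Nominal-value prediction for comparison (v0 = 150 m/s, theta = 45°)
R_nominal = 150.0**2 * np.sin(2.0 * np.radians(45.0)) / g
```

The ensemble `R` can then be summarized with `R.mean()`, `R.std()`, or a histogram, and compared against the single nominal-value prediction `R_nominal`.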
Figure 4(a) depicts several calculated trajectories based on selected samples of v₀ and θ. We can see that those trajectories differ considerably as the initial shooting conditions (i.e., v₀ and θ) vary. This indicates that the impact of the uncertainties in v₀ and θ is non-negligible.

The corresponding histogram of R is shown in Fig. 4(b). Here, we can see that the distribution of the shooting range R exhibits a bell shape centered around 2270m, which is the most probable value of R. On the other hand, the red dashed line, which has an R value of 2296m, is calculated using the nominal values of v₀ and θ, i.e., 150m/s and 45°. Notice that those two R values are not the same. This is common when the underlying model is nonlinear (as in our current case). This observation also tells us that simply plugging in the most probable values of the inputs will, in general, not yield the most probable value of the output, which underscores the importance of performing a rigorous uncertainty propagation analysis.
4. Challenges and solutions
As every coin has two sides, the Monte Carlo approach also has its own shortcomings. In this section, we will discuss the major challenge and possible remedies when implementing the Monte Carlo technique in practice.
The main criticism of the Monte Carlo method lies in its slow convergence rate. This means that we need a large number of samples (on the order of 10⁴) to ensure the reliability of the Monte Carlo results. Consequently, we need to repeatedly calculate the model prediction for each sample, potentially incurring a prohibitive computational cost. In the following, we introduce two popular ways to combat this convergence issue.
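This slow convergence reflects the familiar 1/√N behavior of the Monte Carlo standard error: to cut the estimation error by a factor of 10, we need 100 times more samples. A quick sketch of this effect:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate the mean of a standard normal variable with two sample sizes.
# The standard error of the estimate shrinks like sigma / sqrt(N).
se = {}
for n in (100, 10_000):
    x = rng.standard_normal(n)
    se[n] = x.std(ddof=1) / np.sqrt(n)
# Going from 100 to 10,000 samples (100x more work) only buys
# roughly a 10x reduction in the standard error.
```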
4.1 Smart sampling scheme
The first thing we could do is improve the sampling scheme of the Monte Carlo method. The naive Monte Carlo approach uses purely random sampling, which is not particularly efficient, as it creates obvious "clusters" and "holes" in the parameter space (we will illustrate this later). As a result, we would need a lot of samples to cover the whole parameter space.
Instead, researchers have proposed more advanced sampling schemes with better "space-filling" properties, meaning that those schemes can generate samples more evenly across the parameter space. Examples in this category include Latin hypercube sampling, as well as low-discrepancy sequences such as the Sobol and Halton sequences.
Figure 5 compares various sampling schemes applied to a 2D parameter space, with the same number of samples. Here we can see that a random sampling scheme tends to form clumps and leave voids, while other advanced sampling schemes are capable of generating more "space-filling" samples.
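This kind of comparison can be reproduced with SciPy's quasi-Monte Carlo module (`scipy.stats.qmc`, available since SciPy 1.7); the discrepancy metric below quantifies how unevenly a point set fills the unit square, with lower values indicating better space-filling:

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
n = 128  # a power of 2, as recommended for Sobol' sequences

random_pts = rng.random((n, 2))                       # naive random sampling
lhs_pts = qmc.LatinHypercube(d=2, seed=0).random(n)   # Latin hypercube
sobol_pts = qmc.Sobol(d=2, seed=0).random(n)          # Sobol' low-discrepancy

# Lower discrepancy means the points fill the unit square more evenly
for name, pts in [("random", random_pts), ("LHS", lhs_pts), ("Sobol", sobol_pts)]:
    print(name, qmc.discrepancy(pts))
```

With the same budget of 128 points, the Sobol' set typically achieves a markedly lower discrepancy than plain random sampling.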

4.2 Surrogate modeling techniques
Another way to circumvent the issue of high computational cost is to adopt surrogate modeling techniques. The core idea is to train a cheap statistical model to approximate (or "surrogate") the original physical model, e.g., the projectile motion model in Fig. 2. Afterward, the Monte Carlo approach can be applied directly to the trained statistical model. Since one evaluation of the statistical model involves negligible computational cost, the overall expense of the Monte Carlo procedure becomes affordable.
At the core of surrogate modeling is supervised machine learning, as the goal is to train a surrogate model based on input data and the corresponding labeled output data. Many machine learning techniques, such as support vector machines, Gaussian processes, and neural networks, have already been employed to accelerate Monte Carlo simulations. As more powerful machine learning techniques are constantly being developed, we can expect the accuracy and efficiency of Monte Carlo simulations to improve further.
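To make the idea concrete, here is a minimal sketch of the surrogate workflow on the cannon model from Section 3. For simplicity, it uses an ordinary least-squares polynomial fit as a stand-in for the more powerful machine-learning surrogates mentioned above; the variable names and the choice of 100 training runs are illustrative:

```python
import numpy as np

g = 9.8
rng = np.random.default_rng(0)

def expensive_model(v0, theta):
    # Stands in for a costly physical simulation (here just the range formula)
    return v0**2 * np.sin(2.0 * theta) / g

# Run the "expensive" model on a small training set of 100 input samples
v0_tr = rng.normal(150.0, 5.0, 100)
th_tr = rng.uniform(np.radians(40.0), np.radians(50.0), 100)
R_tr = expensive_model(v0_tr, th_tr)

def features(v0, theta):
    # Normalize inputs, then build all polynomial terms of total degree <= 4
    x = (v0 - 150.0) / 5.0
    y = (theta - np.radians(45.0)) / np.radians(5.0)
    return np.stack([x**i * y**j for i in range(5) for j in range(5 - i)], axis=1)

# Train the surrogate: a least-squares polynomial fit to the 100 model runs
coef, *_ = np.linalg.lstsq(features(v0_tr, th_tr), R_tr, rcond=None)

# Monte Carlo on the cheap surrogate: 100,000 samples at negligible cost
v0_mc = rng.normal(150.0, 5.0, 100_000)
th_mc = rng.uniform(np.radians(40.0), np.radians(50.0), 100_000)
R_surrogate = features(v0_mc, th_mc) @ coef
```

In a real application, the 100 training runs would dominate the cost, while the 100,000 surrogate evaluations are essentially free; validating the surrogate's accuracy on held-out model runs before trusting its predictions is good practice.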
5. Key take-aways
- Forward uncertainty propagation is essential to estimate the model prediction error/uncertainty induced by the uncertain model hyperparameters.
- Monte Carlo simulation is one of the most popular approaches in achieving uncertainty propagation.
- Monte Carlo simulation could be computationally expensive, as many samples may be required to ensure accuracy.
- Smarter sampling schemes and surrogate modeling techniques can help alleviate the high computational cost associated with Monte Carlo simulations.
Further reading:
[1] Art B. Owen, Monte Carlo theory, methods, and examples, 2013
About the Author
I’m a Ph.D. researcher working on uncertainty quantification and reliability analysis for aerospace applications. Statistics and data science form the core of my daily work. I love sharing what I’ve learned in the fascinating world of statistics. Check my previous posts to find out more and connect with me on Medium and LinkedIn.