
Speed Up Your Simulations By Deploying These Sampling Strategies

Six strategies to sample random variables more efficiently. Learn about antithetic variables, importance sampling, moment matching, control…

Photo by René Porter on Unsplash

Although many real-world systems (and all their uncertainties) can be formulated as mathematical problems, we often lack the ability to derive analytical solutions. Even a halfway-realistic representation of reality contains many sources of uncertainty, not necessarily drawn from well-known theoretical distributions.

Often, we resort to Monte Carlo simulation, drawing large numbers of samples from random distributions and relying on the Central Limit Theorem to estimate the true values of the system. Mathematically, however, the approximation is only guaranteed to hold for infinitely many samples. This article discusses six sampling strategies for those who don’t have infinite time on their hands.

The drawbacks of Monte Carlo simulation

Photo by Nic Rosenau on Unsplash

A simulation can be a static representation of a system (e.g., factory layout), but often also embeds aspects of learning and optimization (e.g., Reinforcement Learning). Monte Carlo simulation provides a very rich toolbox, but its main drawback is its inefficiency.

Suppose you want to determine the average outcome of a die roll (a very basic simulation model). If you repeat the experiment 100 times, you’ll likely end up somewhere between 3.2 and 3.8 – the error compared to the true mean of 3.5 is quite large. With 1000 repetitions you’ll probably land between 3.4 and 3.6 – better, but not overly accurate either.
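To see this effect in code, the minimal sketch below (the seed and sample sizes are arbitrary choices) estimates the mean of a fair die for an increasing number of rolls and prints the corresponding standard error:

import numpy as np

# Estimate the mean of a fair die for increasing sample sizes.
rng = np.random.default_rng(42)
for n_rolls in (100, 1_000, 100_000):
    rolls = rng.integers(1, 7, size=n_rolls)          # faces 1..6
    std_error = rolls.std(ddof=1) / np.sqrt(n_rolls)  # shrinks with 1/sqrt(n)
    print(f"{n_rolls:>7} rolls: mean={rolls.mean():.3f}, std error={std_error:.3f}")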

If thousands of repetitions are needed to approximate the value of a simple die roll, you can imagine that millions or even billions of repetitions are needed to accurately estimate values associated with stochastic systems subject to multiple random distributions. Factor in that running the environment itself also requires computational effort (think of modelling a production floor or a wind farm) and the computational problems become apparent.

Fortunately, we can often do better than sampling at random, vastly reducing the computational burden to achieve a certain level of precision. This article treats a number of sampling strategies, which can be used in settings such as Monte Carlo simulation and Reinforcement Learning.

As a running example, I will use simulation to price a European call option. For those unfamiliar with financial derivatives: at a specific date in the future (e.g., one year from now), such an option gives the right to buy a stock at a pre-determined strike price K. If the stock price S exceeds the strike price at the exercise date (S>K), it makes sense to exercise the option and lock in the profit. If S<K, it is not rational to exercise the option. Thus, the payoff can be denoted as max(S-K, 0). The (discounted) option value equals the expected payoff, which we try to learn using Monte Carlo simulation. For this particular option, we can validate the price against an analytical solution (the Black-Scholes equation), which yields the true value of the option.
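The sketch below sets up this baseline: a plain Monte Carlo pricer next to the Black-Scholes reference value. The parameter values (S0=100, K=100, r=0.05, sigma=0.2, T=1) are illustrative assumptions, not necessarily the ones behind the figures later on.

import numpy as np
from scipy.stats import norm

# Illustrative option parameters (assumptions for the sketches in this article).
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0

def black_scholes_call(S0, K, r, sigma, T):
    # Analytical reference value used to validate the simulation.
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def mc_call(n_paths, seed=0):
    # Terminal stock price under Geometric Brownian Motion, one step per path.
    z = np.random.default_rng(seed).standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    discounted = np.exp(-r * T) * np.maximum(ST - K, 0.0)
    return discounted.mean(), discounted.std(ddof=1) / np.sqrt(n_paths)

print("Black-Scholes value :", black_scholes_call(S0, K, r, sigma, T))
print("Monte Carlo estimate:", mc_call(10_000))   # (price, standard error)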

Antithetic variables

"If you go low, I go high."

In many simulation environments, extremes on both sides of the spectrum matter. A solar park will generate much energy on a sunny day and little on a day covered in thick clouds. We earn a lot of money when the stock market surges and lose a lot when it plunges. Naturally, if by chance we happen to sample mostly sunny days and booming markets, our picture of reality will be skewed.

The idea behind Antithetic Sampling is to draw random variables in pairs, in which one draw is the ‘opposite’ of the other draw. Consider drawing from a uniform [0,10] distribution. If we randomly sample 1.3, we’d also sample 8.7 to complete the antithetic pair. If we draw the 64th percentile point of a normal distribution, we also draw the 36th.

A bit more formally: for every generated path [ϵ_1, ϵ_2, …, ϵ_M], we also generate an antithetic path [-ϵ_1, -ϵ_2, …, -ϵ_M]. Besides the variance reduction benefits, generating ϵ may be computationally expensive – the antithetic counterpart then provides a ‘bonus’ observation.

Example time. Stock price paths are often generated from return samples drawn from a standard normal distribution (the Geometric Brownian Motion model). If we do so using antithetic sampling, the generated pairs of price paths look like mirror images:

Sampled price paths. The blue path represents the ‘original’ path. The orange path is the antithetic path, forming a mirror image of the original one. [image by author]

Typically, sampling in this way reduces the standard error. Of course, to compare fairly between standard MC and antithetic MC, we need the same total number of data points: M antithetic pairs correspond to 2M draws, yielding standard errors σ_std/sqrt(2M) and σ_anti/sqrt(M) respectively (with σ_anti the standard deviation of the M pair averages).

Let’s try it on our option example:
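The figure below shows such a comparison. A minimal sketch along the same lines could look as follows (the parameters S0=100, K=100, r=0.05, sigma=0.2, T=1 are illustrative assumptions, not necessarily those used for the figure):

import numpy as np

# Illustrative option parameters (assumptions).
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
rng = np.random.default_rng(1)

def discounted_payoff(z):
    # Discounted call payoff for standard normal draws z under GBM.
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0)

n_pairs = 5_000                          # M pairs = 2M price paths in total
z = rng.standard_normal(n_pairs)

# Standard MC: 2M independent draws.
plain = discounted_payoff(np.concatenate([z, rng.standard_normal(n_pairs)]))

# Antithetic MC: average each draw with its mirror image -z.
pairs = 0.5 * (discounted_payoff(z) + discounted_payoff(-z))

print("standard MC :", plain.mean(), "+/-", plain.std(ddof=1) / np.sqrt(2 * n_pairs))
print("antithetic  :", pairs.mean(), "+/-", pairs.std(ddof=1) / np.sqrt(n_pairs))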

Comparison of pricing using both regular MC and antithetic MC. The latter consistently yields smaller standard errors [image by author]

Indeed: smaller standard error, more accurate outcome!

Control variates

In many cases, we have some known solution against which we can compare the outcome of our simulation. We might have an analytical solution for a comparable problem, or even just a sampled observation. Such knowledge is valuable, as it can be leveraged to reduce the variance of our samples.

A control variate is a term with zero mean that we add to the estimator. It can be added safely because it does not affect the expectation; however, it does affect the variance.

That may sound a bit abstract, so let’s quickly move to an example. As mentioned earlier, the Black-Scholes formula is an analytical expression to value European options. For American options – which can be exercised at any point up to maturity – no analytical solution exists. However, the payoff profiles are quite similar; we’d expect the American option value v_A to be at least as large as the European option value v_E (a European option is effectively a restricted version of an American one).

We can value the American option using simulation (v_A^MC), and simultaneously use the same samples to value a European option with the same strike price and maturity. For the latter, we can then measure the error between the simulated value v_E^MC and the analytical value v_E. We then assume the same error holds between v_A^MC and the true (unknown) value v_A.

Time to plug in some numbers.

Suppose our Monte Carlo simulation for the American option yields v_A^MC = 4.49. The European Monte Carlo estimate (using the same simulated price paths) is v_E^MC = 4.32, and the Black-Scholes formula yields v_E = 4.08. Thus, the European estimate has an error v_E − v_E^MC = −0.24.

We assume this error also holds for the American option. We can then estimate v_A ≈ v_A^MC + (v_E − v_E^MC) = 4.25. Note that we have corrected our initial estimate of 4.49 downwards. As the simulation overestimated the value of the European option, it likely also overestimates the American option – the revised estimate should be more accurate.
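In code, the adjustment is a one-liner; the sketch below simply plugs in the illustrative numbers from the example above:

# Control-variate adjustment with the figures used in the text.
v_A_mc = 4.49   # simulated American option value
v_E_mc = 4.32   # simulated European option value (same price paths)
v_E = 4.08      # analytical European value (Black-Scholes)

v_A_adjusted = v_A_mc + (v_E - v_E_mc)   # correct by the observed European error
print(v_A_adjusted)                      # 4.25

Here the correction uses a coefficient of one; more generally, the control variate is scaled by a coefficient chosen to minimize the variance of the combined estimator.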

Importance sampling

When sampling, we may be interested only in a subset of observations – rare events that require interventions, extreme outcomes, exceptions that ‘almost never’ occur… The problem is that these rare events are – you wouldn’t believe it – rare. If an event only occurs 1 in 10,000 times, you’d have to generate many random samples just to get a few observations.

As an example, consider an option that is deep out-of-the-money. Only if the stock price explodes will it end up sufficiently high for the option to pay off. Suppose such price paths occur with a probability of only 0.1%. With random sampling, we will on average receive a non-zero payoff only once every 1,000 replications. We might need millions of paths to get a somewhat reasonable estimate, with 99.9% of the simulated paths yielding no additional insight.

Importance Sampling addresses this issue by sampling from a conditional probability distribution instead. With simulated returns following a normal distribution, we could sample only from the right-hand tail, knowing analytically that this tail covers just 0.1% of the full distribution. Obviously, we have to correct for the zero-valued outcomes we ignore, but it is straightforward to adjust the average payoff accordingly.
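A minimal sketch of this idea for a deep out-of-the-money call is shown below. The strike K=180 and the other parameters are illustrative assumptions (they give a payoff probability of a few tenths of a percent, rather than exactly the 0.1% of the example):

import numpy as np
from scipy.stats import norm

# Illustrative parameters for a deep out-of-the-money call (assumptions).
S0, K, r, sigma, T = 100.0, 180.0, 0.05, 0.2, 1.0
rng = np.random.default_rng(2)
n = 1_000

# Smallest standard normal draw for which the option ends up in the money.
z_min = (np.log(K / S0) - (r - 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
p_tail = 1.0 - norm.cdf(z_min)           # probability mass of the payoff region

# Sample only from the tail Z > z_min via the inverse CDF...
u = rng.uniform(size=n)
z = norm.ppf(norm.cdf(z_min) + u * p_tail)

ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
payoff = np.maximum(ST - K, 0.0)

# ...and re-weight by the tail probability to undo the conditioning.
price = np.exp(-r * T) * p_tail * payoff.mean()
print("importance sampling estimate:", price)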

Visual representation of importance sampling. We only sample from the shaded region, using our knowledge of the Gaussian distribution [image by author]

Check out the example below. With regular sampling, we don’t generate any in-the-money paths and would thus conclude that the option value is 0. Importance sampling, on the other hand, generates many positive payoffs. After correcting for the conditional distribution, we see that the price is fairly close to the true value – with random samples, this would take many, many more replications!

Regular sampling vs. importance sampling to value a deep out-of-the-money option. As we only sample price paths with very strong growth, importance sampling yields an accurate price estimate with only a few observations [image by author]

Moment matching

In Monte Carlo simulation, we often sample from known theoretical distributions, e.g., a normal distribution. A distribution can be characterized by its moments (e.g., mean, standard deviation, skewness, kurtosis), which are quantitative properties of its shape.

You might expect that if you draw samples from a normal distribution, the constructed set of samples is normally distributed as well. This intuition holds for large numbers of samples, but unfortunately not for smaller ones.

Consider the following example, in which we draw ten samples from a standard normal distribution. Ideally, the constructed set of samples would also have a mean of 0 and a standard deviation of 1. However, a quick calculation shows this is not the case.

Set of samples from a standard normal distribution. Although we draw from an N(0,1) distribution, the sampled set exhibits a different mean and standard deviation due to the small sample size [image by author]

Fortunately, the fix is quite easy. In this case, we simply subtract the sample mean (resetting the mean to 0) and divide by the sample standard deviation (resetting it to 1). Note we could do the same for skewness or kurtosis.
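A minimal sketch of this transformation (the seed is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(3)
z = rng.standard_normal(10)              # ten draws from N(0, 1)
print("raw sample    : mean=%.3f, std=%.3f" % (z.mean(), z.std(ddof=1)))

# Subtract the sample mean and divide by the sample standard deviation,
# so the first two moments match the target N(0, 1) exactly.
z_matched = (z - z.mean()) / z.std(ddof=1)
print("moment matched: mean=%.3f, std=%.3f" % (z_matched.mean(), z_matched.std(ddof=1)))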

Moment matching. We transform the original samples based on the desired distribution properties. After the transformation, the sample set reflects a proper standard normal distribution [image by author]

The transformed sample reflects the true distribution we wish to sample from, which speeds up convergence.

Stratified sampling

Sampling sufficiently often from a random distribution is reassuring in terms of convergence. In practice, however, the number of samples is very much finite, which begs the question of why sampling needs to be ‘random’ at all. Given a finite number of samples, why not instead try to cover the distribution as well as possible?

Stratified sampling explores this notion. Suppose we draw ten samples from a uniform distribution between 1 and 10. A random sample might look like this:

Random sampling: [7 10 5 6 9 8 7 2 2 5]

Note that 1, 3 and 4 are not represented, whereas 2, 5 and 7 are sampled twice. Let’s consider stratified sampling next:

Stratified sampling: [1 2 3 4 5 6 7 8 9 10]

We simply distribute our samples in a way that is proportional to the probability mass of the distribution, providing an even coverage of the distribution. Note that (i) we need to know the distribution and (ii) we must specify the number of samples up front to draw them properly.

The procedure is easiest to visualize with a uniform distribution. Below are examples in one-, two- and three dimensions, respectively.

Random sampling (left) vs. stratified sampling (right). Note that with the same number of observations, stratified sampling provides a much more even coverage of the sampling space. [image by author]

For the normal distribution, we’d sample more points from the center of the bell curve (high probability mass) and only sparsely from the tails. In other words, the density of samples is proportional to the probability density of the distribution.
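A minimal sketch of stratified sampling from a standard normal, applied to the option example (the parameters are again illustrative assumptions; for simplicity, each of the ten equal-probability strata is represented by its midpoint percentile, whereas one could also draw randomly within each stratum):

import numpy as np
from scipy.stats import norm

# Illustrative option parameters (assumptions).
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n = 10                                   # deliberately tiny sample

# One draw per equal-probability stratum: map the midpoint percentile of
# each stratum through the inverse normal CDF.
u = (np.arange(n) + 0.5) / n             # 0.05, 0.15, ..., 0.95
z = norm.ppf(u)

ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
price = np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()
print("stratified estimate with 10 samples:", price)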

The numerical example below shows that stratified sampling is much more accurate when the number of samples is limited. In this case, we price an option with just 10 replications (an extremely low number), yet the result is fairly accurate. In contrast, random sampling ends up far from the true value.

Value estimate for an option, using only 10 samples for both regular Monte Carlo and stratified sampling. Despite the extremely low number of samples, stratified sampling gets quite close to the true price, whereas regular MC is completely off. [image by author]

Quasi-random sampling

Quasi-random sampling (also known as low-discrepancy sampling) is very similar to stratified sampling, in the sense that it aims to cover the theoretical distribution evenly. The main distinction is that stratified sampling relies on a predetermined number of observations, whereas quasi-random sampling is sequential.

If we could make one more observation, which point would yield the most insight?

In many cases, you are not interested in running a given number of iterations, but want to run sufficient iterations to meet a certain standard error. You might want to attain a standard error <0.1%, without knowing whether this requires 10,000 or 1,000,000 replications. In such cases, quasi-random sampling can expand the sequence until hitting the desired accuracy.

For instance, suppose we want to generate samples between 0 and 1. We could start in the middle, sampling 0.5. The ‘gaps’ on the left and right are equally large, so the next sample could be 0.25, and then 0.75. Afterwards, we might sample between 0.25 and 0.5, between 0.5 and 0.75, etc.

There are various ways to fill the ‘gaps’ in the distribution, each with corresponding algorithms. For instance, a Sobol sequence can be generated as follows using scipy:

>>> from scipy.stats import qmc
>>> sampler = qmc.Sobol(d=2, scramble=False)
>>> sample = sampler.random_base2(m=3)
>>> sample
array([[0.   , 0.   ],
       [0.5  , 0.5  ],
       [0.75 , 0.25 ],
       [0.25 , 0.75 ],
       [0.375, 0.375],
       [0.875, 0.875],
       [0.625, 0.125],
       [0.125, 0.625]])
scipy documentation: https://scipy.github.io/devdocs/reference/generated/scipy.stats.qmc.Sobol.html
Example of 2D Sobol sequences, containing the first 10 (red), 100 (red+blue) and 256 (red+blue+green) elements of the sequence [Image from WikiMedia by Jheald]
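To use such a sequence inside the simulation, the uniform Sobol points can be mapped to standard normal draws with the inverse CDF. A minimal sketch, again with illustrative option parameters:

import numpy as np
from scipy.stats import norm, qmc

# Illustrative option parameters (assumptions).
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0

sampler = qmc.Sobol(d=1, scramble=True, seed=4)
u = sampler.random_base2(m=10).ravel()   # 2**10 = 1,024 quasi-random points
u = np.clip(u, 1e-12, 1 - 1e-12)         # guard against ppf(0) = -inf
z = norm.ppf(u)                          # map uniforms to standard normals

ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
price = np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()
print("quasi-random estimate:", price)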

Although quasi-random sampling does not fix the number of samples in advance, its impact on variance is very similar to that of stratified sampling. The intuition remains the same: obtain a representative set of samples for the distribution at hand.

Closing words

Random sampling is unbiased and theoretically guarantees convergence under infinite repetition, but is often highly inefficient. We may exhaustively explore problem regions that do not add new insights, while ignoring the regions of interest.

Sampling strategies leverage our knowledge of the problem structure, transform samples, or simply try to cover the underlying distribution in a smarter way.

The described techniques may not always work. Most critically, we often need an understanding of the probability distribution; in reality, we frequently rely on empirical distributions.

Limitations aside: sampling strategies often prove, empirically, to substantially speed up simulations. When you are strapped for time and your algorithm just doesn’t seem to converge, be sure to give them a try!


References

Hull, J. C. (2014). Options, Futures, and Other Derivatives. Global Edition.

Antithetic variates – Wikipedia

Control variates – Wikipedia

Importance sampling – Wikipedia

Low-discrepancy sequence – Wikipedia

Method of moments (statistics) – Wikipedia

Stratified sampling – Wikipedia

