
Deriving the inverse transform sampling method from first principles

Insights on generating samples from a custom probability density function

Quick note: in this article, I provide insights on generating samples from a custom probability density function using the inverse transform sampling method.

Almost every programming language enables its users to generate random numbers. But what about generating numbers that follow a certain probability distribution? This question was what first pushed me to read about the topic.

When dealing with abstract concepts, namely math and programming, the challenge is often to picture those concepts in one's mind. In light of that, throughout this article I will constantly illustrate my ideas using either diagrams or animations.

In the end, what I will try to achieve is to build – from scratch – a tool that enables sampling from a custom probability density function.


Probability density functions are easy to construct yet not trivial to sample from

In probability and statistics, precisely when describing continuous random variables, we often come across the concept of a probability density function. It is a powerful tool that characterizes a random variable, as it gives the probabilities/frequencies of occurrence of the values taken by the random variable.

As an illustration (see figure below), if we consider a random variable Y described by the density function f, and we generate a total of 100 independent samples, then ideally 24 of the samples would fall between 1 and 2, 40 between 2 and 3, 16 between 3 and 4, and so forth.

Illustration of a probability density function [illustration by author]

It happens that there is a huge family of probability density functions, ranging from the most commonly used ones (uniform, normal, exponential, …) to the most obscure ones. Indeed, if you want to construct your own density function, just take a non-negative integrable function, normalize it by its integral, et voilà!

Examples of distributions: in the 1st row 3 usual distributions, whereas in the second row 3 unusual ones [illustration by author]
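
To make this recipe concrete, here is a minimal sketch in Python. The function g and the interval [0, 4] are arbitrary choices of mine for illustration, not anything prescribed by the method:

```python
import numpy as np
from scipy.integrate import quad

def make_density(g, a, b):
    """Normalize a non-negative integrable g by its integral over [a, b]."""
    normalization, _ = quad(g, a, b)
    return lambda t: g(t) / normalization

# An arbitrary non-negative function on [0, 4], just for illustration
g = lambda t: np.exp(-t) * (1 + np.sin(3 * t) ** 2)
f = make_density(g, 0.0, 4.0)

print(quad(f, 0.0, 4.0)[0])  # sanity check: the density integrates to ~1.0
```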

Numerically speaking, generating samples from a uniform distribution is quite straightforward in the sense that it is equivalent to generating random numbers. And obviously, almost every programming language has a pseudorandom number generator such as random.random() for Python and Math.random() for JavaScript.

However, doing the same thing for a custom distribution function is not trivial and needs some consideration.

This frustration of being restricted to sampling only from a uniform distribution, despite how easy it is to construct density functions, is what motivated me.

I do admit that after doing some research on the internet, I discovered pre-existing tools that enable doing that (in Python: scipy.stats.rv_continuous). Still, that was not sufficient to satisfy my curiosity.

What I will try to achieve in this article is to uncover what lies under the hood by building from scratch – using the uniform distribution – a tool that enables sampling from a custom density function.

Finding a link between a uniform and a custom distribution

Suppose that I constructed a custom density function f. My objective now is to generate samples from it by leveraging a uniform distribution. The basic idea is to find a mapping between the uniform and the desired density function. In other words, I will generate samples from a uniform distribution and map those numbers to their corresponding peers.

The schema followed to generate samples from a custom density [illustration by author]

With this in mind, what remains is to construct the mapping function between the uniform and the custom density – said differently, to reveal what the cogs above are actually doing.

A hint to building this mapping can be derived from measure theory. In short, measure theory is a generalization of the concepts of length, area, and volume – except that in the context of probability theory, which is a special case, we are measuring probabilities instead. Probability density functions – more precisely "f dx" – can be perceived as measures, and their integration has a direct link with probability.

Interpretation of the area under the curve of the density function as a probability [illustration by author]

In both cases – uniform and custom density – we are measuring probabilities, which are interpreted as areas under the curve. Therefore, to find the mapping, we only need to find a link between the areas under the curves of the uniform and the custom density.

To do so, we first sample a number from the uniform distribution. Then, we measure the area delimited by it. Finally, we find the corresponding number that delimits the same area under the curve of the custom distribution.

A detailed schema of the process followed to sample from a custom density [illustration by author]

Putting that into an equation: for every number x sampled from the uniform distribution, we need to solve the following equation to find its corresponding peer y in the custom density function:

F_X(x) = F_Y(y)

where F_X (respectively F_Y) denotes the area under the curve of the density function delimited by x (respectively y). In the literature, F is called the cumulative distribution function. It measures the probability that the random variable falls in the left-hand interval delimited by the specified bound – which, in our case, is exactly the area under the curve delimited by that bound.

Implementation of the mapping process between a uniform and a custom distribution

Now, we have all the theoretical material to map a generated sample from a uniform to its peer sample in the custom density. What remains is to build a program to solve the above equation.

For the sake of simplicity, we will sample from a uniform distribution between 0–1. Therefore, the area under the curve for every sampled number will be exactly equal to the number itself.

On the other hand, computing the area under the curve of the custom density for a given y is straightforward, as we have an explicit expression of the density function – in fact, we are the ones who inputted this expression. We compute this area as an integral; to achieve that, we can use scipy.integrate.quad in Python. Therefore, the final form of the equation that must be solved is the following:

F_Y(y) = ∫_{−∞}^{y} f(t) dt = x
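
As a sketch, and reusing the density f built on [0, 4] in the earlier snippet, the left-hand side reduces to a single call to quad:

```python
from scipy.integrate import quad

def cdf(y, f, a):
    """Area under the density f between its lower bound a and y, i.e. F_Y(y)."""
    area, _ = quad(f, a, y)
    return area

print(cdf(2.0, f, 0.0))  # area under f between 0 and 2
```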

Solving this problem is equivalent to finding the roots of the above nonlinear equation. To do so, we can again use a solver – for instance, scipy.optimize.fsolve in Python.
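
Here is a minimal sketch of that root-finding step, assuming the f and cdf defined above. The starting point y0 is a plain argument for now; the next paragraphs explain how to choose it wisely:

```python
import numpy as np
from scipy.optimize import fsolve

def map_sample(u, f, a, y0):
    """Solve F_Y(y) = u for y, starting the solver at y0."""
    def equation(y):
        # fsolve passes y as a length-1 array, while quad expects a scalar bound
        return cdf(float(np.atleast_1d(y)[0]), f, a) - u
    return fsolve(equation, x0=y0)[0]

y = map_sample(np.random.random(), f, 0.0, y0=2.0)  # 2.0: a rough midpoint guess
```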

However, how do we ensure that this equation has a solution? And if so, how do we pick the right one if there are many?

The cumulative distribution function measures a probability; hence, it is a quantity between 0 and 1, and it is continuous. Since the number sampled from the uniform distribution also lies between 0 and 1, the intermediate value theorem guarantees that the equation has at least one solution.

On the other hand, the cumulative distribution function is monotonically increasing – strictly so wherever the density is positive. So, we are sure that whenever there is a solution, it is unique.

Based on this reasoning, we can solve the equation using a solver without worries. The core idea behind nonlinear solvers is to build a sequence that converges to the desired solution. Usually, this sequence is defined through a recursion formula; in other terms, the value of the sequence at a given stage relies on the values computed at the previous ones. Therefore, the choice of the solver's starting point is critical, in the sense that choosing an inappropriate one might yield a wrong solution – or, in the worst case, a sequence that does not converge at all.

To avoid such a problem, we need a way to choose the starting point conveniently. To do so, we can again use one of the properties of the cumulative distribution function, namely its monotonicity.

We can – at the moment we build the density function – create an array of tuples pairing each point y with its cumulative value F(y) (see illustration below). Since F is increasing, the elements of the array are sorted in increasing order. Based on that, suppose we sampled a number x from the uniform distribution. We can use a slightly modified version of binary search to find the two elements closest to it. The y's that correspond to these two elements define an interval where our solution is most likely to be. Hence, we can choose the midpoint of this interval as the starting point (a code sketch follows the illustration).

Illustration of the way of choosing a suitable starting point [illustration by author]
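
A minimal sketch of this starting-point selection, assuming the f and cdf from the previous snippets (the grid of 200 points is an arbitrary resolution of mine):

```python
import bisect
import numpy as np

a, b = 0.0, 4.0                          # support of the density f from above
ys = np.linspace(a, b, 200)              # grid of candidate y's
cdf_table = [cdf(y, f, a) for y in ys]   # sorted, since F is increasing

def starting_point(u):
    """Midpoint of the two tabulated y's whose CDF values bracket u."""
    i = bisect.bisect_left(cdf_table, u)  # binary search on the CDF column
    i = min(max(i, 1), len(ys) - 1)       # clamp to valid neighbouring indices
    return 0.5 * (ys[i - 1] + ys[i])
```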

Now, we have all the ingredients to find the mapping of a given sampled number from a uniform.

Testing the mapping process on examples

After coding the previous ideas, I started testing the mapping process gradually (see figure below).

I started with a special case where the custom density I want to sample from is itself a uniform distribution. In this case, the mapping process should ideally return values that are exactly the same as the ones generated from the uniform distribution. Indeed, I am then using a uniform to generate samples from a uniform.

The second test I opted for was to generate samples from a usual distribution; in my case, I chose a normal distribution. Visually, I was expecting to see the same delimited area under both curves. For instance, whenever we get a number on the right side of the square (uniform), the mapped number must also fall on the right side of the bell (normal).

The last test was to try the process on an unusual distribution. Frankly, I got genuinely creative in imagining the most unusual density function I have ever come across 🙂

I have gathered all the previous tests in animations where, on the left side, I put the uniform distribution from which I sample, representing the sampled number with a red dot. On the right side, I put the densities I am trying to sample from, similarly representing the mapped number with a red dot.

Test of the mapping process on a uniform [illustration by author]
Test of the mapping process on a normal distribution [illustration by author]
Test of the mapping process on a custom distribution [illustration by author]

After making sure that the mapping process gives accurate results, it was time to test the whole pipeline, in which the target is to generate samples from a given density function. So I considered a custom density function, generated a large sample from a uniform, and mapped those samples to their corresponding peers. Finally, I represented the result using a histogram. My expectation was that the bars of the histogram would get closer to the curve of the density function as the number of samples grows (a sketch of this test follows below).
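
A sketch of this end-to-end test, wiring together the pieces from the earlier snippets (map_sample, starting_point, and the density f); matplotlib is used only for the visual check:

```python
import numpy as np
import matplotlib.pyplot as plt

def sample_many(n):
    """Draw n samples from f by mapping n uniform samples through the inverse CDF."""
    us = np.random.random(n)
    return np.array([map_sample(u, f, a, y0=starting_point(u)) for u in us])

samples = sample_many(5000)

# The histogram bars should hug the density curve as the sample size grows
grid = np.linspace(a, b, 300)
plt.hist(samples, bins=60, density=True, alpha=0.5, label="mapped samples")
plt.plot(grid, [f(t) for t in grid], label="target density f")
plt.legend()
plt.show()
```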

Once again, I made an animation in which I represent the evolution of the histogram bars' heights as I increase the number of samples.

Testing the whole process on a custom density function [illustration by author]

Conclusion

Throughout this article, I described how we can leverage the uniform distribution to generate samples from a custom density function. Precisely, I built a tool that enables doing that: first, I explained the process of implementing it; then, I tested it on several distributions, ranging from the commonly used ones to the most unfamiliar ones.

Referring to the literature, the method described in this article is called inverse transform sampling. There are other methods used to achieve the same goal; check this article for further reading.


When writing this article, I deliberately kept the code to a handful of minimal sketches, even though I was adopting a numerical perspective. In fact, I wanted the article to be accessible to both programmers and non-programmers. However, if you are curious about how things were actually implemented, you can check my GitHub repository, where I uploaded the full code and its details.

Last but not least, I can say that I have successfully reached my objective. So, to celebrate such an achievement and to satisfy my ego, there was nothing better than imagining all sorts of density functions and sampling from them.

Examples of distributions generated using the built tool [illustration by author]

Rendezvous: "Insights on pseudorandom number generators"

