The world’s leading publication for data science, AI, and ML professionals.

How to generate random variables from scratch (no library used)

We go through a simple pseudo-random generator algorithm and show how to use that for generating important random variables

Image source: Pixabay (Free for commercial use)
Image source: Pixabay (Free for commercial use)

Introduction

Random variables are an integral part of Data Science, machine learning, and statistical modeling. They play an increasingly important role in the systems and services around us that employ artificial intelligence or deep neural networks.

The concept and properties of random variables are used in

  • regression techniques,
  • ensemble methods like random forests, gradient boosting,
  • deep learning,
  • clustering algorithms,
  • natural language processing,
  • reinforcement learning,
  • advanced search algorithms in AI and game theory.

This article is not about the properties and utility of random variables. Readers are encouraged to refer to other excellent resources for that purpose.

Understanding Random Variables

Probability and Statistics explained in the context of deep learning

This article aims to show how you can generate random variables from scratch with simple programming.

Note that phrase "from scratch". It means we will not call a third-party library or subroutine. Instead, we will write our own simple functions to generate such random variables.

Can you do the following?

  • Addition, multiplication, division, and modulus
  • Sine, cosine, and logarithm operations

We will show how to use these rudimentary mathematical operations to generate,

  • Uniform random variates
  • Random variates from the Exponential Distribution
  • Random variates from the Normal Distribution
Image source: Author-generated meme in imgflip.com
Image source: Author-generated meme in imgflip.com

This exercise will give you an intuitive understanding of the quality of the pseudo-random numbers that we have to work with and also show how common probability distributions evolve from one another.

Wait, did I say "Pseudo-random"? Let’s see what I meant by that.

In software, you gotta embrace the ‘Pseudo’

Mother nature offers numerous instances of true random processes. Nuclear decay is one such instance.

Image source: Wikimedia (Creative Commons License)
Image source: Wikimedia (Creative Commons License)

It turns out that true random processes can only be emulated and modeled with the so-called hardware random generators, a device that generates random numbers from a physical process, rather than by means of an algorithm. Such devices are often based on quantum-mechanical phenomena that generate low-level, stochastic "noise" signals, such as thermal noise, the photoelectric effect, interference of optical beams, and other quantum phenomena.

Unfortunately, in everyday computing systems, and in an algorithmic setting, we only encounter the so-called ‘pseudo-random’ numbers. They are not truly random as a natural process can be, and at some level, they are deterministic. A seed number is supplied to an algorithm, which generates a series of such numbers.

RANDOM.ORG – Introduction to Randomness and Random Numbers

A pretty famous person had this to say about such an approach!

Image source: Author-generated meme in imgflip.com with Wikipedia image
Image source: Author-generated meme in imgflip.com with Wikipedia image

However, for all practical computing purposes, these ‘pseudo-random’ numbers are sufficient if they do not exhibit an obvious repeating pattern or predictability.

Complex and elaborate pseudo-random generators are available in all major programming languages. In many cases, they use special types of mathematics such as Modular Arithmetic or Marsenne Twister. Here is the link to the random module of Python, for example.

There is also a wholly separate branch of research and development around cryptographically secured pseudo random generators for encryption systems. Imagine what can happen if someone can guess or tease out the exact algorithm for generating the hash keys for your Google password?

Why secure systems require random numbers

But we are not concerned with such complex matters!

In this article, we will write a very simple program with the help of rudimentary mathematical operations, to build our random generator. In fact, we will use a method called a linear congruential generator, which was popular in the old days of pseudo-random generation.

The output of our program will take the form of a simple Uniform Distribution. Thereafter, we will use statistical theorems and transformations to generate random variables, corresponding to other distributions, based on this random generator.

Uniform random generator

Linear congruential generator (the ‘bad’ version)

We will use a linear recurrence relation as follows,

And, the uniform random variates are obtained after scaling,

We will also need a seed number for starting the generation process. So, here is a simple Python function for achieving this task,

So far, all we have done are some multiplication, addition, modulus, and division. Pretty simple arithmetic, right?

Let’s see how this simple function performs. Note, we named is pseudo_uniform_bad because we know the choice of the multiplier and modulo numbers are pretty bad.

Let me show what I mean. Let us generate 1000 random numbers with the seed=3 and plot their histogram to see the distribution.

Not good, eh? We expected to see a Uniform distribution between 0 and 1 but got some patches of numbers instead.

Turns out, we can easily improve it by changing the multiplier and the modulo integers.

Linear congruential generator (the ‘better’ version)

Let’s amp up the magnitude of those integers.

Now, if we generate 10,000 random numbers and plot their histogram, it looks like following,

A generalized uniform random generator

Now, it is very easy to construct a generalized uniform random generator function,

Here are 10,000 uniformly random distributed numbers between -5 and +7.

Other randomizers and distributions

Now that we could program a decent uniform random generator, we can easily extend it to construct random variate generators for a multitude of statistical distributions.

A sample picker

For example, using the uniform generator, we can quickly make a sample picker, which will pick a random sample from a given list of items (does not need to be number, any type of object will do).

Note that we also use the decimal portion of the floating-point value returned by the current system clock as the seed for this function.

We can test it with a simulation of a fair dice throw whose six sides are passed as the list to the function,

Bernoulli distribution

This is a discrete probability distribution that can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes-no question. The single parameter is denoted p – the probability of success.

Our code for Bernoulli generator is deceptively simple,

Binomial distribution

It is the discrete probability distribution of a specific number of successes in a sequence of n independent experiments, each asking a yes-no question, and each with its own boolean-valued outcome: success with probability p or failure with probability q = 1 − p.

We can code a Binomial random variate generator quite easily from the Uniform generator,

Let’s suppose 100 loaded coins, each with the probability of head 0.75, are flipped, and this trial/experiment is repeated for 15 times. The number of heads in each experiment is given by our function,

Poisson distribution

It is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.

We can code a Poisson generator using our Uniform generator as follows,

A simulation result is shown below for three different rate parameters where 10,000 random variates were generated using the function above and the count of the events are plotted,

Exponential distribution

It is the continuous probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate.

For generating an Exponential distribution from a Uniform distribution, we can use a powerful technique called "Inverse Transform Method". We don’t need to dwell on the theoretical details here, but ultimately this method links the two types of distributions through a simple mathematical transformation.

Here is a histogram of 10,000 random variates with the rate parameter of 0.1,

The Normal distribution

Normal or Gaussian distribution is, without any doubt, the most famous statistical distribution (primarily because of its link with the Central Limit Theorem).

It turns out that using a special method called Box-Muller transform, we can generate Normally distributed random variates from Uniform random generators.

Here is a simulation result with three different sets of parameters,

Summary

In this article, we showed how to start with a simple linear congruential generator to generate uniformly distributed random numbers, with simple programming. Thereafter, we showed how to use rudimentary mathematical operations and transformations to build a more complex set of statistical distributions from this base generator.

We were able to build functions for multiple widely-used probability distributions without using any standard library like the np.random module of the NumPy package.

If you have reached this far, here is a fun video from Cloudflare, which uses a series of lava lamps in their San Francisco office, to generate randomness for their cryptographic systems.


Also, you can check the author’s GitHub repositories for code, ideas, and resources in Machine Learning and data science. If you are, like me, passionate about AI/machine learning/data science, please feel free to add me on LinkedIn or follow me on Twitter.

Tirthajyoti Sarkar – Sr. Principal Engineer – Semiconductor, AI, Machine Learning – ON…


Related Articles