The world’s leading publication for data science, AI, and ML professionals.

What Does random.seed Do in NumPy?

Understanding how to create reproducible results when generating pseudo-random constructs with NumPy in Python

Photo by Andrew Seaman on Unsplash

Introduction

Randomness is a fundamental mathematical concept that also plays an important role in programming. For example, we may need to introduce some randomness when creating toy data, or when performing calculations that depend on some random event.

In today’s article, we will first discuss the concepts of pseudo-random numbers and true randomness. We will then discuss NumPy’s random.seed and how to use it to create reproducible results. Finally, we will show how to ensure that the effect of the same seed is sustained throughout the code.


Understanding Pseudorandomness

A sequence of pseudo-random numbers is one that has been generated by a deterministic process but appears to be statistically random.

Even though the sequences produced by pseudo-random number generators (also known as Deterministic Random Bit Generators) approximate the statistical properties of truly random sequences, they are not truly random. This is because the entire sequence is determined by an initial value known as the seed.

You can view the seed as the starting point of the sequence. A pseudo-random number generator produces each number by applying a fixed set of operations to the previously generated value (more precisely, to the generator’s internal state). Since the first number has no previous value on which the generator can perform these operations, the seed acts as the "previous value" for the first number to be generated.
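To make the role of the seed concrete, the sketch below implements a toy linear congruential generator (LCG) in plain Python. It is only an illustration of the idea described above: the multiplier, increment and modulus are the commonly cited Numerical Recipes constants, chosen here as an assumption, and NumPy does not use this generator (its legacy functions are backed by the Mersenne Twister, while `default_rng` uses PCG64).

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudo-random integers from a toy linear congruential generator.

    Each value is computed from the previous one; the seed plays the role
    of the "previous value" for the very first number.
    """
    values = []
    state = seed
    for _ in range(n):
        state = (a * state + c) % m  # the next value depends only on the current one
        values.append(state)
    return values


first_run = lcg(seed=123, n=5)
second_run = lcg(seed=123, n=5)
other_seed = lcg(seed=124, n=5)

print(first_run == second_run)  # True: same seed, same sequence
print(first_run == other_seed)  # False: different seed, different sequence
```

Because the process is fully deterministic, anyone who knows the seed (and the algorithm) can regenerate exactly the same "random" sequence.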


Creating reproducible results using random.seed

In the same way, NumPy’s random number routines generate sequences of pseudo-random numbers. This is achieved by combining BitGenerators (objects that generate random numbers, typically as streams of 32- or 64-bit unsigned integers) with [Generators](https://numpy.org/doc/stable/reference/random/generator.html#numpy.random.Generator), which use those streams to sample from different probability distributions such as Normal, Uniform or Binomial.
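As a minimal sketch of this split, assuming NumPy 1.17 or later (where the `Generator` API was introduced), a `PCG64` BitGenerator can be wrapped in a `Generator` and used to sample from the three distributions mentioned above. The seed value 123 and the distribution parameters are arbitrary choices for illustration.

```python
import numpy as np

# The BitGenerator (here PCG64) produces the raw random stream;
# the seed 123 fixes its initial state.
bit_generator = np.random.PCG64(123)

# The Generator turns that stream into samples from specific distributions.
rng = np.random.Generator(bit_generator)

normal_samples = rng.normal(loc=0.0, scale=1.0, size=3)
uniform_samples = rng.uniform(low=0.0, high=1.0, size=3)
binomial_samples = rng.binomial(n=10, p=0.5, size=3)

print(normal_samples)
print(uniform_samples)
print(binomial_samples)

# np.random.default_rng(seed) is the recommended shortcut: for an integer
# seed it wraps a PCG64 BitGenerator, so it reproduces the same stream.
rng_shortcut = np.random.default_rng(123)
print(np.array_equal(rng_shortcut.normal(size=3), normal_samples))  # True
```

For new code, NumPy’s documentation recommends a dedicated `Generator` over the legacy global functions; the rest of this article uses `np.random.seed` because it controls the familiar `np.random.*` functions.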

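Now, in order to generate reproducible sequences of pseudo-random numbers, the BitGenerator accepts a seed that is used to set its initial state. For the functions that live directly in the np.random namespace (such as np.random.rand or np.random.permutation), this is done with [numpy.random.seed](https://numpy.org/doc/stable/reference/random/generated/numpy.random.seed.html), which seeds NumPy’s legacy global RandomState instance, as shown below: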

import numpy as np
np.random.seed(123)
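A quick way to check what `np.random.seed` does is to seed, draw from a few functions, re-seed with the same value and draw again. The helper `draw_a_few` below is hypothetical; the functions it calls (`rand`, `permutation`, `choice`) are standard `np.random` functions that all share the same global state.

```python
import numpy as np


def draw_a_few():
    """Hypothetical helper: call several np.random functions in a fixed order."""
    return (
        np.random.rand(3),
        np.random.permutation(10),
        np.random.choice(["a", "b", "c"], size=4),
    )


# Seed the legacy global generator shared by the np.random.* functions.
np.random.seed(123)
first = draw_a_few()

# Re-seeding with the same value restores the same initial state,
# so every function returns the same values as before.
np.random.seed(123)
second = draw_a_few()
print(all(np.array_equal(x, y) for x, y in zip(first, second)))  # True

# The global state after np.random.seed(123) matches a RandomState seeded with 123.
print(np.array_equal(first[0], np.random.RandomState(123).rand(3)))  # True
```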

Creating reproducible results is a common requirement in many use cases. For instance, when testing a piece of functionality, you may need to fix the seed to a specific value so that the generated results can be compared against the expected ones.
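A minimal sketch of this testing pattern follows. `add_noise` is a hypothetical function under test; it accepts a seed and builds its own `Generator` with `np.random.default_rng`, which keeps the test independent of the global state. In a real test suite, the expected values would typically be stored once and compared against, rather than recomputed as they are here.

```python
import numpy as np


def add_noise(values, seed):
    """Hypothetical function under test: add Gaussian noise to `values`.

    Accepting a seed makes the output reproducible, which is what lets
    a test compare it against an expected result.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    return values + rng.normal(scale=0.1, size=values.shape)


def test_add_noise_is_reproducible():
    data = [1.0, 2.0, 3.0]
    expected = add_noise(data, seed=42)  # in practice, a stored reference result
    actual = add_noise(data, seed=42)
    np.testing.assert_allclose(actual, expected)


test_add_noise_is_reproducible()
print("test passed")
```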

Reproducibility is also essential in research more broadly. For instance, if you work with a model that uses randomness (a random forest, for example) and want to publish the results (say, in a paper), you will want, and will probably be required, to ensure that other people can reproduce the results you present.
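Random forests, for example, train each tree on a bootstrap sample of the data (drawn with replacement), so their results depend on the random state. The sketch below mimics that source of randomness with a hypothetical `bootstrap_means` experiment on made-up data; fixing and reporting the seed (the `SEED` constant is an arbitrary, hypothetical value) is what allows someone else to rerun it and obtain identical numbers.

```python
import numpy as np


def bootstrap_means(data, n_resamples, seed):
    """Hypothetical experiment: means of bootstrap resamples of `data`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Each row holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, len(data), size=(n_resamples, len(data)))
    return data[idx].mean(axis=1)


SEED = 2024  # report this value alongside the published results
data = [2.1, 3.4, 1.9, 4.0, 3.3, 2.8]  # made-up toy data

run_1 = bootstrap_means(data, n_resamples=1000, seed=SEED)
run_2 = bootstrap_means(data, n_resamples=1000, seed=SEED)
print(np.array_equal(run_1, run_2))  # True
print(run_1.mean())
```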


Local Effect of seed

It is also worth mentioning that the random seed in NumPy affects other functions as well, such as [numpy.random.permutation](https://numpy.org/doc/stable/reference/random/generated/numpy.random.permutation.html), and that its effect is local.

This means that if you call numpy.random.seed only once but call numpy.random.permutation multiple times, you will not get the same result from every call, because each call continues from the state left behind by the previous one rather than from the seed itself. To illustrate this, let’s consider the following code:

import numpy as np

np.random.seed(123)
print(np.random.permutation(10))  # [4 0 7 5 8 3 1 6 9 2]
print(np.random.permutation(10))  # [3 5 4 2 8 7 6 9 0 1]

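As you can see, the two calls return different permutations even though we have set the random seed. This is what the ‘local effect’ of random.seed means: the seed only fixes the starting state, and every call advances that state. The script as a whole is still reproducible (running it again prints exactly the same two permutations), but each call gives a different result. If you need every call of np.random.permutation to return the same result, you have to set the same random seed just before each call, as shown below.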

import numpy as np

np.random.seed(123)
print(np.random.permutation(10))  # [4 0 7 5 8 3 1 6 9 2]
np.random.seed(123)
print(np.random.permutation(10))  # [4 0 7 5 8 3 1 6 9 2]
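The sketch below, assuming only standard NumPy functions, checks that a single seed still makes the whole sequence of calls reproducible, and shows two alternatives to re-seeding before every call: saving and restoring the legacy global state with `np.random.get_state` and `np.random.set_state`, and creating a fresh `Generator` from the same seed with `np.random.default_rng`. The `Generator` uses a different algorithm, so its permutations need not match the outputs printed above.

```python
import numpy as np

# 1) The whole sequence of calls is reproducible from a single seed.
np.random.seed(123)
run_a = [np.random.permutation(10) for _ in range(2)]
np.random.seed(123)
run_b = [np.random.permutation(10) for _ in range(2)]
print(all(np.array_equal(a, b) for a, b in zip(run_a, run_b)))  # True

# 2) Save the legacy global state, then restore it to repeat a draw.
np.random.seed(123)
saved_state = np.random.get_state()
p1 = np.random.permutation(10)
np.random.set_state(saved_state)  # rewind to the saved state
p2 = np.random.permutation(10)
print(np.array_equal(p1, p2))  # True

# 3) With the Generator API, a fresh generator per call yields the same result.
g1 = np.random.default_rng(123).permutation(10)
g2 = np.random.default_rng(123).permutation(10)
print(np.array_equal(g1, g2))  # True
```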

Final Thoughts

In today’s article, we discussed the concepts of true randomness and pseudo-randomness, and the purpose of random.seed in NumPy. We also showed how to create reproducible results every time we execute the same piece of code, even when the results depend on (pseudo)randomness. Finally, we explored how to ensure that the effect of the random seed is sustained throughout the code when this is required.

