
Discrete random variables and PMFs explained using Python

Learn how to compute the Probability Mass Function (PMF) and use it to compute probabilities


Photo by Max Duzij on Unsplash

When I first started studying statistics, I used the concept of random variables many times without really knowing what a random variable was. We used it in hypothesis testing, model fitting, and countless other domains. In this post, we will build off of our previously established Probability foundations to understand just how important random variables are.

Let’s begin with the formal definition:

(Mathematical Definition) A random variable is a mapping or function X: Ω → ℝ that assigns a real number X(ω) to each outcome ω.

Ω and ω should look familiar since we have discussed them at length in our previous posts. Nevertheless, this definition can be confusing for several reasons. First, which real numbers are we assigning to each outcome? Also, why do we assign a real number at all? Why do we need random variables?

All these questions will be answered, so just keep reading.

Let’s tie this concept back to pop culture. In a previous post, we discussed the Infinity Stones from the Marvel Cinematic Universe (MCU). These stones are central to the story that runs through the MCU’s "Phases", also referred to as "The Infinity Saga".

Let’s start by assigning a value to each of the infinity stones that represents their overall capabilities. Let’s assume I assign the following overall scores:

  1. Soul: 6 (most powerful)
  2. Power: 5
  3. Time: 4
  4. Mind: 3
  5. Reality: 2
  6. Space: 1 (least powerful)

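Why I chose these values is a topic for another day. Our experiment is to choose two infinity stones, one after the other, and we let the random variable X be the sum of the scores of the two stones chosen. I realize that each universe has only six stones, so let’s also assume that we borrowed six identical stones from an alternate reality using the quantum realm, which means the same stone can be chosen twice. Writing each outcome as an ordered pair of scores, we know that there are 6 × 6 = 36 possible sample outcomes in our sample space Ω:

Ω = {(1, 1), (1, 2), …, (6, 5), (6, 6)}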
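As a minimal sketch, the sample space and the random variable can be written out in Python. The names `scores`, `omega`, and `X` are illustrative choices, and treating outcomes as ordered pairs with repeats allowed is an assumption that matches the count of 36.

```python
from itertools import product

# Scores assigned to each stone in the list above.
scores = {"Soul": 6, "Power": 5, "Time": 4, "Mind": 3, "Reality": 2, "Space": 1}

# Sample space: every ordered pair of stones. Repeats are allowed
# because we assumed a second, identical set of stones.
omega = list(product(scores, repeat=2))


def X(outcome):
    """Random variable: map an outcome (a pair of stones) to the sum of their scores."""
    first, second = outcome
    return scores[first] + scores[second]


print(len(omega))            # number of sample outcomes
print(X(("Soul", "Time")))   # score assigned to one particular outcome
```

Here `X` is literally a function from outcomes to real numbers, which is all the formal definition asks for.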

So, now we have all the ingredients in the random variable recipe. First, we described the experiment (choosing two infinity stones), and second, we gave a value to each sample outcome. This assignment of values, X, is our random variable. Keep in mind that a random variable does not take on a single fixed value like a variable in algebra; it can assume any value in a given set (here {2, 3, 4, …, 12}). Which of these values it takes depends on the random outcome of the experiment, hence the name "random" variable. How does this relate back to probability? Let’s take a look at the following notation:

ℙ(X = x)

which is "shorthand" for:

ℙ({ω ∈ Ω : X(ω) = x})

"ℙ(X = x)" simply means what is the probability that the random variable X takes on the value x. So let’s go back to our example to really understand this equation. Suppose we construct a matrix of the sum of scores after choosing two infinity stones:

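What is the probability of having a total score equal to 10? Essentially, this question is asking ℙ(X = 10) = ?. Three of the 36 equally likely outcomes give a sum of 10, namely the score pairs (4, 6), (5, 5), and (6, 4), so we know that

ℙ(X = 10) = 3/36 = 1/12 ≈ 0.083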
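The matrix of sums and the count behind ℙ(X = 10) can be checked with a short sketch. The variable names are illustrative, and the calculation assumes that all 36 outcomes are equally likely.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]  # possible score of each chosen stone

# sum_matrix[i][j] is the total score when the first stone scores
# values[i] and the second stone scores values[j].
sum_matrix = [[a + b for b in values] for a in values]

for row in sum_matrix:
    print(" ".join(f"{total:>2}" for total in row))

# Count the cells equal to 10 and divide by the number of cells,
# assuming every cell (outcome) is equally likely.
favourable = sum(row.count(10) for row in sum_matrix)
n_outcomes = sum(len(row) for row in sum_matrix)
p_10 = Fraction(favourable, n_outcomes)
print(p_10)
```

Using `Fraction` keeps the result exact, so the probability is reported as a ratio rather than a rounded decimal.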

There you have it! Now we understand the relationship between random variables and probability! The example we just discussed is considered one of two types of random variables. It is referred to as a discrete random variable since it can assume a countable number of values. The second type of random variable is referred to as a continuous random variable.

Now, wouldn’t it be nice to have a function that gives the probability of each value that a random variable can assume? It just so happens that there is such a function! It is called a probability function, and for a discrete random variable we refer to it as a Probability Mass Function (PMF), defined by the following:

f_X(x) = ℙ(X = x)

So, let’s use the fantastic language of Python to better understand the power of probability functions! We won’t discuss the code in detail, and we assume that most of you have a working knowledge of Python.
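What follows is a minimal sketch of how the PMF could be computed using only the standard library. The names `counts` and `pmf` are illustrative, the outcomes are assumed to be equally likely, and a simple text bar chart stands in for a plot.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

values = range(1, 7)  # possible score of each chosen stone

# Count how many of the 36 ordered pairs produce each total score.
counts = Counter(a + b for a, b in product(values, repeat=2))
total = sum(counts.values())

# PMF: probability of each value x that X can assume.
pmf = {x: Fraction(n, total) for x, n in sorted(counts.items())}

# Text bar chart: one '#' per outcome that produces the value x.
for x, p in pmf.items():
    print(f"{x:>2}  {float(p):.3f}  {'#' * counts[x]}")
```

The bars rise from 2 up to a peak at 7 and fall again towards 12, and the probabilities sum to 1.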

As you can see, PMFs allow us to visualize a random variable and enable us to easily answer questions such as ℙ(X ≤ 6). This statement is essentially asking: what is the probability of choosing two stones that result in a total score less than or equal to 6? Now, if we look at our PMF figure, we can see that it is symmetrical and centered around 7, so ℙ(X ≤ 6) = ℙ(X ≥ 8). We also know that all the probabilities need to add to 1. So by simply doing the following:

ℙ(X ≤ 6) = (1 − ℙ(X = 7)) / 2 = (1 − 6/36) / 2 = 15/36 ≈ 0.42

we use our powerful PMF to determine that there is about a 42% chance of selecting two infinity stones with a total score of 6 or less.
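The symmetry argument can be cross-checked against a direct sum over outcomes. This is a sketch under the same equally-likely assumption; the helper `prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered pairs of scores, assumed equally likely.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability that the total score X satisfies the predicate `event`."""
    hits = sum(1 for pair in outcomes if event(sum(pair)))
    return Fraction(hits, len(outcomes))


# Direct route: add up the probabilities of every total from 2 to 6.
direct = prob(lambda x: x <= 6)

# Symmetry route: remove the centre value 7, then split the rest in half.
by_symmetry = (1 - prob(lambda x: x == 7)) / 2

print(direct, by_symmetry, round(float(direct), 2))
```

Both routes give the same exact fraction, which rounds to the 42% quoted above.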

In this post, we defined a discrete random variable, provided an example, coded it, and demonstrated how the PMF can be used to visualize and study random variables.

