
Dwight Schrute Teaches Us About Discrete Probability Distributions

An article that is not about the normal distribution

Meme made by author using imgflip.com/memegenerator/Dwight-Schrute

Introduction

With all the fanfare surrounding the normal distribution, newcomers to Data Science can make the mistake of thinking that data scientists care only about the normal distribution.

While the normal distribution is arguably the most important (and most perversely used) probability distribution, it is not always the best tool, nor the best assumption to make, in solving business problems.

Say that you work at Dunder Mifflin Paper Company. Assistant (to the) Regional Manager Dwight K. Schrute gives you files on one hundred clients. Mr. Schrute wants to know the following:

  1. Of the one hundred clients, what is the probability that exactly 5 of the clients will accept a sale from Dunder Mifflin, given that the probability of closing a sale is 10%?
  2. How many clients can we expect to ignore Dunder Mifflin before the sales department makes its first sale?
  3. We want to give a Scrantonicity album to each of the first 5 clients who purchase from Dunder Mifflin. What is the probability that the tenth client we contact gets the last album?

Thanks to the Central Limit Theorem, we could use the normal distribution to answer all the above questions. However, we’d likely need more information than what Dwight gives us to answer at least one of them.

A better, and simpler, solution would be to use probability distributions for discrete random variables.

In what follows, I explain three probability distributions for discrete random variables: the binomial distribution, the geometric distribution, and the negative binomial distribution.

Probability Distributions for Discrete Random Variables

Discrete random variables are random variables that take on countably many values, typically integers such as counts of sales or clients. This is in contrast to continuous random variables, which can take on any value in an interval of the real line.

Let’s take a look at our first probability distribution.

Binomial Distribution

The binomial distribution approaches the normal distribution as the sample size grows.

Consider a set of n independent events (trials), each of which results in either a success or a failure.

The probability of success, p, is identical for each event (or trial).

Let Y be the total number of successes among the n independent events. Then, we say that Y is a random variable that follows a binomial distribution.

The probability of y successes among the n trials is

P(Y = y) = C(n, y) · p^y · (1 – p)^(n – y), for y = 0, 1, …, n,

where C(n, y) = n!/[y!(n – y)!] is the binomial coefficient.

The expected value of Y is

E[Y] = np

and the variance is

Var[Y] = np(1-p)
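If you'd like to verify these formulas numerically, a short Python sketch along the following lines should do it, building the PMF with math.comb and summing the series to recover the mean and variance:

```python
from math import comb

def binom_pmf(y, n, p):
    """Probability of exactly y successes in n independent trials."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

n, p = 100, 0.10
pmf = [binom_pmf(y, n, p) for y in range(n + 1)]

mean = sum(y * pmf[y] for y in range(n + 1))
variance = sum((y - mean) ** 2 * pmf[y] for y in range(n + 1))

print(round(mean, 6))      # 10.0 -> matches E[Y] = np
print(round(variance, 6))  # 9.0  -> matches Var[Y] = np(1 - p)
```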

Example

We can use the binomial distribution to answer Dwight’s first question.

Of the one hundred clients, what is the probability that exactly 5 of the clients accept a sale from Dunder Mifflin, given that the probability of closing a sale is 10%?

We have that

  1. p = 0.10
  2. n = 100
  3. y = 5

Thus, our probability mass function becomes

P(Y = 5) = C(100, 5) · (0.10)^5 · (0.90)^95 ≈ 0.0339

Our probability is approximately 3.39%.
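A quick check with SciPy (assuming scipy is available) should reproduce the same figure:

```python
from scipy.stats import binom

# binom.pmf(y, n, p): exactly 5 successes out of 100 trials with p = 0.10
print(binom.pmf(5, 100, 0.10))  # ~0.0339, i.e. about 3.39%
```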

Now, what if we wanted to know the probability that at least 5 clients closed a sale with Dunder Mifflin?

If n remains constant, then

P(Y ≥ 5) = Σ (from y = 5 to 100) C(100, y) · (0.10)^y · (0.90)^(100 – y)

Equivalently,

P(Y ≥ 5) = 1 – P(Y ≤ 4) ≈ 1 – 0.0237 = 0.9763

The above calculation reveals that the probability that at least five clients close a sale with Dunder Mifflin is about 97.63%.
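SciPy can handle the tail probability too, using either the CDF or the survival function; something like the following sketch:

```python
from scipy.stats import binom

# P(Y >= 5) = 1 - P(Y <= 4), with n = 100 and p = 0.10
print(1 - binom.cdf(4, 100, 0.10))  # ~0.9763
print(binom.sf(4, 100, 0.10))       # same result via the survival function
```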

Geometric Distribution

Suppose, again, that we have a sequence of independent trials that share the same probability of success, p.

Let Z be the number of failures, k, prior to the first success.

Then, Z has a geometric distribution with the probability mass function

P(Z = k) = (1 – p)^k · p, for k = 0, 1, 2, …

which gives us the probability of observing k failures prior to the first success.

The expectation and variance are

E[Z] = (1 – p)/p, and

Var[Z] = (1 – p)/p²,

respectively.
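As with the binomial case, a small sketch like the one below can confirm these moments by summing a long, truncated series:

```python
def geom_pmf(k, p):
    """Probability of observing k failures before the first success."""
    return (1 - p) ** k * p

p = 0.10
ks = range(2000)  # the tail beyond k = 2000 is negligible for p = 0.10

mean = sum(k * geom_pmf(k, p) for k in ks)
variance = sum((k - mean) ** 2 * geom_pmf(k, p) for k in ks)

print(round(mean, 4))      # 9.0  -> (1 - p)/p
print(round(variance, 4))  # 90.0 -> (1 - p)/p**2
```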

Example

We can use the geometric distribution to answer Dwight’s second question.

How many clients can we expect to ignore Dunder Mifflin before the sales department makes its first sale?

Because p = 0.10,

E[Z] = (1 – 0.10)/0.10

E[Z] = 9

Therefore, we can expect 9 clients to ignore Dunder Mifflin before the sales department makes its first sale (put differently, the first sale should come from roughly the tenth client contacted).
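A quick simulation sketch along these lines should land near the same answer:

```python
import random

random.seed(42)
p = 0.10

def failures_before_first_sale():
    """Count the clients who ignore us until one finally buys."""
    failures = 0
    while random.random() >= p:  # a draw >= p is an ignored pitch
        failures += 1
    return failures

n_sims = 100_000
average = sum(failures_before_first_sale() for _ in range(n_sims)) / n_sims
print(average)  # ~9, matching E[Z] = (1 - p)/p
```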

Negative Binomial Distribution

Let r be the number of successes.

The negative binomial distribution gives us the probability of having the rth success after k failures.

More specifically, we want the probability of having r – 1 successes in the first k + r – 1 trials, followed by a success on trial k + r.

For example, say that r = 3 and k = 10. Then, we want to find the probability of having two successes on twelve trials, and then a third success on the thirteenth trial. [Note: we are not estimating the probability of having three successes and ten failures. We are estimating the probability that the third success happens on the thirteenth trial].

Let W be the number of failures, k, prior to the rth success. The probability mass function for the negative binomial distribution is

P(W = k) = C(k + r – 1, r – 1) · p^r · (1 – p)^k, for k = 0, 1, 2, …

where k is the number of failures and r – 1 is the number of successes among the first k + r – 1 trials; the rth success itself then occurs on trial k + r.

The expectation and variance are

E[W] = r(1 – p)/p, and

Var[W] = r(1 – p)/p²,

respectively.
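A short sketch like the following, again using math.comb, implements this PMF and confirms the mean by summing a truncated series:

```python
from math import comb

def nbinom_pmf(k, r, p):
    """Probability that the r-th success is preceded by exactly k failures."""
    return comb(k + r - 1, r - 1) * p**r * (1 - p)**k

r, p = 5, 0.10
ks = range(5000)  # truncate the infinite support; the tail is negligible here

mean = sum(k * nbinom_pmf(k, r, p) for k in ks)
print(round(mean, 3))  # 45.0 -> r(1 - p)/p
```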

Example

We can use the negative binomial distribution to answer Dwight’s third question:

We want to give a Scrantonicity album to each of the first 5 clients who purchase from Dunder Mifflin. What is the probability that the tenth client we contact gets the last album?

As before, p = 0.10. We also have that r = 5 and k = 5.

Therefore,

P(W = 5) = C(9, 4) · (0.10)^5 · (0.90)^5 ≈ 0.000744

which gives us 0.000744, or about 0.074%.
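SciPy's nbinom follows the same failure-counting convention, so a one-liner like this should reproduce the value:

```python
from scipy.stats import nbinom

# nbinom.pmf(k, n, p): k = 5 failures before the n = 5th success, with p = 0.10
print(nbinom.pmf(5, 5, 0.10))  # ~0.000744
```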

Notes

The negative binomial distribution has alternative definitions. In this article, I define the negative binomial distribution as

the probability of having the rth success after k failures.

However, the negative binomial distribution can be defined as either

  1. the probability of having the rth success after k failures
  2. the probability of having the kth failure after r successes.

These definitions produce alternative equations for the mean.

If we use the first definition, then our mean becomes r(1 – p)/p, the expected number of failures before the rth success.

And if we use the second definition, then our mean becomes kp/(1 – p), the expected number of successes before the kth failure.
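To see how different the two answers can be, a tiny sketch like this evaluates both means for p = 0.10 with r = k = 5, using the definitions exactly as stated above:

```python
p = 0.10
r = 5  # target number of successes (first definition)
k = 5  # target number of failures  (second definition)

# First definition: expected number of failures before the r-th success
mean_failures_before_rth_success = r * (1 - p) / p
# Second definition: expected number of successes before the k-th failure
mean_successes_before_kth_failure = k * p / (1 - p)

print(mean_failures_before_rth_success)   # 45.0
print(mean_successes_before_kth_failure)  # ~0.556, a very different answer
```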

Be sure to know what information your supervisor wants before calculating expectations!

Conclusion

In this article, I’ve gone over three discrete probability distributions:

  1. The Binomial Distribution
  2. The Geometric Distribution
  3. The Negative Binomial Distribution

Obviously, there are many more – Poisson, hypergeometric, and multinomial, just to name a few.

You don’t need to know every distribution – I certainly don’t! – but the more you know, the more tools you have to solve business problems.

