
Background
Perhaps you have heard of the binomial distribution, but have you heard of its cousin, the negative binomial distribution? This discrete probability distribution is applied in numerous industries, such as insurance and manufacturing (mainly for count-based data), making it a useful concept for Data Scientists to understand. In this article, we will dive into this distribution and the problems it can solve.
To understand the negative binomial distribution, it's important to first gain intuition about the binomial distribution.
The binomial distribution models the probability of observing a certain number of successes, x, in a given number of trials, n. The trials in this case are Bernoulli trials, where every outcome is binary (success or failure). If you are unfamiliar with the binomial distribution, check out my previous post on it here:
Decoding the Binomial Distribution: A Fundamental Concept for Data Scientists
The negative binomial distribution flips this and models the number of trials, x, needed to reach a certain number of successes, r. It is known as 'negative' because it indirectly models the number of failures that occur before the required number of successes.
A better way of thinking about the negative binomial distribution is:
Probability of the r-th success happening on the x-th trial
A special case of the negative binomial distribution is the geometric distribution. This models the number of trials needed before we get our first success. You can read more about the geometric distribution here:
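To make this relationship concrete, here is a minimal sketch in plain Python (using only the standard library) showing that setting r = 1 in the negative binomial PMF, C(x-1, r-1) · p^r · (1-p)^(x-r), collapses it to the geometric PMF, p · (1-p)^(x-1):

```python
from math import comb

def neg_binomial_pmf(x, r, p):
    # P(the r-th success occurs on trial x)
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

def geometric_pmf(x, p):
    # P(the first success occurs on trial x)
    return p * (1 - p)**(x - 1)

# With r = 1, the two distributions agree for every trial count x
p = 1 / 6
for x in range(1, 20):
    assert abs(neg_binomial_pmf(x, 1, p) - geometric_pmf(x, p)) < 1e-12
```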
Key Assumptions
The following are the main assumptions of the data for the negative binomial distribution:
- Two outcomes per trial (Bernoulli)
- Each trial is independent
- The probability of success is constant
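These assumptions can be made concrete with a small simulation (a sketch, not part of the original article): we draw independent Bernoulli trials with a constant success probability until r successes occur, which is exactly the process the negative binomial distribution describes.

```python
import random

def trials_until_r_successes(r, p, rng):
    """Count how many Bernoulli trials are needed to see r successes."""
    trials, successes = 0, 0
    while successes < r:
        trials += 1                # each loop iteration is one trial
        if rng.random() < p:       # binary outcome with constant probability p
            successes += 1
    return trials

rng = random.Random(42)
samples = [trials_until_r_successes(2, 1 / 6, rng) for _ in range(10_000)]
print(sum(samples) / len(samples))  # sample mean of the trial counts
```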
Formula & Derivation
Let’s say we have:
- p: probability of success
- 1-p: the probability of failure
- x: number of trials needed to obtain the r-th success
- r: number of successes obtained in x trials
For the r-th success to occur on trial x, we must first have r-1 successes in the first x-1 trials, and the probability of this is simply the binomial distribution's probability mass function (PMF):

P(r-1 successes in x-1 trials) = C(x-1, r-1) · p^(r-1) · (1-p)^(x-r)

where C(x-1, r-1) is the binomial coefficient 'x-1 choose r-1'.
The next piece of information is that the r-th success itself must occur on trial x, which happens with probability p. Therefore, we simply multiply the above expression by p:

P(X = x) = C(x-1, r-1) · p^r · (1-p)^(x-r)
That’s the negative binomial distribution’s PMF!
The mean of the distribution can be shown to be:

E[X] = r / p
Derivation of the mean and standard deviation can be found here
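As a quick sanity check (a sketch in plain Python, not from the original derivation), we can confirm numerically that the PMF sums to 1 and that its mean matches r/p, using the dice parameters from the example below (r = 2, p = 1/6):

```python
from math import comb

def neg_binomial_pmf(x, r, p):
    # P(the r-th success occurs on trial x)
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

r, p = 2, 1 / 6
xs = range(r, 3000)  # truncate the infinite sum; the tail is negligible

total = sum(neg_binomial_pmf(x, r, p) for x in xs)
mean = sum(x * neg_binomial_pmf(x, r, p) for x in xs)

print(round(total, 6))  # ≈ 1.0, so it is a valid probability distribution
print(round(mean, 6))   # ≈ 12.0, matching r / p = 2 / (1/6)
```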
Example Problem
Suppose we repeatedly roll a fair six-sided die. What would be the probability of rolling a second 4 on the 6th roll?
- p = 1/6
- r = 2
- x = 6
Inputting these into the above PMF leads to:

P(X = 6) = C(5, 1) · (1/6)^2 · (5/6)^4 = 3125/46656 ≈ 0.067
So, it is quite unlikely that we will get our second 4 on the 6th roll. You can also try out some of your calculations with this negative binomial calculator.
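We can also reproduce this calculation exactly with Python's `fractions` module (a quick verification sketch, not part of the original article):

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)  # probability of rolling a 4
r, x = 2, 6         # second 4 on the sixth roll

prob = comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)
print(prob)         # 3125/46656
print(float(prob))  # ≈ 0.067
```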
What if we want to know the probability of rolling our second 4 on other rolls? To do this, we plot the PMF as a function of the number of rolls, x:
import plotly.graph_objects as go
from math import comb

# Parameters
r = 2
p = 1 / 6

# PMF: probability the r-th success occurs on trial x
def neg_binomial_pmf(x, r, p):
    if x < r:
        return 0
    q = 1 - p
    return comb(x - 1, r - 1) * (p ** r) * (q ** (x - r))

# Values
x = list(range(1, 30))
probs = [neg_binomial_pmf(k, r, p) for k in x]

# Plot
fig = go.Figure(data=[go.Bar(x=x, y=probs, marker_color='rgb(176, 224, 230)')])
fig.update_layout(title="Negative Binomial Distribution",
                  xaxis_title="x (number of trials to get second 4)",
                  yaxis_title="Probability",
                  template="simple_white",
                  font=dict(size=16),
                  title_x=0.5,
                  width=700,
                  height=500)
fig.show()

We see that the most likely rolls on which to obtain our second 4 are rolls 6 and 7, which are equally likely. However, the expected value is 12 (2/(1/6)), which can be derived from the mean formula we showed earlier.
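The tie between rolls 6 and 7 can be checked directly from the PMF; the sketch below redefines the PMF so it runs standalone:

```python
from math import comb

def neg_binomial_pmf(x, r, p):
    # P(the r-th success occurs on trial x)
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

r, p = 2, 1 / 6
f6, f7 = neg_binomial_pmf(6, r, p), neg_binomial_pmf(7, r, p)

# The ratio f(7)/f(6) = (6/5) * (5/6) = 1, so the two rolls are equally likely
print(f6, f7)
```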
Applications in Data Science
Below is a list of areas where the negative binomial distribution is used:
- Time until an event: This is useful for churn models, where we want to predict when a customer may cancel their subscription. If we know when and who will churn, we can apply specialised retention strategies to try and keep the customer.
- Defect prediction: Predicting how many defective items will be produced before reaching a target number of functional ones. You can think of this as how many versions of a product we need to make before we reach the final proposal.
- Sports analytics: For example, predicting how many shots a footballer will miss before scoring a goal. This is useful for betting companies when producing their odds.
- Marketing: Determining how many advertisements to show a customer before they convert to a subscription or click through on the website. This is essentially modelling conversion.
- Ecology: Estimating the population sizes of endangered species and how the environment is affecting their numbers.
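As a worked illustration of the marketing use case (with made-up numbers: a 5% click probability per ad impression and a customer who converts on their second click), we can ask how likely it is that conversion happens within the first 30 impressions:

```python
from math import comb

def neg_binomial_pmf(x, r, p):
    # P(the r-th success occurs on trial x)
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# Hypothetical parameters: 5% chance of a click per impression,
# conversion defined as the second click
p, r = 0.05, 2

prob_within_30 = sum(neg_binomial_pmf(x, r, p) for x in range(r, 31))
print(round(prob_within_30, 3))  # chance of converting within 30 impressions
```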
Summary & Further Thoughts
The negative binomial distribution models the number of trials (and hence the number of failures) needed to reach a certain number of successes. It has applications in many areas of Data Science, most notably churn prediction, making it a useful topic for Data Scientists to understand.
The full code is available on my GitHub here:
Medium-Articles/Statistics/Distributions/negative_binomial.py at main · egorhowell/Medium-Articles
References & Further Reading
- Great video on the negative binomial distribution.
- Great summary article on the negative binomial distribution.
- Applications of the distribution.
Another Thing!
I have a free newsletter, Dishing the Data, where I share weekly tips for becoming a better Data Scientist. There is no "fluff" or "clickbait," just pure actionable insights from a practicing Data Scientist.