MATH REFRESHER FOR DATA SCIENTISTS
You know that probability distributions are important and that the binomial distribution is a basic one. But you may still have some doubts about it. When should you use it? How do you read the equations for the probability mass function and the cumulative distribution function? What are the handy functions in R and Python? By the end, you will feel comfortable using the binomial distribution from both a theoretical and a practical perspective. Let’s dive into it!
We will cover the following topics in this article:
1. Bernoulli trial
2. Binomial distribution
 2.1. The binomial coefficient
 2.2. The binomial probability mass function (PMF)
 2.3. The cumulative distribution function (CDF)
3. The behavior of the binomial distribution
4. Binomial distribution functions in R and Python
 4.1. R
 4.2. Python
5. Binomial distribution in practice
1. Bernoulli trial
A Bernoulli trial is a random experiment that has exactly two possible outcomes, typically denoted as "success" (1) and "failure" (0). A "success" is simply an outcome that meets a given criterion, for example, a number higher than 7, female, age below 10, a negative return, etc. It does not mean that the outcome is "good" in the ethical meaning of that word.
If p is the probability of success and q is the probability of failure, then:

$$p + q = 1$$
Since there is no other option to choose than 0 or 1, the sum of probabilities of success and failure is always equal to 1.
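To make this concrete, here is a minimal Python sketch (assuming numpy is installed; the seed and the 75% success probability are illustrative choices) that simulates Bernoulli trials and confirms that the success and failure frequencies sum to 1:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility

p = 0.75   # probability of success (illustrative value)
q = 1 - p  # probability of failure, so p + q = 1

# 10,000 independent Bernoulli trials: each outcome is 1 (success) or 0 (failure).
trials = rng.binomial(n=1, p=p, size=10_000)

success_rate = trials.mean()
print(f"empirical P(success): {success_rate:.4f}")      # close to p = 0.75
print(f"empirical P(failure): {1 - success_rate:.4f}")  # close to q = 0.25
```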
2. Binomial distribution
Let us first have a look at the definition. A binomial distribution describes the distribution of the number of successes in N independent Bernoulli trials, where the probability of success is constant, p [1]. In plain English, it means:
- We repeat the same experiment many (N) times. For example, toss a coin N=1000 times.
- In each trial, exactly two outcomes are possible: head and tail (success and failure, a Bernoulli trial). So, the Bernoulli trial in this definition narrows the binomial distribution to a discrete distribution, where the result of an experiment can ONLY take the value 0 or 1. Anything in between, such as 0.3 or 0.01, or an outcome that is neither success nor failure, is not possible.
- In each trial, the probability of getting a head is the same (p=0.5). It does not depend on any of the previous results, so the trials are independent.
2.1. The binomial coefficient
To express the binomial density function, we first need to define the binomial coefficient:
$$\binom{N}{m} = \frac{N!}{m!\,(N-m)!}$$
We can read it as "N choose m". It describes the number of ways that X can take a given value m. It selects m items from a collection when the order of selection does not matter. If the set has N elements, the number of m-combinations equals the binomial coefficient [2].
Let’s see an example. In a lottery, five balls are chosen from a pool of twenty balls. How many lottery tickets do you need to buy to ensure a win? At first, we have 20 balls to choose from, then 19 for the second choice, 18 for the third, and so on. So, we have
$$20 \cdot 19 \cdot 18 \cdot 17 \cdot 16 = 1{,}860{,}480$$
different ways to choose five balls out of a pool of twenty. But we do not care about the order in the lottery; only having the same numbers matters. This means that the ordering {A,B,C,D,E} and the ordering {B,E,A,C,D} give the same set of balls for us.
So we need to reduce the 1,860,480 ordered choices by the duplicates (the number of ways to order the selected balls). Since we have five balls, we can order them in 5!=120 different ways. Finally, we divide the 1,860,480 ordered choices by the 120 ordering options and get 15,504 tickets to buy!
$$\frac{1{,}860{,}480}{120} = 15{,}504$$
Is it worth it?
Let’s go back to our binomial coefficient. In the numerator, we see the number of ways we can order the whole pool of balls: 20! (N!). In the denominator, we reduce that count by the ordering options:
- the ways of ordering the selected balls (m!), here 5!, and
- the ways of ordering the balls remaining in the pool ((N-m)!), here 15!
Just as with the selected balls, we do not care about the order of the balls that remain unselected. Finally, we get:
$$\binom{20}{5} = \frac{20!}{5!\,15!} = 15{,}504$$
Exactly the same as we obtained above but in an easier way.
The ‘!’ notation is the factorial. For a non-negative integer x, it is the product of all positive integers up to x, for example:
$$5! = 1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 = 120$$
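If you want to verify the lottery arithmetic yourself, here is a quick check using only Python’s standard library (math.comb and math.perm require Python 3.8+):

```python
import math

# Ordered ways to draw 5 balls from 20: 20 * 19 * 18 * 17 * 16
ordered = math.perm(20, 5)      # 1860480
duplicates = math.factorial(5)  # 120 ways to order the 5 selected balls

print(ordered // duplicates)    # 15504 unordered combinations
print(math.comb(20, 5))         # 15504, the binomial coefficient "20 choose 5"
```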
2.2. The binomial probability mass function (PMF)
Now, we are ready to define the binomial probability mass function as the probability of obtaining m successes in N Bernoulli trials:
$$P(X = m) = \binom{N}{m} p^{m} (1-p)^{N-m}$$
So, the binomial distribution is the discrete probability distribution of the number of successes (m) in a sequence of N independent repetitions of an experiment that asks a yes-no (success-failure, 1-0) question, where the probability of success is p and the probability of failure is q=1-p.
Let’s consider an example. You are practicing free throws during basketball training. From the season statistics, we know that the probability that you will score a point is 75%. Your coach told you that if you score 17 points out of 20 attempts, you will start in the next match. What is the probability that you score exactly 17 points?
We need to assume that the probability of a successful free throw is independent of the previous results (mental strength does not play a role here). We also do not care about the order of scoring, so it does not matter whether you miss the first, the third, or the last shot. Thus, this is a binomial distribution. We can use the binomial probability mass function given above and get:
$$P(X = 17) = \binom{20}{17} \cdot 0.75^{17} \cdot 0.25^{3} \approx 0.134$$
We can repeat this exercise for other scores. As a result, we get the binomial distribution plot (PMF):
[Plot: the binomial PMF for N=20 attempts and p=0.75]
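As a sanity check, here is a minimal Python sketch that evaluates the PMF formula directly for the free-throw example:

```python
import math

N, m, p = 20, 17, 0.75  # attempts, required scores, probability of success

# Binomial PMF: C(N, m) * p^m * (1 - p)^(N - m)
prob = math.comb(N, m) * p**m * (1 - p)**(N - m)
print(f"P(X = 17) = {prob:.4f}")  # about 0.1339
```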
But you want to score at least 17 points, not exactly 17 points. So what is the probability that you will be in the starting lineup for the next match?
2.3. The cumulative distribution function (CDF)
Here, the cumulative distribution function of the binomial distribution will help to answer this question. If you wonder why, please check this article first:
Quantiles are key to understanding probability distributions
The cumulative distribution function (CDF) describes the probability (chance) that X will take a value equal to or less than k. The CDF function for the binomial distribution is as follows:
$$F(k; N, p) = P(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} \binom{N}{i} p^{i} (1-p)^{N-i}$$
where $\lfloor k \rfloor$ is the "floor" of k, i.e. the greatest integer less than or equal to k.
So, we need to sum the probabilities that you will score 17, 18, 19, or 20 free throw shots, as marked in red in the PMF plot:
[Plot: the binomial PMF for N=20 and p=0.75, with the scores 17 to 20 marked in red]

$$P(X \ge 17) = \sum_{m=17}^{20} \binom{20}{m}\, 0.75^{m}\, 0.25^{20-m} \approx 0.225$$
Since the probabilities sum to 1, we could also take the opposite approach: subtract the cumulative probability of scoring at most 16 points (everything to the left of 17) from 1:
$$P(X \ge 17) = 1 - P(X \le 16) = 1 - F(16; 20, 0.75) \approx 1 - 0.775 = 0.225$$
This is shown in the CDF plot:
[Plot: the binomial CDF for N=20 and p=0.75]
The results from both approaches match. The task given by your coach is challenging, but the probability shows it is doable, so give it a try!
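Both approaches are easy to reproduce in code; a short sketch assuming scipy is installed:

```python
from scipy.stats import binom

N, p = 20, 0.75

# Approach 1: sum the PMF over the scores 17 to 20.
direct = sum(binom.pmf(m, N, p) for m in range(17, 21))

# Approach 2: take the complement of the CDF at 16, i.e. P(X >= 17) = 1 - P(X <= 16).
complement = 1 - binom.cdf(16, N, p)

print(f"sum of PMF terms: {direct:.4f}")      # about 0.2252
print(f"1 - CDF(16):      {complement:.4f}")  # the same value
```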
3. The behavior of the binomial distribution
How do the chances change if you get more attempts from your coach? You still have to make at least 85% of your throws, but now you get 20, 50, or 100 attempts. Let’s see the next plot.
[Plot: binomial PMFs for p=0.75 and N=20, 50, and 100 attempts]
The larger the sample size, the wider the distribution. The probability of getting at least 85% successful free throws is as follows:
[Table: the probability of at least 85% successful throws for N=20, 50, and 100 attempts]
Considering the results in the table, you should not push your coach to give you additional attempts. The more attempts you get, the less likely you are to score noticeably above your long-term probability of success.
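You can reproduce this experiment with a short loop; again a sketch assuming scipy is installed:

```python
import math
from scipy.stats import binom

p = 0.75  # long-term probability of success

for N in (20, 50, 100):
    threshold = math.ceil(0.85 * N)  # smallest score that reaches 85% of N
    tail_prob = 1 - binom.cdf(threshold - 1, N, p)  # P(X >= threshold)
    print(f"N = {N:3d}: P(at least 85% successes) = {tail_prob:.4f}")
```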
Next, let’s check whether hard training improves your chances. We consider three probabilities of success: 20%, 50%, and 90%. The number of trials remains constant (100).
[Plot: binomial PMFs for N=100 trials and p=0.2, 0.5, and 0.9]
The following table summarizes the probability of at least 85% successful throws for each probability of success, compared with your 75% long-term probability of success.
[Table: the probability of at least 85% successful throws for each probability of success]
It makes much more sense to train hard than to fight with the coach about getting more trials!
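The training experiment works the same way; the sketch below fixes N=100 and varies p, adding the 75% baseline for comparison:

```python
from scipy.stats import binom

N = 100  # the number of attempts stays constant

for p in (0.20, 0.50, 0.75, 0.90):
    tail_prob = 1 - binom.cdf(84, N, p)  # P(X >= 85), i.e. at least 85% successes
    print(f"p = {p:.2f}: P(at least 85 successes) = {tail_prob:.6f}")
```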
4. Binomial distribution functions in R and Python
Let’s take a closer look at functions in R and Python that help to work with a binomial distribution.
4.1. R
At least these four functions are worth knowing in R. In the following examples, m is the number of successful trials, N is the sample size (the number of all attempts), and p is the probability of success.
- dbinom(m,N,p): this function calculates the probability of having exactly m successes in N random trials with probability of success p. For example, 17 scores out of 20 throws give dbinom(17,20,0.75). If we calculate dbinom for each possible score, we can draw a PMF plot (as shown in the example above).

- pbinom(m,N,p): this function calculates the cumulative probability that the number of successes is equal to or lower than m. For our example with at least 17 scores, it is 1-pbinom(16,20,0.75) or pbinom(16,20,0.75,lower.tail=FALSE). With lower.tail=FALSE, it returns the probability of getting more than m successes instead (here, at least 17). The default is lower.tail=TRUE.

- qbinom(P,N,p): returns the value of the inverse cumulative distribution function, i.e. it finds the P-th quantile of the binomial distribution. How many successful scores correspond to a given cumulative probability? qbinom(0.55, size=20, prob=0.75) gives 15. It means that there is at least a 55% chance that you score at most 15 points.
- rbinom(n,N,p): generates a vector of n binomially distributed random values. It is useful for simulations and practice.
4.2. Python
These functions from the scipy.stats.binom module are worth knowing in Python. In the following examples, m is the number of successful trials, N is the sample size (the number of all attempts), and p is the probability of success. A combined code sketch follows this list.
- pmf(m,N,p): this function calculates the probability mass function, i.e. the probability of having exactly m successes in N random trials with probability of success p. For example, 17 scores out of 20 throws give scipy.stats.binom.pmf(17,20,0.75). If we calculate pmf for each possible score, we can plot the PMF.


- cdf(m,N,p): calculates the cumulative distribution function, i.e. the probability that the number of successes is equal to or lower than m. For our example with at least 17 scores, it is: 1-scipy.stats.binom.cdf(16,20,0.75).


- ppf(P,N,p): this function gives the percent point function, i.e. the value of the inverse cumulative distribution function. It finds the P-th quantile of the binomial distribution. How many successful scores correspond to a given cumulative probability? scipy.stats.binom.ppf(0.55,20,0.75) gives 15. It means that there is at least a 55% chance that you score at most 15 points. We can also check the consistency of the PMF and the CDF.

- stats(N,p,moments=’mvsk’): calculates the first four moments, i.e. mean (‘m’), variance (‘v’), skew (‘s’), and kurtosis (‘k’). If you need a refresher on statistical moments, this can be an article for you:
Statistical Moments in Data Science interviews
- rvs(N,p,size=n): generates a vector of n binomially distributed random values. It is useful for simulations and practice.
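Here is a minimal sketch tying these functions together for the free-throw example (the printed values are approximate; scipy is assumed to be installed):

```python
from scipy.stats import binom

N, p = 20, 0.75

# PMF: probability of exactly 17 successes in 20 trials.
print(binom.pmf(17, N, p))      # about 0.1339

# CDF complement: probability of at least 17 successes.
print(1 - binom.cdf(16, N, p))  # about 0.2252

# PPF (inverse CDF): the smallest score k with P(X <= k) >= 0.55.
print(binom.ppf(0.55, N, p))    # 15.0

# Consistency check: the CDF is the running sum of the PMF.
print(abs(binom.cdf(16, N, p) - sum(binom.pmf(m, N, p) for m in range(17))))

# The first four moments: mean, variance, skewness, kurtosis.
print(binom.stats(N, p, moments="mvsk"))

# A random sample: five simulated sessions of 20 free throws each.
print(binom.rvs(N, p, size=5))
```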
5. Binomial distribution in practice
The binomial distribution is the basis for the binomial test of statistical significance. It checks whether the result of an experiment with only two possible outcomes is statistically different from what is expected. For example, you expect that the daily return from your investment will be above -100$ 95% of the time. Then, you check the actual returns for the last month (22 trading days) and you get the following results:
[Table: daily returns for the last 22 trading days, with the days below -100$ marked]
Can we say that the results are different from our expectations? First, we need to assume that the returns are independent across days. Then, we define the null hypothesis for the test: the results of the experiment do not differ significantly from what is expected. We want to check whether the real data supports the alternative that the true probability of success (here, a return below -100$) is greater than 0.05. Given that each day is either a success (return below -100$) or a failure (return above or equal to -100$), we can use the CDF equation given above:
$$p\text{-value} = P(X \ge m) = 1 - F(m-1;\, 22,\, 0.05)$$

where m is the observed number of days with a return below -100$.
Considering a 5% alpha level, we cannot reject the null hypothesis that the results are not significantly different from what is expected. So, even if the number of exceptions is higher than expected, it is a good strategy to hold our nerve and not overreact to the market movements.
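In practice, scipy provides a ready-made binomial test, so we do not need to sum the CDF terms by hand. In the sketch below, the count of three exceptions is a hypothetical value chosen for illustration, since the actual count depends on your returns:

```python
from scipy.stats import binomtest

observed_exceptions = 3  # hypothetical number of days with a return below -100$
n_days = 22              # trading days in the last month
expected_rate = 0.05     # a return below -100$ is expected on 5% of days

# One-sided test: is the true exception rate greater than 5%?
result = binomtest(observed_exceptions, n=n_days, p=expected_rate,
                   alternative="greater")
print(f"p-value: {result.pvalue:.4f}")  # about 0.095 for these illustrative numbers
# Since the p-value exceeds the 5% alpha level, we cannot reject the null hypothesis.
```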
Thanks for reading!
I am glad you reached the end of this article. We went through the most important topics related to the binomial distribution. I hope it was an exciting journey for you!
I will be happy to hear your thoughts and questions in the comments section below, via my LinkedIn profile, or at [email protected]. See you soon!
You may also like:
Quantiles are key to understanding probability distributions
Statistical Moments in Data Science interviews
References
[1] C. Alexander (2008): "Market Risk Analysis. Vol I. Quantitative Methods in Finance", John Wiley & Sons Ltd, ISBN 978-0-470-99800-7
[2] Combination in Wikipedia: https://en.wikipedia.org/wiki/Combination
[3] A.B. Downey: "Think Stats. Exploratory Data Analysis in Python"