If you have ever felt confused when using a probability distribution, this article is for you.

You have met probability distributions many times. You know there are a few different types. But deep in your heart, you feel confused when you need to use them in practice. What the hell is the difference between a probability distribution and a cumulative probability distribution? Should I check the confidence level or alpha on the X or the Y axis? If so, this article is for you. By the end, you will feel comfortable using probability distributions for both discrete and continuous random variables. Let's dive in!
We will cover the following topics in this article:
- Probability density function (PDF)
- Probability mass function (PMF)
- Cumulative distribution function (CDF)
  - 3.1 Cumulative distribution function for DISCRETE random variables (CMF)
  - 3.2 Cumulative distribution function for CONTINUOUS random variables (CDF)
- Summary of probability distributions
- Quantile function
- Thanks for reading and references
1. Probability density function (PDF)
The probability density function of a normal distribution is what most people think about when they hear "distribution". It has a specific bell shape:

The probability density function (PDF) maps a value to its probability density [1]. It is similar to the concept of density in physics, where the density of a substance is its mass per unit of volume. For example, 1 liter of water weighs approximately 1 kg, so the density of water is about 1 kg/L or 1000 kg/m³. Analogously, the probability density measures probability per unit of x.
The PDF refers to a continuous random variable, which means that the variable can take any value within a defined range of real numbers. "Random" reflects the uncertainty about which value the variable will take. There is an infinite number of possibilities, for example 0.1, but also 0.101, 0.1001, etc. As a consequence, the probability that a continuous random variable will be exactly equal to a given value is zero.
The probability on the PDF plot is represented by the area under the density curve. The area under a single point equals zero. That is why the PDF is used to check the probability that a random variable falls within a given range of values, not that it takes any particular value. For example, what is the chance that we will lose money by investing in a fund, i.e., that the return will be negative? Here we consider all returns smaller than zero.
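To make this concrete, here is a minimal Python sketch using scipy.stats. The fund is hypothetical: I assume its returns are normally distributed with a mean of 5% and a standard deviation of 10% purely for illustration.

```python
from scipy.stats import norm

# Hypothetical fund returns: normal with mean 5% and standard deviation 10% (assumed values)
returns = norm(loc=0.05, scale=0.10)

# The density at a single point is NOT a probability (it can even exceed 1)
print(returns.pdf(0))  # ~3.52: probability density at x = 0

# The probability of losing money is the area under the PDF to the left of zero
print(returns.cdf(0))  # ~0.31: P(X <= 0)
```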
Intuitively, the PDF is approximately a curve tracing the outline of a histogram. For example, suppose we want to divide the 992 participants in an experiment into age groups (0–10, 11–20, etc.). We calculate how many members fall into each group and present the counts as bars on a histogram:

How high are the chances that a randomly chosen person will be a member of a given age group? First, we have to convert the frequency distribution into a probability distribution. This means calculating the probability density based on the number of participants in each group. Since the bars are rectangular and the area under a probability density function is always equal to 1, we can use a simplified equation:
density = (number of members in the group) / (total number of participants × bin width)
For the frequencies presented in the previous plot we have:

Now, we can plot our data using the densities instead of counts on the y-axis. The red curve connects the calculated points and denotes the probability density function:

But note that I generated the data for this plot from a normal distribution. That is why the PDF and the histogram fit so well. A PDF has a 'closed' form, which requires defining the distribution and its parameters in advance (mean and standard deviation in the case of a normal distribution). A histogram uses raw data, so it shows the real distribution. This allows detecting anomalies, especially with a high number of bins.
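As a sketch of the conversion, the snippet below applies the simplified equation. The group counts are made up (the article's exact counts are not reproduced here), but they sum to 992 participants with a bin width of 10 years:

```python
import numpy as np

# Hypothetical counts of 992 participants in age groups of width 10 (0-10, 11-20, ...)
counts = np.array([10, 45, 120, 200, 242, 199, 118, 43, 12, 3])  # sums to 992
bin_width = 10
n_total = counts.sum()

# Simplified equation for rectangular bars: density = count / (total * bin width)
densities = counts / (n_total * bin_width)
print(densities)

# The total area under the bars (density * width, summed) is exactly 1
print((densities * bin_width).sum())  # 1.0
```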
Interested in other parameters used to describe a distribution (the expected value, variance, skewness, and kurtosis)? Jump here:
Key points to remember from the analysis above:
- The probability is the area under the probability density curve (PDF).
- The probability that a continuous random variable will take a given value is zero. So, for a specified value of x, we can only check the probability density, which is not very useful.
- That is why we focus on intervals of values. It allows us to make probabilistic statements about a range of values. For example, there is a 50% chance that the participant will be at least 40 years old.
2. Probability mass function (PMF)
The probability mass function (PMF) refers to discrete random variables. In contrast to continuous random variables, discrete random variables can only take on a countable number of discrete values such as 0, 1, 2, … . Simple examples are throwing a die, tossing a coin, or detecting fraudulent transactions (there is either fraud or no fraud).
Similarly to continuous random variables, we can create a histogram of discrete data. But there is no need to aggregate values into intervals. Let's consider the sum of a roll of a pair of dice. The number of possible results is finite since the values on both dice are from 1 to 6. The plot below shows an example of a histogram for 1000 rolls of a fair pair of dice:

Both dice are fair, which means the probability of rolling each number from 1 to 6 is the same and equal to 1/6. That is why the most frequent sum is 7. Similarly to continuous random variables, we can express each result as a probability.
If we roll a pair of dice, there are 36 possible outcomes (6 options on each die). If the sum is equal to 2, there is only one possible combination: (1,1). So the probability of getting a sum equal to 2 is 1/36 ≈ 0.0278. The same holds for a sum of 12, possible only for (6,6). We can calculate the probabilities of the other possible outcomes in the same way. The results, presented in the plot below, create the probability mass function (PMF):

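We can verify these numbers with a short brute-force sketch that enumerates all 36 equally likely outcomes:

```python
from itertools import product
from collections import Counter

# All 36 equally likely outcomes of rolling two fair dice
sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# PMF: probability of each possible sum (2 to 12)
pmf = {s: n / 36 for s, n in sorted(sums.items())}
print(pmf[2], pmf[7], pmf[12])  # 0.0278, 0.1667, 0.0278

# The probabilities over all discrete values sum to 1
print(sum(pmf.values()))  # ~1.0
```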
To sum up, we considered the following types of plots so far:
- A histogram is a graph describing how many times each range of values appears in a dataset. It does not require any assumptions about the distribution, but we have to specify the number of bins in advance. The histogram is plotted from a finite number of samples. The sum of the values over all bars is equal to the total number of samples.
- The probability density function (PDF) describes the probability density of continuous random variables. The probability on the PDF is an area under the density curve. Since the probability of a given value is zero for continuous random variables, the PDF is used to check the probability that the variable falls within a given interval. The whole area under PDF is equal to one.
- Probability mass function (PMF) describes the probability of discrete random variables. It means that the variable can take on only a countable number of discrete values such as 0, 1, 2, and so on. The sum of probabilities of all discrete values in PMF is equal to one.
Although all of them are very useful and commonly used in the industry, there is one more important way to describe a probability distribution: the cumulative distribution function (CDF).
3. Cumulative distribution function (CDF)
The cumulative distribution function (CDF) of a random variable X describes the probability (the chances) that X will take a value less than or equal to x. Mathematically, we can express it as:

F(x) = P(X ≤ x)
3.1. Cumulative distribution function of a DISCRETE probability distribution (CDF or CMF)
Taking the previous example of rolling a fair pair of dice, we can ask: what is the probability that the sum of the two dice is less than or equal to 3? We need to add the probability of a sum equal to 2 (0.0278) and the probability of a sum equal to 3 (0.0556), so the cumulative probability for x=3 is 0.0278+0.0556=0.0834. Then, we repeat the adding process for each discrete value to obtain the cumulative distribution function of a discrete probability distribution:

As can be seen in the plot, the cumulative probability for the highest possible outcome is equal to 1. Since the sum of two dice can only take integer values, the plot can also be expressed with bars:

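A minimal sketch of this adding process, starting from the exact PMF of the dice sums (values 2 to 12):

```python
from itertools import accumulate

# PMF of the sum of two fair dice: number of combinations for sums 2, 3, ..., 12
pmf = [n / 36 for n in [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]]

# CDF (CMF): running sum of the PMF
cdf = list(accumulate(pmf))
print(round(cdf[1], 4))   # P(sum <= 3) = 3/36 ~ 0.0833
print(round(cdf[-1], 4))  # P(sum <= 12) = 1.0, the highest possible outcome
```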
3.2. Cumulative distribution function of a CONTINUOUS probability distribution (CDF)
The idea of the CDF for continuous variables is the same as for discrete variables. The y-axis shows the probability that X will take a value less than or equal to x. The difference is that the probability changes even with small movements along the x-axis. For the earlier example with participants' age groups, the cumulative distribution function looks as follows:

The plots below compare the PDF and CDF of a normal distribution with zero mean and standard deviation of one:

We can conclude that:
- The CDF is a non-decreasing function. It shows the probability that the variable is less than or equal to x, so it can only go up (or stay flat) as x increases.
- We can check the probability from both plots, but using the CDF is more straightforward. The CDF shows probability on the y-axis, while the PDF has probability density on the y-axis. In the case of the PDF, the probability is an area under the curve.
- Since a normal distribution is symmetrical, the CDF at x=0 (which is the mean) equals 0.5 (verified in the sketch after this list).
- The CDF is asymptotic to 0 on the left side of the plot and to 1 on the right side. The exact x values at which it gets close to these limits depend on the distribution type and its parameters (mean and standard deviation for a normal distribution).
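These conclusions are easy to check numerically; a quick sketch with scipy.stats for the standard normal distribution:

```python
import numpy as np
from scipy.stats import norm

# Standard normal distribution: zero mean, standard deviation of one
print(norm.cdf(0))                # 0.5: the CDF at the mean of a symmetric distribution
print(norm.cdf(-4), norm.cdf(4))  # ~0.00003 and ~0.99997: asymptotic to 0 and 1

# Non-decreasing: the CDF never drops as x increases
x = np.linspace(-4, 4, 101)
print(np.all(np.diff(norm.cdf(x)) >= 0))  # True
```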
4. Summary of probability distributions
So far, we reviewed three ways to describe the probability distribution: Probability density function (PDF), Probability mass function (PMF) and Cumulative distribution function (CDF). The main difference between PDF and PMF is summarized in the table below:
| | PDF (continuous random variable) | PMF (discrete random variable) |
|---|---|---|
| Variable values | any value within a defined range of real numbers | a countable number of discrete values (0, 1, 2, …) |
| Y-axis | probability density | probability |
| Probability of a single value | zero; probability is the area under the curve over an interval | read directly from the function |
| Normalization | total area under the curve equals 1 | probabilities sum to 1 |
The cumulative distribution function shows the probability that X will take a value of at most x. It sums the chances of all lower values together with that of x itself. Since the y-axis shows probability directly, using a CDF is often more straightforward than using a PDF.
The following schema shows typical graphs of each distribution, clockwise and starting from the top left: PDF, PMF, CMF, CDF. It summarizes their high-level characteristics and describes the relations between the given types of distribution functions.
![Comparison of different types of distributions. Image by author inspired by [1,2].](https://towardsdatascience.com/wp-content/uploads/2021/06/1G_YsKJCKjHxv-zEktQiTag.png)
As can be seen above, there are clear relations between the different ways of showing a probability distribution.
- For continuous random variables, we can easily plot the PDF and the CDF. The area under the PDF is a probability, so we have to integrate to change the PDF into the CDF, or differentiate to go from the CDF back to the PDF (see the numerical sketch after this list).
- For discrete random variables, the PMF shows the probability and the CDF (CMF) the cumulative probability. To get the CMF from the PMF, we add up the probabilities up to a given x. To go the other way round (from CMF to PMF), we calculate the differences between consecutive steps.
- If we divide all values into a set of bins (see the examples with histograms above), we can turn a PDF into a kind of PMF. It uses ranges of values (intervals) and can be considered an approximation of the PDF. To go from a discrete cumulative distribution to a continuous function, some form of smoothing is needed. It can be done by assuming that the data comes from a specific continuous distribution, such as normal or exponential, and estimating the parameters of that distribution. Converting between discrete and continuous random variables in either direction should be treated as an approximation.
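Here is a rough numerical sketch of the continuous case for the standard normal distribution. It approximates integration with the trapezoidal rule and differentiation with finite differences, so the reconstructions are approximate rather than exact:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-4, 4, 801)
pdf = norm.pdf(x)

# PDF -> CDF: integrate the density (cumulative trapezoidal rule)
steps = (pdf[1:] + pdf[:-1]) / 2 * np.diff(x)
cdf_numeric = np.concatenate([[0.0], np.cumsum(steps)])

# CDF -> PDF: differentiate the cumulative probability
pdf_numeric = np.gradient(norm.cdf(x), x)

# Both reconstructions agree closely with the exact scipy results
print(np.max(np.abs(cdf_numeric - norm.cdf(x))))  # tiny error (the true CDF at x=-4 is ~3e-5, not 0)
print(np.max(np.abs(pdf_numeric - pdf)))          # tiny error from finite differences
```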
5. Quantile function
Let me introduce the superstar of distributions: the quantile function. It allows us to use distributions for many practical purposes, such as finding confidence intervals and hypothesis testing.
The mathematical definition is that the quantile function is the inverse of the distribution function at α. It specifies the value of the random variable such that the probability of the variable being less than or equal to that value equals the given probability:

F⁻¹(α) = x such that P(X ≤ x) = α
Where F⁻¹(α) denotes the α quantile of X.
It may sound a little mysterious now, but a closer look will dispel any doubts. Assume that we want to find the point below which lies 5% of the total area in the lower tail of the distribution. We call it the lower 5% quantile of X and write it as F⁻¹(0.05). Quantiles are the points that divide a probability distribution into areas of equal probability. If we work with percentages, we first divide the distribution into 100 pieces. When we look at the PDF, the 5th quantile is the point that cuts off an area of 5% in the lower tail of the distribution:

The area under the PDF to the left of the red line is exactly 5% of the total area under the curve. This corresponds to a probability of 5%. The first step in drawing the red line was calculating where 0.05 of the total area ends (here x = -1.645). It can be done by software (e.g., the qnorm() function in R or scipy.stats.norm.ppf() in Python) or manually using z-tables (an example here).
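For example, using the scipy.stats.norm.ppf() function mentioned above:

```python
from scipy.stats import norm

# Lower 5% quantile of the standard normal distribution
q = norm.ppf(0.05)
print(q)            # -1.6449: the x value that cuts off 5% of the area in the lower tail

# Sanity check: plugging the quantile back into the CDF returns the probability
print(norm.cdf(q))  # 0.05
```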
Since CDF has probability (α) on the y-axis, it is easier to find this value here:

This shows how useful CDF plots are. We can use CDFs both ways:
- If we have a z-value (or x-value, a value on the x-axis), we can check the probability that X will take a value less than or equal to x. For example, what is the chance that the average time a client spends in the online shop is half an hour or less?
- If we have a probability, we can find the value that cuts off an area of a given alpha. For example, with 90% confidence, we can say that a client spends at least X hours in the online shop.
In the example above, we considered only a one-sided 5% quantile (the lower tail). We can do the same for a 5% probability split over two sides. This means we look for 5% of the total area under the PDF, but divided into a 2.5% lower quantile (on the left side) and a 2.5% upper quantile (on the right side of the plot).

So, quantiles are a direct connection between those plots.
Based on the plots, we can say with 95% confidence that the true parameter (the mean) lies between -1.96 and 1.96. Or, equivalently, that there is a 5% chance that it lies outside the range from -1.96 to 1.96.
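A short sketch of this two-sided calculation with scipy.stats (alpha = 0.05, i.e., 95% confidence):

```python
from scipy.stats import norm

alpha = 0.05  # significance level; confidence level = 1 - alpha = 95%

# Split alpha across both tails: a 2.5% lower and a 2.5% upper quantile
lower = norm.ppf(alpha / 2)      # -1.96
upper = norm.ppf(1 - alpha / 2)  #  1.96
print(lower, upper)

# 95% of the probability mass lies between the two quantiles
print(norm.cdf(upper) - norm.cdf(lower))  # 0.95
```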
The interpretation above highlights that:
- the confidence level tells us how probable the considered event is, or what the chances are that the given parameter lies inside a given range of values.
- alpha, or the significance level, is a probability. We can check it on the y-axis of the CDF plot. Alpha is one minus the confidence level.
A few things to note:
- The inverse function Φ⁻¹(α) is the α-quantile.
- When α is small, the quantile is also called a critical value.
- Some quantiles have special names. If we divide the probability into 100 pieces, we have percentiles. We can say the 5th percentile instead of the 5% quantile. The 4-quantiles are called quartiles, and they divide the distribution into 4 pieces with breaks at 25%, 50% (the median), and 75%.
- For the standard normal distribution (a normal distribution with zero mean and standard deviation of one N(0,1)), which is symmetric about zero, we have:
Φ⁻¹(α) = −Φ⁻¹(1 − α)
This is confirmed by the plots above, since we get -1.96 in the lower tail and 1.96 in the upper tail.
Using quantiles, PDFs, and CDFs, we can answer different questions depending on the information we have, for example:
- Given the sample mean, what is the range of values that contains the population mean with reasonable confidence? "Reasonable" may correspond to various percentage values and depends on the goal of our study.
- With what degree of confidence can we say that the returns will not be negative?
Thanks for reading!
I am glad you reached the end of this article. We went through different ways to describe a probability distribution: the probability density function (PDF), the probability mass function (PMF), and the cumulative distribution function (CDF). Then, we discussed the quantile function. It links different ways of describing distributions (PDF vs CDF) and allows us to use those distributions in a very practical way. I hope it was an exciting journey for you.
Remember that the most efficient way to learn (math) skills is by practice. So don't wait until you feel 'ready', just grab a pen and paper (or your favourite software) and try a few examples on your own. I keep my fingers crossed for you.
I will be happy to hear your thoughts and questions in the comments section below, by reaching me directly via my LinkedIn profile or at [email protected]. See you soon!
You may also like:
Statistical Moments in Data Science interviews
References
[1] A.B. Downey: "Think Stats. Exploratory Data Analysis in Python"
[2] C. Alexander (2008): "Market Risk Analysis. Vol. I. Quantitative Methods in Finance". John Wiley & Sons Ltd, ISBN 978-0-470-99800-7.