
Key Concepts to Improve Your Understanding of Probability Theory

Six notions I learned during the MITx probability course

Photo by Mick Haupt on Unsplash

If I had to summarise what I learned in the field of Probability during the MITx Micromaster program in Statistics and Data Science, what would it be?

What I want to share here is a short list of topics that I found particularly interesting and well explained in the lecture Probability – The Science of Uncertainty and Data.

I will not bother you with the whole curriculum, but I would like to summarise six concepts that have great importance in the world of probability.

Let’s speak about the following topics:

  • Compact writing convention
  • Sampling Table
  • Thinking Conditionally
  • Transformation
  • Independent and identically distributed random variables
  • Law of Large Numbers and Central Limit Theorem

Compact writing convention

Why is it important?

A naming convention will not help you understand probability theory, but it will certainly help you master and memorize some formulas.

The simpler, the better. The way formulas are written in the MITx lecture makes them clear and compact. For example, each random variable is written in capital letters, probability density functions in lowercase, conditioning with a vertical bar, and so on.

For instance, the function f_{X|A}(x), written with just seven characters, tells you the following: "the probability density function f of the random variable X at sample point x, given that the event A occurred."

Compact and clear.


Sampling Table

Quickly jump to this table when you try to answer a question such as "How many samples of size k could I get?".

For example, a bucket filled with three poker chips labeled "1", "2", and "3" can be sampled in four different ways. Let’s have a look at them:

  • "With replacement; Order matters": with replacement means that we can draw the same chip multiple times, and when the order matters, a sample such as "1, 1, 3" is considered different from "3, 1, 1".
  • "With replacement; Order doesn’t matter": as before, but drawing the chips in the order "1, 1, 3" or "3, 1, 1" makes no difference. Both count as the same sample.
  • "Without replacement; Order matters": without replacement means that we can’t draw a chip multiple times, i.e. a sample such as "1, 1, 2" isn’t possible.
  • "Without replacement; Order doesn’t matter": in this scenario, we can’t draw a chip multiple times, and a sample such as "1, 2, 3" is considered the same sample as "3, 2, 1".

We can now summarise all four scenarios in one table and calculate how many different samples we can create by taking k chips from a bucket filled with n different pieces. The table looks as follows:
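The table itself is an image in the original; the standard counts are n^k (with replacement, order matters), C(n+k−1, k) (with replacement, order doesn’t matter), n!/(n−k)! (without replacement, order matters), and C(n, k) (without replacement, order doesn’t matter). As a sanity check, the sketch below enumerates all four scenarios for the three-chip bucket with samples of size k = 2 (the choice of k is mine, for illustration):

```python
from itertools import (product, permutations, combinations,
                       combinations_with_replacement)
from math import comb, factorial

n, k = 3, 2                  # three chips in the bucket, draw two
chips = range(1, n + 1)

# Enumerate each scenario explicitly and compare with the closed-form count.
cases = {
    "with replacement, order matters":
        (list(product(chips, repeat=k)), n ** k),
    "with replacement, order doesn't matter":
        (list(combinations_with_replacement(chips, k)), comb(n + k - 1, k)),
    "without replacement, order matters":
        (list(permutations(chips, k)), factorial(n) // factorial(n - k)),
    "without replacement, order doesn't matter":
        (list(combinations(chips, k)), comb(n, k)),
}

for name, (samples, formula) in cases.items():
    print(f"{name}: {len(samples)} samples (formula gives {formula})")
```

For n = 3 and k = 2, the four scenarios yield 9, 6, 6, and 3 samples respectively, matching the formulas.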


Thinking Conditionally

To support this point, I’ll take Bayes’ rule, which gives us the conditional probability of an event (A) given that another event (B) has already occurred.

What if we consider that another event (C) already occurred before the two other events? Well, in that case, we can calculate the probability of A given B and C as follows:
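The formula is an image in the original; reconstructed in standard notation (conditioning every term of Bayes’ rule on C), it reads:

```latex
P(A \mid B \cap C) = \frac{P(B \mid A \cap C)\, P(A \mid C)}{P(B \mid C)}
```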

It is interesting to see how event C enters this formula. When we "think conditionally", we project ourselves into a universe where we know that some events have already occurred.

Another example is the concept of conditional independence. When two events (A and B) are independent given C, we get the following equality:
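The equality is an image in the original; in standard notation, conditional independence of A and B given C means:

```latex
P(A \cap B \mid C) = P(A \mid C)\, P(B \mid C)
```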


Transformation

What if I know the distribution of a random variable X, but I would like to know the distribution of Y=g(X)?

Well, you’ll be glad to hear that there is a general "recipe" for deriving the distribution of Y when you know how X is distributed.

Let’s have a look at this three-step recipe:

  1. Compute the inverse of g. This means that you transform the mapping Y=g(X) in order to get X=h(Y)
  2. Compute the cumulative distribution function (CDF) F_Y(y) = P(Y ≤ y) = P(X ≤ h(y)) (for an increasing g)
  3. Take the derivative of the CDF to get the probability density function

Fortunately, you don’t always have to do this by hand. If the function g is differentiable and strictly increasing or strictly decreasing, you can directly apply the following formula:
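The formula is an image in the original; for a strictly monotone, differentiable g with inverse h = g⁻¹, the standard change-of-variables result is:

```latex
f_Y(y) = f_X\bigl(h(y)\bigr)\,\left|\frac{dh(y)}{dy}\right|
```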

The interesting thing is that you can also apply the same principle if you have a function with multiple random variables such as Z = g(X, Y). Moreover, if your function is a simple addition (g(X, Y) = X + Y), you’ll end up with a convolution that looks as follows:
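The convolution formula is an image in the original; for continuous densities it is f_Z(z) = ∫ f_X(x) f_Y(z − x) dx, and for discrete variables the integral becomes a sum. A minimal discrete sketch (my own example, not from the lecture): the PMF of the sum of two fair dice is the convolution of their individual PMFs, which np.convolve computes directly:

```python
import numpy as np

# PMF of a fair six-sided die over the faces 1..6
die = np.full(6, 1 / 6)

# PMF of Z = X + Y: the discrete convolution of the two PMFs.
# The 11 entries correspond to the sums 2..12.
pmf_sum = np.convolve(die, die)

print(pmf_sum.round(4))  # the most likely sum, 7, has probability 6/36
```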


Independent and identically distributed random variables

These two concepts are key in probability theory, as they are the fundamental conditions for applying the central limit theorem.

Let’s start with the notion of independence, using an example: consider a first event A, "getting a fair die when buying it at the supermarket", and an event B, "getting the number six when throwing the die that I just bought". The two events are dependent, because the fairness of the die influences the probability of getting a six when throwing it.

We would say that two random variables are independent if (and only if) one of the following statements holds:
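The statements are an image in the original; the standard equivalent characterizations of independence (the ones the course works with, reconstructed here in its notation) are:

```latex
F_{X,Y}(x, y) = F_X(x)\, F_Y(y) \quad \text{for all } x, y
\qquad\Longleftrightarrow\qquad
f_{X,Y}(x, y) = f_X(x)\, f_Y(y) \quad \text{for all } x, y
```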

And that’s about it for the concept of independence. Let’s now discuss the notion of "identically distributed".

Two random variables are identically distributed if their probability distributions are exactly the same. For example, let "X1" be a random variable describing the result of throwing a die, and "X2" the result of throwing the same die a second time. Both random variables have exactly the same probability distribution.


Law of Large Numbers and Central Limit Theorem

These concepts are probably among the most-taught principles in probability and statistics. Why are they so important?

Well, to answer that, I’ll quote Professor Philippe Rigollet:

Statistics is 99% doing averages and the rest 1% is doing something fancy.

So let’s start with the intuitive one: the Law of Large Numbers (LLN). The idea is that you average "n" random variables that are independent of one another and follow the same probability distribution. In other words, your "n" independent variables share the same expectation (let’s say μ), so you should probably expect their average to be close to μ as well.

And that’s exactly what the LLN tells you: as "n" tends to infinity, the average below converges to μ.
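The average is an image in the original; reconstructed, the (weak) LLN states that the sample average converges in probability to μ:

```latex
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i
\;\xrightarrow[\;n \to \infty\;]{P}\; \mu
```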

The Central Limit Theorem (CLT) is less intuitive but equally important. It will tell you how your average is distributed.

The surprising thing is that this doesn’t depend on the distribution of your variables (X1, …, Xn): the properly standardized average always converges to a normal distribution.
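Formally, the CLT says that √n (X̄_n − μ)/σ converges in distribution to N(0, 1). A minimal simulation sketch (my own example: averages of a decidedly non-normal Exponential(1) distribution, which has μ = σ = 1):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n, trials = 30, 100_000
# Exponential(1) is skewed and far from normal, with mean = std = 1.
samples = rng.exponential(scale=1.0, size=(trials, n))

# Standardize each sample average: sqrt(n) * (mean - mu) / sigma
z = np.sqrt(n) * (samples.mean(axis=1) - 1.0) / 1.0

# If the CLT holds, z should look like a standard normal.
print(f"mean ≈ {z.mean():.3f}, std ≈ {z.std():.3f}")
```

Plotting a histogram of z against the standard normal density makes the convergence visible even at n = 30.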

Alright, but how do I get an infinite amount of data points in practice?

No worries, you don’t need to get so many data points.

In his lecture Fundamentals of Statistics, Professor Philippe Rigollet explains that when "n" is larger than 30, the normal approximation given by the CLT is already good. He adds that even to estimate something in a life-or-death situation, we would only have to pick "n" larger than 50.


Final Word

We saw together the six concepts I found especially interesting in the lecture Probability – The Science of Uncertainty and Data that is given online by MIT.

Probability theory is a vast subject, and I gave you the concepts that I was glad to discover early when tackling this subject.

To summarise, we saw the following concepts:

  • Compact writing convention is crucial to be more efficient when writing probability formulas. I’m also convinced that it will help you memorize them.
  • We can have a quick look at our "Sampling Table" whenever we have a question such as "How many samples of size k could I get?".
  • The chapters "Thinking Conditionally" and "Transformation" can help us manipulate density functions.
  • The concept of "Independent and identically distributed random variables" is fundamental to applying the Law of Large Numbers and the Central Limit Theorem.

Curious to learn more about Anthony’s work and projects? Follow him on Medium, and LinkedIn.

Need a technical writer? Send your request to https://amigocci.io.

