Probability Theory Continued: Infusing Law of Total Probability With Kolmogorov’s Definition of Joint Probability

Demystifying an aspect of probability theory through an extension of Bayesian statistics

Michael Knight
Towards Data Science


In my previous blog, I simplified Bayes’ Theorem. In this blog, I will break down the Law of Total Probability, which states:

For a countable set of x mutually exclusive events [B₁, B₂, …, Bₓ] (where there are ‘x’ events in total, each categorized as ‘B’), such that event A can only occur together with one of the B events, the probability of event A across the entire set of B events is P(A) = ∑P(A∩Bₓ)

Again, as with Bayes’ Theorem, this seems very convoluted and overwhelming at first glance. But upon breaking it down into smaller steps, it is not too complicated (and actually rather intuitive). The probability that event A occurs is equal to the sum of the probabilities of A occurring jointly with each possible condition under which A can occur. Or, to visualize:

[Figure: events B₁ through B₆, with event A overlapping only B₂, B₃, B₄, and B₅ (source)]

Note that in the above image A only overlaps B₂, B₃, B₄, and B₅, so P(A∩B₁) and P(A∩B₆) are both 0. In this example we therefore have:

P(A) = ∑P(A∩Bₓ) = 0 + P(A∩B₂) + P(A∩B₃) + P(A∩B₄) + P(A∩B₅) + 0

When thinking of it like that, the equation can seem confusing precisely because it sounds too easy or obvious; it sounds like it should be a trivial point. The most useful extension of this law probably comes when we apply Kolmogorov’s definition of joint probability, which states that

P(A∩B) = P(A|B)*P(B)

When we apply this to the Law of Total Probability, we see that

P(A) = ∑P(A∩Bₓ) = ∑ P(A|Bₓ)*P(Bₓ)
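
To see the combined formula in action, here is a minimal Python sketch that sums P(A|Bₓ)*P(Bₓ) over a set of B events (the event labels and probabilities below are made up purely for illustration):

```python
# Law of Total Probability combined with Kolmogorov's definition:
# P(A) = sum over all B_x of P(A|B_x) * P(B_x)

def total_probability(conditionals, priors):
    """Return P(A) given P(A|B_x) and P(B_x), keyed by the same labels."""
    return sum(conditionals[b] * priors[b] for b in priors)

# Hypothetical values, for illustration only:
p_a_given_b = {"B1": 0.2, "B2": 0.5, "B3": 0.9}  # P(A|B_x)
p_b = {"B1": 0.3, "B2": 0.4, "B3": 0.3}          # P(B_x); must sum to 1

print(round(total_probability(p_a_given_b, p_b), 2))  # 0.06 + 0.20 + 0.27 = 0.53
```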

The best way to understand why we need this formula (and how it is used) is to look at an example (which I have taken directly from a lab assignment from the DSI program hosted by General Assembly):

Suppose you and your friend are playing a game. Your friend has laid four coins out in front of you. If you flip heads, you win a dollar from your friend. If you flip tails, you owe a dollar to your friend. However, the coins in front of you are not fair.

  • One coin has an 80% chance of flipping heads. (Call this coin A.)
  • One coin has a 60% chance of flipping heads. (Call this coin B.)
  • One coin has a 40% chance of flipping heads. (Call this coin C.)
  • One coin has a 10% chance of flipping heads. (Call this coin D.)

Suppose you select one coin at random. That is, you don’t know whether you selected coin A, B, C, or D. You flip heads. Given this data, what are the probabilities that you selected coin A, coin B, coin C, and coin D?


So first off, let’s try to put what we have just been told into a format that can be plugged into a probability formula: we are told that, given coin A, there is an 80% chance of flipping heads, which can be interpreted as

P(H|A) = 0.8

we are told that, given coin B, there is a 60% chance of flipping heads, which can be interpreted as

P(H|B) = 0.6,

we are told that, given coin C, there is a 40% chance of flipping heads, which can be interpreted as

P(H|C) = 0.4,

and we are told that, given coin D, there is a 10% chance of flipping heads, which can be interpreted as

P(H|D) = 0.1

Since there are 4 coins, and no reason to believe that picking up one coin is any more likely than picking up another, we can deduce that each coin has an equal probability of being selected, which is a one-in-four (1/4) probability. Thus:

P(A) = P(B) = P(C) = P(D) = 0.25

To solve the probability of flipping a heads, you can plug all of this into the formula for the Law of Total Probability:

P(H) = P(H|A) * P(A) + P(H|B) * P(B) + P(H|C) * P(C) + P(H|D) * P(D)

= (.8 * .25) + (.6 * .25) + (.4 * .25) + (.1 * .25)

= 0.475
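
If you want to double-check that arithmetic in code, a quick Python sketch of the same calculation looks like this:

```python
# P(H) for the coin example, via the Law of Total Probability
p_h_given_coin = {"A": 0.8, "B": 0.6, "C": 0.4, "D": 0.1}  # P(H|coin)
p_coin = {coin: 0.25 for coin in p_h_given_coin}           # each coin equally likely to be picked

p_h = sum(p_h_given_coin[coin] * p_coin[coin] for coin in p_h_given_coin)
print(round(p_h, 3))  # 0.475
```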

From here we can use Bayes’ Theorem to solve the rest of the problem:

P(A|H) = (P(H|A) * P(A)) / P(H) = (.8 * .25)/.475 = 0.42105263
P(B|H) = (P(H|B) * P(B)) / P(H) = (.6 * .25)/.475 = 0.31578947
P(C|H) = (P(H|C) * P(C)) / P(H) = (.4 * .25)/.475 = 0.21052632
P(D|H) = (P(H|D) * P(D)) / P(H) = (.1 * .25)/.475 = 0.05263158

Thus, if you flipped heads, there is a 42.1% chance that you selected coin A, a 31.6% chance that you selected coin B, a 21.1% chance that you selected coin C, and a 5.3% chance that you selected coin D.
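
The same posterior probabilities can be reproduced in a few lines of Python, again combining the Law of Total Probability with Bayes’ Theorem:

```python
# Posterior probability of each coin given that we flipped heads:
# P(coin|H) = P(H|coin) * P(coin) / P(H)
p_h_given_coin = {"A": 0.8, "B": 0.6, "C": 0.4, "D": 0.1}
p_coin = {coin: 0.25 for coin in p_h_given_coin}

p_h = sum(p_h_given_coin[coin] * p_coin[coin] for coin in p_h_given_coin)  # 0.475

for coin in p_h_given_coin:
    posterior = p_h_given_coin[coin] * p_coin[coin] / p_h
    print(f"P({coin}|H) = {posterior:.8f}")

# P(A|H) = 0.42105263
# P(B|H) = 0.31578947
# P(C|H) = 0.21052632
# P(D|H) = 0.05263158
```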

Through this example, you can see how, once Kolmogorov’s definition of joint probability is applied, the Law of Total Probability is far from trivial, and how, when interpreted step by step, it is not so hard to understand. It is also neat (at least I think) to see these three different rules (the Law of Total Probability, Kolmogorov’s definition, and Bayes’ Theorem) all used together to uncover new relationships, and you can begin to see the potential for creative thought within the field of mathematics.
