
Table of Contents
- Introduction
- Set Theory
- Probability
- Counting Techniques
- Conditional Probability
- Bayes Theorem
Introduction
Understanding the fundamentals of probability will go a long way, whether or not you’re pursuing a Data Science career. It’s good knowledge to have regardless. My goal is to provide a crash course in the basics of probability that you should know, so that your data science journey (or your journey in general) goes more smoothly.
With that said, let’s dive into it!
Set Theory
Terms
- A set is a collection of elements.
- More specifically, a sample space (denoted as S) is the set of all possible outcomes for a given experiment. For example, the sample space for a roll of a die (singular of dice) is S = {1, 2, 3, 4, 5, 6}.
- A finite sample space is a sample space with a finite number n of distinct elements.
- A simple sample space (SSS) is a finite sample space in which all outcomes are equally likely.
- The universal set, denoted as U, is the set of all elements under consideration.
- The empty set, denoted as ∅, is a set with nothing in it. It’s… empty.
- The cardinality of a set A, denoted as |A|, is the number of elements in A.
- Set A is a subset of set B if every element in set A is also in set B. This is written as A ⊆ B.
- An event is a set of possible outcomes. Any subset of S is an event.

- The complement of a given set A is the set of all elements of U that are not in A, and is denoted as A’. See image below.

- The intersection of set A and set B is the set of elements that are in both set A and set B, and is denoted as A⋂B. See image below.

- The union of set A and set B is the set of elements that are in set A or set B (or both), and is denoted as A⋃B. See image below.

Set A and set B are said to be disjoint if the intersection of A and B is equal to the empty set (A⋂B=∅).
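To make these terms concrete, here is a minimal sketch using Python's built-in set type. The universal set is the die example from above; the particular sets A and B are illustrative assumptions, not part of the definitions.

```python
# Universal set for this sketch: the outcomes of one roll of a six-sided die.
U = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # illustrative event: the roll is even
B = {4, 5, 6}  # illustrative event: the roll is greater than 3

complement_A = U - A           # A': elements of U that are not in A
intersection = A & B           # A ⋂ B: elements in both A and B
union = A | B                  # A ⋃ B: elements in A or B (or both)
cardinality_A = len(A)         # |A|
is_subset = {2, 4} <= A        # {2, 4} ⊆ A
are_disjoint = A.isdisjoint(complement_A)  # A ⋂ A' = ∅

print(sorted(complement_A))  # [1, 3, 5]
print(sorted(intersection))  # [4, 6]
print(sorted(union))         # [2, 4, 5, 6]
print(cardinality_A, is_subset, are_disjoint)  # 3 True True
```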

Laws of Set Operations
Complements
- A⋃A’ = U (The union of A and complement A is equal to the universal set)
- A⋂A’ = ∅ (The intersection of A and complement A is equal to the empty set)
- (A’)’ = A (The complement of the complement of A is equal to A)
Commutative Law
- A⋃B = B⋃A (The union of A and B is equal to the union of B and A)
- A⋂B = B⋂A (The intersection of A and B is equal to the intersection of B and A)
De Morgan’s Law
- (A⋃B)’ = A’⋂B’ (The complement of A union B is equal to the complement of A intersect the complement of B)
- (A⋂B)’ = A’⋃B’ (The complement of A intersect B is equal to the complement of A union the complement of B)
Associative Law
- A⋃(B⋃C) = (A⋃B)⋃C = A⋃B⋃C
- A⋂(B⋂C) = (A⋂B)⋂C = A⋂B⋂C
Distributive Law
- A⋃(B⋂C) = (A⋃B)⋂(A⋃C)
- A⋂(B⋃C) = (A⋂B)⋃(A⋂C)
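These laws can be checked by brute force on a small universal set. The sketch below is not a proof; it simply tests De Morgan's and the distributive laws on every subset of an assumed four-element universe. The helper names `all_subsets` and `complement` are my own.

```python
from itertools import combinations

# Small universal set chosen only to keep the exhaustive check fast.
U = frozenset({1, 2, 3, 4})

def all_subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def complement(X):
    """Complement of X relative to the universal set U."""
    return U - X

subsets = all_subsets(U)  # 2**4 = 16 subsets

# De Morgan: (A ⋃ B)' = A' ⋂ B' and (A ⋂ B)' = A' ⋃ B'
de_morgan_holds = all(
    complement(A | B) == (complement(A) & complement(B))
    and complement(A & B) == (complement(A) | complement(B))
    for A in subsets for B in subsets
)

# Distributive: A ⋃ (B ⋂ C) = (A ⋃ B) ⋂ (A ⋃ C), and the dual form
distributive_holds = all(
    (A | (B & C)) == ((A | B) & (A | C))
    and (A & (B | C)) == ((A & B) | (A & C))
    for A in subsets for B in subsets for C in subsets
)

print(de_morgan_holds, distributive_holds)  # True True
```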
Probability
For every event A, which is a subset of the sample space S, there is a probability of A, denoted as P(A).
Axioms of Probability
- 0 ≤ P(A) ≤ 1
- P(S) = 1
- If A⋂B = ∅ then P(A⋃B) = P(A) + P(B)
- More generally, if A1, A2, …, Ak are pairwise disjoint events, then P(A1⋃A2⋃…⋃Ak) = P(A1) + P(A2) + … + P(Ak). In words, the probability of the union of all k events is equal to the sum of the individual probabilities.
Properties of Probability
Note: I’m not going to go through the proofs, so feel free to Google the proofs or DM me on LinkedIn if you would like to know the proofs.
- P(∅) = 0
- P(A’) = 1 – P(A)
- For any 2 events A and B, P(A⋃B) = P(A) + P(B) – P(A⋂B)
- For any 3 events A, B, C, P(A⋃B⋃C) = P(A) + P(B) + P(C) – P(A⋂B) – P(B⋂C) – P(A⋂C) + P(A⋂B⋂C)
- If A is a subset of B, then P(A) ≤ P(B)
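Here is a small sketch that checks a few of these properties on the die example, treating it as a simple sample space so that P(A) = |A| / |S|. The events A and B and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction

# Simple sample space: one roll of a fair six-sided die.
S = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """P(event) when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}  # illustrative event: the roll is even
B = {4, 5, 6}  # illustrative event: the roll is greater than 3

p_union = prob(A | B)
inclusion_exclusion = prob(A) + prob(B) - prob(A & B)
p_complement = prob(S - A)

print(p_union)              # 2/3
print(inclusion_exclusion)  # 2/3
print(p_complement)         # 1/2
print(prob(set()))          # 0
```

Using `Fraction` instead of floats keeps the results exact, which makes it easy to compare both sides of an identity.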
Counting Techniques
Note: These techniques are used strictly for simple sample spaces (SSS).
Addition Rule
Definition: If there are n ways to do something and m ways of doing another thing and you cannot do them at the same time, then there are n+m ways to choose one thing to do.
For example, if there are 5 ice cream flavors and 4 frozen yogurt flavors to choose from and you can only choose one, then you have 9 options to choose from (5+4).
This rule extends to more than two sets of options.
Multiplication Rule
Definition: If there are n ways to do something and m ways of doing another thing, then there are n*m ways of performing both actions where one thing is performed before another.
For example, if there are 2 different ways of getting from Canada to the United States and 4 different ways of getting from the United States to Mexico, then there are 8 different ways of getting to Mexico from Canada (2*4).
This rule extends to more than two actions performed in sequence.
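Both rules can be confirmed by listing the options explicitly. In the sketch below, only the counts (5, 4, 2, and 4) come from the examples above; the option labels are hypothetical placeholders.

```python
from itertools import product

# Hypothetical labels; only the counts come from the examples.
ice_cream = [f"ice cream {i}" for i in range(1, 6)]          # 5 flavors
frozen_yogurt = [f"frozen yogurt {i}" for i in range(1, 5)]  # 4 flavors

# Addition rule: pick exactly one dessert from either list.
single_choices = ice_cream + frozen_yogurt
n_single = len(single_choices)

canada_to_us = ["route A", "route B"]                           # 2 ways
us_to_mexico = ["route 1", "route 2", "route 3", "route 4"]     # 4 ways

# Multiplication rule: pick one first leg, then one second leg.
trips = list(product(canada_to_us, us_to_mexico))
n_trips = len(trips)

print(n_single)  # 9
print(n_trips)   # 8
```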
Permutations
Definition: A permutation of n elements is any arrangement of those n elements in a definite order. There are n factorial (n!) ways to arrange n elements. Note that order matters!
The number of permutations of n things taken r-at-a-time is defined as the number of r-tuples that can be taken from n different elements and is equal to the following equation:
nPr = n! / (n – r)!
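Example: How many permutations does a license plate have with 6 digits? Assuming each of the 10 digits (0 to 9) can be used at most once, n = 10 and r = 6:
10P6 = 10! / (10 – 6)! = 10 × 9 × 8 × 7 × 6 × 5 = 151,200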
Combinations
Definition: A combination is a selection of r out of n objects where order doesn’t matter.
The number of combinations of n things taken r-at-a-time is defined as the number of subsets with r elements of a set with n elements and is equal to the following equation:
nCr = n! / (r! (n – r)!)
Example: How many ways can you draw 6 cards from a deck of 52 cards?
52C6 = 52! / (6! × 46!) = 20,358,520
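If you'd like to check these counts yourself, Python's standard library has `math.perm` and `math.comb` (Python 3.8+). The sketch below first cross-checks them against brute-force enumeration on a small, made-up list, then applies them to the two examples. It assumes the license plate uses 6 distinct digits out of 10, which is my reading of the example rather than something the text states.

```python
from itertools import combinations, permutations
from math import comb, perm  # available in Python 3.8+

# Cross-check the formulas by enumerating a small, made-up case.
items = ["a", "b", "c", "d"]
n_ordered = len(list(permutations(items, 2)))    # order matters
n_unordered = len(list(combinations(items, 2)))  # order doesn't matter

print(n_ordered, perm(4, 2))    # 12 12
print(n_unordered, comb(4, 2))  # 6 6

# License plate: 6 digits out of 10, assuming no digit repeats.
plates = perm(10, 6)
# Card draw: 6 cards out of 52, order ignored.
hands = comb(52, 6)

print(plates)  # 151200
print(hands)   # 20358520
```

Each unordered pair shows up twice among the ordered pairs (once per ordering), which is why the permutation count is r! times the combination count.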
Conditional Probability
Conditional probability is the probability of one event occurring given that another event has already occurred. Formally, if P(B) > 0, then the conditional probability of A given B is equal to the following equation:
P(A|B) = P(A⋂B) / P(B)
Given this equation, we can multiply both sides by P(B) to deduce the following equation:
P(A⋂B) = P(A|B)P(B)

Properties
- 0 ≤ P(A|B) ≤ 1
- P(S|B) = 1
- If A1⋂A2 = ∅, then P(A1⋃A2|B) = P(A1|B) + P(A2|B)
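As a worked example, the sketch below computes a conditional probability by counting outcomes for two fair dice. The events A and B and the helpers `prob` and `cond_prob` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Simple sample space: all 36 ordered outcomes of rolling two fair dice.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

def cond_prob(A, B):
    """P(A|B) = P(A ⋂ B) / P(B); only defined when P(B) > 0."""
    if prob(B) == 0:
        raise ValueError("P(B) must be greater than 0")
    return prob(A & B) / prob(B)

A = {o for o in S if o[0] + o[1] == 8}  # illustrative: the sum is 8
B = {o for o in S if o[0] % 2 == 0}     # illustrative: first die is even

print(prob(A))          # 5/36
print(cond_prob(A, B))  # 1/6
print(cond_prob(S, B))  # 1
print(prob(A & B) == cond_prob(A, B) * prob(B))  # True
```

Knowing that the first die is even raises the probability of a sum of 8 from 5/36 to 1/6, because 3 of the 18 outcomes in B sum to 8.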
Independence
- A and B are said to be independent of each other if the occurrence of one doesn’t change the probability of the other.
- A and B are independent if and only if P(A⋂B) = P(A)P(B)
- If P(B) > 0 and A and B are independent then P(A|B) = P(A)
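Here's a quick sketch that tests the product condition P(A⋂B) = P(A)P(B) on two fair dice. The three events and the helper `is_independent` are illustrative assumptions; one pair turns out to be independent and the other doesn't.

```python
from fractions import Fraction
from itertools import product

# Simple sample space: all 36 ordered outcomes of rolling two fair dice.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

def is_independent(A, B):
    """True if and only if P(A ⋂ B) = P(A)P(B)."""
    return prob(A & B) == prob(A) * prob(B)

first_even = {o for o in S if o[0] % 2 == 0}    # first die is even
sum_is_7 = {o for o in S if o[0] + o[1] == 7}   # the sum is 7
sum_is_8 = {o for o in S if o[0] + o[1] == 8}   # the sum is 8

print(is_independent(first_even, sum_is_7))  # True
print(is_independent(first_even, sum_is_8))  # False
```

A sum of 7 is equally likely whatever the first die shows, so it is independent of the first die being even. A sum of 8 is impossible when the first die shows 1, so the first die does carry information about it.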
Bayes Theorem
The Law of Total Probability
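The Law of Total Probability is as follows. If the events A1, A2, …, An form a partition of S, each with P(Ai) > 0, and B is any event, then:
P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An)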
You’ve actually seen this in effect in one of the examples above. (Remember the image below?)

Consider the Law of Total Probability as well as the equation below, which follows from the definition of conditional probability:
P(Ai⋂B) = P(B|Ai)P(Ai)
Considering these two equations, let’s look at Bayes Theorem.
Bayes Theorem
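If the events A1, A2, …, An form a partition of S and B is any event with P(B) > 0, then for each i:
P(Ai|B) = P(B|Ai)P(Ai) / [P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An)]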
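To see the theorem in action, here is a minimal sketch with a two-event partition. The scenario (a diagnostic test) and all of its numbers are hypothetical, chosen only for illustration, and `bayes_posteriors` is my own helper name.

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Return P(A_i | B) for each event A_i in a partition of S.

    priors[i] is P(A_i) and likelihoods[i] is P(B | A_i).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B|A_i)P(A_i)
    p_b = sum(joint)  # P(B), by the Law of Total Probability
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return [j / p_b for j in joint]

# Hypothetical numbers: A1 = "has the condition", A2 = "does not".
priors = [Fraction(1, 100), Fraction(99, 100)]       # P(A1), P(A2)
# B = "test is positive": P(B|A1) = 95%, P(B|A2) = 5%.
likelihoods = [Fraction(95, 100), Fraction(5, 100)]

posteriors = bayes_posteriors(priors, likelihoods)
print(posteriors[0])                   # 19/118
print(round(float(posteriors[0]), 3))  # 0.161
print(sum(posteriors))                 # 1
```

Even with a fairly accurate test in this made-up scenario, the posterior probability of having the condition given a positive result is only about 16%, because the prior P(A1) is so small.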
One of the main applications of Bayes Theorem in Data Science is the Naive Bayes classifier. Check out my article, A Mathematical Explanation of Naive Bayes in 5 Minutes, if you would like to learn more!
Thanks for Reading!
Terence Shin
Founder of ShinTwin