
Basics of Probability Notations

Union, Intersection, Independence, Disjoint, Complement: Advanced Probability for Data Science Series (1)

Photo by Martin Woortman on Unsplash


If you’ve been following my previous articles in the probability series, you may have noticed that I briefly touched on concepts like probability notations before diving into Bayes’ theorem.

I took some time to look back at my articles and realized that I didn’t go deeply into the foundational notations that set the basis for all probability calculations such as the Union, Intersection, Independence, Disjoint, etc.

These notations shouldn’t be brushed over: they are essential in anything related to data, especially in fields like data analysis, machine learning, and statistical modeling.

This realization led me to think: before jumping headfirst into advanced topics like Conditional Probability, Conditional Independence, Bayes’ Theorem, Markov Chains, or Monte Carlo methods, it’s crucial to have a solid understanding of the basics.

Without this foundation, advanced probability topics could feel overwhelming and disconnected.

That’s why I’m taking a step back to explain probability notations more thoroughly. But don’t worry, like most of my articles, it won’t be just theory! I’ll walk you through clear examples and practical scenarios to help you understand.

Let’s get started!


Table of Contents

  1. Union: P(A ∪ B)
  2. Intersection: P(A ∩ B)
  3. Independence and Disjoint
  4. Complement (Aᶜ) and Difference (A∖B)
  5. Useful Operations

1. Union: P(A ∪ B)

One of the most fundamental operations in probability is the Union. If you’ve taken classes in statistics, mathematics, machine learning, or engineering during high school or college, you’ve most likely encountered this concept.

We pronounce this: A "union" B

The Union of two sets (or events) is defined as "all elements that are contained in either set A, set B, or both." In simpler terms, it captures everything that belongs to at least one of the sets.

From a logical perspective, the union is connected using the operator "or". If you have a computer science background, this might sound familiar since it aligns with the logical "or" operator used in programming.

1.1. USA Soccer Data: The Union in Action

Imagine you’re a data scientist for the USA National Soccer Team, tasked with analyzing team performance. The head coach approaches you with a straightforward question (probably too simple given your $200K paycheck):

"Can you identify the matches where:

  1. The team lost at home (in U.S. stadiums),
  2. The team was defeated by at least two goals, or
  3. Both (the team lost at home and by at least two goals)?"

As a diligent data scientist, you first break down the problem to calculate and identify these scenarios. While preparing for the 2026 World Cup, it’s also useful to examine related statistics, such as how often the team loses at home or is defeated by at least two goals.

Here’s an example with data from 10 recent national team matches:

  • Matches where the team lost at home: {1, 3, 5, 7}
  • Matches where the team was defeated by at least two goals: {2, 3, 4, 5, 9}

To answer the coach’s question, you calculate the Union of these events, which includes all matches where either condition occurred or both. The result is: {1, 2, 3, 4, 5, 7, 9}
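As a minimal sketch, the same union can be computed with Python’s built-in set type. The variable names are illustrative choices; the match numbers are the ones listed above.

```python
# Match numbers taken from the example above.
lost_at_home = {1, 3, 5, 7}
lost_by_two_or_more = {2, 3, 4, 5, 9}

# The | operator (equivalently set.union) keeps every match that
# appears in at least one of the two sets.
union = lost_at_home | lost_by_two_or_more

print(sorted(union))  # [1, 2, 3, 4, 5, 7, 9]
```

Matches 3 and 5 appear in both sets but show up only once in the result, which is exactly what "either or both" means.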

These matches represent opportunities to analyze what went wrong. You hand over the list to the coach, recommending they review these games to identify areas for improvement ahead of the World Cup.

Super simple, right? Let’s talk about Intersection now.


2. Intersection: P(A∩B)

The intersection of two sets (or events) is defined as "the elements that exist in both set A and set B." In simpler terms, it identifies elements that belong to both sets simultaneously.

We pronounce this: A "intersection" B

While the union of sets is connected using the logical operator "or", the intersection is connected by the logical operator "and." This means an outcome is part of the intersection only if it satisfies both conditions simultaneously.

2.1. USA Soccer Data: The Intersection in Action

Let’s go back to the USA Soccer Data, but this time, the head coach is asking for a more focused analysis:

"Can you identify the matches where:

  1. The team lost at home (in U.S. stadiums), AND
  2. The team was defeated by at least two goals"

While reviewing the Union example, you might have thought, "Wouldn’t it be more insightful to analyze only the games where we lost at home and by at least two goals, instead of all matches where we lost at home or were defeated heavily?" If so, you were intuitively thinking about the Intersection of these two sets of events – without even realizing it!

Looking back at the data:

  • Matches where the team lost at home: {1, 3, 5, 7}
  • Matches where the team was defeated by at least two goals: {2, 3, 4, 5, 9}

To answer the coach’s question, you calculate the Intersection of these events, which includes only the matches that qualify for both events. The result is: {3, 5}
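A short sketch of the same calculation with Python sets, again using illustrative variable names for the two events:

```python
# Match numbers taken from the example above.
lost_at_home = {1, 3, 5, 7}
lost_by_two_or_more = {2, 3, 4, 5, 9}

# The & operator (equivalently set.intersection) keeps only the matches
# that appear in both sets.
both = lost_at_home & lost_by_two_or_more

print(sorted(both))  # [3, 5]
```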

Focusing on these matches allows the team to identify patterns or weaknesses, such as defensive issues or strategies that didn’t work when playing at home. By analyzing these specific outcomes, the coaching staff can take actionable steps to improve both home performance and defensive strategies in preparation for future games.

This logical "and" helps focus on outcomes that satisfy both criteria, which can be useful for identifying patterns or trends, such as games where the team’s defense performance correlates with losses.


Now that we’ve covered the two fundamental probability concepts – union and intersection – we can dive a bit deeper into other aspects that are crucial for calculating probabilities: Independence and Disjoint.

3. Independence and Disjoint

Generally, when we talk about probabilities between events, it’s important to clarify whether the events are "independent" or "disjoint." These two concepts can be confusing to differentiate at first because they might seem similar if you’re not paying close attention.

However, they are fundamentally different, and this distinction leads to completely different probability calculations.

  • Disjoint events are mutually exclusive and cannot occur together
  • Independent events do not affect each other, but can occur together.

I’ll explain disjoint more concretely then go into independence.

(Left) A & B are overlapping | (Right) A & B are disjoint

3.1. Disjoint: P(A∩B)=0

Disjoint events describe situations where two events can never occur simultaneously, meaning they have no common outcomes. Keep this statement in mind, because it is an extremely important distinction from independence (which I’ll talk about in the next section).

Mathematically, if events A and B are disjoint, this is defined as:

P(A∩B) = 0

In simpler terms, the occurrence of one event guarantees the other cannot happen. Personally I think it’s easier to visualize with an example.

Attending Super Bowl and Vacation Flight to Iceland?

Imagine you’ve planned a vacation for February 9th, 2025. You’ll be flying to Iceland to relax in the hot springs. However, you realize that this date also happens to be the day of Super Bowl LIX between the Eagles and the Chiefs. The flight departs during the Super Bowl.

So, let me ask you: what is the probability that you’ll be able to make it to both Iceland and the Super Bowl?

  • Event A: You are on an airplane flight to Iceland.
  • Event B: You are attending the Super Bowl in person.

Clearly, these two events cannot happen at the same time. If you’re on the flight to Iceland, it’s impossible to be physically present at the Super Bowl, and vice versa. These events are disjoint.

Disjoint events are mutually exclusive – if one happens, the other cannot. The probability of you both flying to Iceland (A) and attending the Super Bowl in person (B), written P(A∩B), is zero!

This is a key concept to understand in probability, as it simplifies calculations by ensuring there is no overlap between the events.
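To make the "no overlap" idea concrete, here is a small sketch using one roll of a fair six-sided die. The events `low` and `high` and the helper `prob` are hypothetical names chosen for illustration, and the sketch assumes every outcome is equally likely.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # sample space: one roll of a fair die
low = {1, 2}                # hypothetical event: the roll is 1 or 2
high = {5, 6}               # hypothetical event: the roll is 5 or 6


def prob(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))


# Disjoint events share no outcomes, so their intersection is empty.
print(low.isdisjoint(high))  # True
print(prob(low & high))      # 0
```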

3.2. Independence: P(A∩B) = P(A) ⋅ P(B)

Independent events are events that do not influence each other, meaning the occurrence of one event has no effect on the probability of the other event occurring.

The definition may sound very similar to disjoint. However, unlike disjoint events, independent events can happen simultaneously.

Mathematically, if events A and B are independent, this is defined as:

P(A∩B) = P(A) ⋅ P(B)

This means the probability of both events occurring together is simply the product of their individual probabilities.

The Chiefs Winning the Super Bowl and Vacation Flight delayed?

Let’s look at a similar scenario involving the Super Bowl and your vacation flight to Iceland.

  • Event A: The Kansas City Chiefs win the Super Bowl.
  • Event B: Your vacation flight to Iceland is delayed.

These events are independent because whether or not the Chiefs win has no influence on the likelihood of your flight being delayed. Similarly, the status of your flight has no effect on the outcome of the Super Bowl.

Both events can occur at the same time (the Chiefs win and your flight is delayed), but they are unrelated.
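The product rule can be checked on a toy experiment built from two unrelated parts, much like the game and the flight. The sketch below assumes a fair coin and a fair die; the event names and the `prob` helper are hypothetical and only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) pair; all 12 pairs are equally likely.
omega = set(product(["H", "T"], range(1, 7)))


def prob(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event), len(omega))


heads = {o for o in omega if o[0] == "H"}  # depends only on the coin
six = {o for o in omega if o[1] == 6}      # depends only on the die

p_both = prob(heads & six)

print(p_both)                             # 1/12
print(p_both == prob(heads) * prob(six))  # True
print(heads.isdisjoint(six))              # False
```

The last line is the point of the comparison with disjoint events: the two events overlap (the outcome heads-and-six exists), yet neither changes the probability of the other.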

3.3. TL;DR

I’ll quickly give you a summary of the differences again so you can revisit later.

  • Disjoint events are mutually exclusive and cannot occur together (e.g., attending the Super Bowl and flying to Iceland simultaneously).
  • Independent events do not affect each other and can occur together (e.g., the Chiefs winning and your flight being delayed).

4. Complement (Aᶜ) and Difference (A∖B)

Now that we’ve covered most of the essential foundational concepts in probability, let’s move on to the final two: Complement and Difference.

These are arguably the simplest to visualize and understand, but they’re still incredibly important when working with advanced probability topics.

A quick terminology overview: Universal set (Ω)

Before we move forward to the complement and difference, let’s quickly revisit the concept of the universal set, often denoted as Ω. This is a fundamental idea in set theory and probability, and understanding it will make concepts like complement and difference much clearer.

The universal set is the set that contains all possible outcomes of an experiment or scenario. It represents the "universe" of all elements under consideration.

Every set or event we discuss is a subset of Ω.

Going back to Soccer Matches: If analyzing 10 matches of the USA National Soccer Team, the universal set might represent all matches: Ω={1,2,3,4,5,6,7,8,9,10}

Rolling a Die: For a standard six-sided die, the universal set is: Ω={1,2,3,4,5,6}

The universal set provides a reference point for defining other sets. Without Ω, it’s hard to determine what’s "outside" a given set.

(Left) Aᶜ | (Right) A∖B

4.1. Complement (Aᶜ)

The complement of a set A includes everything in the universal set (Ω) that is not in A. In simpler terms, it represents all the outcomes that do not belong to A.

Soccer Example (Complement)

  • Our universal set (Ω) is all 10 matches of the soccer team: Ω = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
  • Our event A was "the team lost at home": A = {1, 3, 5, 7}

The complement, Aᶜ, represents all the matches where the team did not lose at home: Aᶜ = {2, 4, 6, 8, 9, 10}. This includes every match played away, whatever the result, as well as home matches that the team did not lose.

In this example, Aᶜ gives the coach the matches to compare against the home losses. Note that it is not limited to successful outcomes, since defeats away from home also belong to Aᶜ.
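In Python, a complement is simply the universal set minus the event. A minimal sketch with the match numbers used throughout this article (variable names are illustrative):

```python
omega = set(range(1, 11))    # all 10 matches: {1, ..., 10}
lost_at_home = {1, 3, 5, 7}  # event A

# Complement of A: everything in omega that is not in A.
not_lost_at_home = omega - lost_at_home

print(sorted(not_lost_at_home))  # [2, 4, 6, 8, 9, 10]
```

Python sets have no built-in notion of a universal set, which is why `omega` has to be written out explicitly.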

4.2. Difference (A∖B)

The difference between two sets, A∖B, is the set of all elements in A that are not in B. Think of it as subtracting B from A.

Soccer Example (Difference)

  • Set A: Matches where the team lost at home: A = {1, 3, 5, 7}
  • Set B: Matches where the team was defeated by at least two goals: B = {2, 3, 4, 5, 9}

The difference A∖B represents all matches where the team lost at home but did not lose by at least two goals. Essentially, this is the subset of A where the team lost by only one goal: A∖B = {1, 7}

In this case, the coach can focus on matches where the team narrowly lost at home, potentially identifying opportunities to improve performance in close games.
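The set difference maps directly onto Python’s `-` operator. A short sketch with the same match numbers (variable names are illustrative):

```python
lost_at_home = {1, 3, 5, 7}            # set A
lost_by_two_or_more = {2, 3, 4, 5, 9}  # set B

# A minus B: home losses that were not defeats by two or more goals.
narrow_home_losses = lost_at_home - lost_by_two_or_more

print(sorted(narrow_home_losses))  # [1, 7]
```

Unlike union and intersection, the order matters here: `lost_by_two_or_more - lost_at_home` would instead give the heavy defeats that did not happen at home.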

4.3. TL;DR

I’ll quickly give you a summary again so you can revisit later.

  • Complement (Aᶜ) helps identify outcomes that do not belong to a specific event, such as games the team did not lose at home.
  • Difference (A∖B) narrows the focus to specific subsets of outcomes, like home losses where the team lost by only one goal.

5. TL;DR

To finalize this article, I think it’s best to give you guys an overall TL;DR as well as some useful operations that you need to keep in mind when calculating probabilities. First, let me give you a table of the TL;DR.

TL;DR

5.1. Useful operations in probability

A. Commutative: For two events A and B, the order of combining events does not matter.

A∪B = B∪A
A∩B = B∩A

B. Associative: For three events A, B, and C, grouping of events does not matter.

(A∪B)∪C = A∪(B∪C)
(A∩B)∩C = A∩(B∩C)

C. Distributive: Intersection distributes over Union, and vice versa.

A∩(B∪C) = (A∩B)∪(A∩C)
A∪(B∩C) = (A∪B)∩(A∪C)

D. De Morgan’s Law: Complement of unions and intersections.

(A∪B)ᶜ = Aᶜ∩Bᶜ
(A∩B)ᶜ = Aᶜ∪Bᶜ
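These four laws are identities, so they hold for any sets. The sketch below spot-checks them on the soccer sets from earlier; the third set `C` and the `complement` helper are hypothetical additions made only so the three-event laws have something to work with.

```python
omega = set(range(1, 11))  # all 10 matches
A = {1, 3, 5, 7}           # lost at home
B = {2, 3, 4, 5, 9}        # defeated by at least two goals
C = {5, 6, 7, 8}           # hypothetical third event, for illustration only


def complement(event):
    """Everything in omega that is not in the event."""
    return omega - event


# Commutative
assert A | B == B | A and A & B == B & A
# Associative
assert (A | B) | C == A | (B | C)
assert (A & B) & C == A & (B & C)
# Distributive
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)
# De Morgan's laws
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)

print("all identities hold")
```

A check on one example is not a proof, but it is a quick way to catch a misremembered formula.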

5.2. Useful rules in probability

A. Addition Rule: The union of two events always follows the equation

P(A∪B) = P(A) + P(B) − P(A∩B)

However, if A and B are disjoint, then P(A∩B) = 0 and you can cut corners:

P(A∪B) = P(A) + P(B)
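Here is a sketch of the addition rule, P(A∪B) = P(A) + P(B) − P(A∩B), on the soccer data. It assumes each of the 10 matches is equally likely to be picked, so a probability is just a count divided by 10; the `prob` helper is a hypothetical name.

```python
from fractions import Fraction

omega = set(range(1, 11))  # all 10 matches
A = {1, 3, 5, 7}           # lost at home
B = {2, 3, 4, 5, 9}        # defeated by at least two goals


def prob(event):
    """Probability of an event when all matches in omega are equally likely."""
    return Fraction(len(event), len(omega))


# General case: subtract the overlap so matches 3 and 5 are not counted twice.
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)  # 7/10 7/10

# Disjoint case: A minus B shares no matches with B, so nothing is subtracted.
only_a = A - B
print(prob(only_a | B) == prob(only_a) + prob(B))  # True
```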

B. Multiplication Rule: If A and B are independent, the intersection of the two events always follows the equation

P(A∩B) = P(A) ⋅ P(B)

However, if they are not independent (generally the case), you have to use conditional probability. This is a topic I will go over in the next article!
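The sketch below contrasts a pair of events where the product shortcut works with a pair where it does not. It uses one roll of a fair six-sided die; the event names and the `prob` helper are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # one roll of a fair die


def prob(event):
    """Probability of an event when all outcomes in omega are equally likely."""
    return Fraction(len(event), len(omega))


even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
at_least_four = {4, 5, 6}

# Independent pair: the product of the probabilities equals P(A and B).
print(prob(even & at_most_four), prob(even) * prob(at_most_four))    # 1/3 1/3

# Dependent pair: the product no longer matches, so the shortcut fails.
print(prob(even & at_least_four), prob(even) * prob(at_least_four))  # 1/3 1/4
```

In the second pair, knowing the roll is at least four makes an even result more likely (two of the three outcomes are even), which is why multiplying the individual probabilities gives the wrong answer.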

C. Complement Rule: The complement of an event A is Aᶜ (everything not in A), and its probability is

P(Aᶜ) = 1 − P(A)
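As a last sketch, the complement rule P(Aᶜ) = 1 − P(A) can be checked on the soccer data, again assuming each of the 10 matches is equally likely. The `prob` helper is a hypothetical name.

```python
from fractions import Fraction

omega = set(range(1, 11))  # all 10 matches
A = {1, 3, 5, 7}           # lost at home


def prob(event):
    """Probability of an event when all matches in omega are equally likely."""
    return Fraction(len(event), len(omega))


p_not_a = prob(omega - A)  # count the complement directly

print(p_not_a)                 # 3/5
print(p_not_a == 1 - prob(A))  # True
```

This rule is handy whenever "not A" is hard to count directly but A itself is easy.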

I hope you were able to learn something!

Connect with me!

If you made it this far, I assume you are an aspiring data scientist, a teacher in the Data Science field, a professional looking to hone your craft, or just an avid learner in a different field! I would love to have a chat with you on anything!

For those wondering about my images: Unless otherwise noted, all images are by the author (myself)

Sunghyun Ahn – Medium

