
Statistics Bootcamp 3: Probably… Probability

Learn the math and methods behind the libraries you use daily as a data scientist

Statistics Bootcamp

Image by Author

To more formally address the need for a statistics lecture series on Medium, I have started to create a series of Statistics Bootcamps, as seen in the title above. These will build on one another and will be numbered accordingly. The motivation is to democratize the knowledge of statistics from the ground up and address the need for more formal statistics training in the data science community. These will begin simple and expand upwards and outwards, with exercises and worked examples along the way. My personal philosophy when it comes to engineering, coding, and statistics is that if you understand the math and the methods, the abstraction introduced by a multitude of libraries falls away and allows you to be a producer, not only a consumer, of information. Much of this will be review for many learners/readers; however, having a comprehensive understanding and a resource to refer to is important. Happy reading/learning!

In our third bootcamp, we will be introducing Probability.

Would you feel safer flying or driving across the U.S.? How much greater is the risk of one way of travel versus the other? These are both questions that can be answered through the use of probabilities.

Probability

A probability experiment is a chance process that leads to well-defined results, called outcomes. An outcome is the result of a SINGLE TRIAL of a probability experiment. An event is a specified set of one or more outcomes that may or may not occur when a probability experiment is performed (an event can contain multiple outcomes, e.g. boy or girl). We can define possible events using a sample space. A sample space lists the set of all possible outcomes of a probability experiment. Probability is the chance of a particular event occurring and is the basis of inferential statistics.

Examples:

| Experiment             | Sample Space                    |
| ---------------------- | ------------------------------- |
| Coin toss              | Heads, tails                    |
| Roll a die             | 1,2,3,4,5,6                     |
| Sex of three people    | {F,F,F},{M,M,M},{M,M,F},{F,F,M} |
| Superbowl winner       | Names of 32 teams               |

Properties of Probabilities

  • Property 1: The probability of an event is always between 0 and 1, inclusive.
  • Property 2: The probability of an event that cannot occur is 0. An event that cannot occur is called an impossible event.
  • Property 3: The probability of an event that must occur is 1. An event that must occur is called a certain event.
  • Property 4: The sum of probabilities of all possible outcomes in the sample space is 1.

How do we count?

This might seem like a silly subtitle, but counting is important to cover formally when we think about probability. We use counting rules to quantify the number of ways an event can happen. When all N outcomes in the sample space are equally likely, the probability of an event E is:

P(E) = (number of ways E can occur) / N

This gives rise to a probability distribution.

Example. Find the sample space for the sex of 3 children in a family, if we are interested in knowing specifically the sex of the 1st, 2nd and 3rd children (i.e. order matters). We are going to use a tree diagram. A tree diagram is a schematic with branches emanating from a starting point showing all possible outcomes of a probability experiment.
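The branches of such a tree can also be enumerated in code. The snippet below is a minimal sketch, assuming each child is recorded as "M" or "F" (labels chosen here for illustration); it walks every branch and lists the ordered sample space.

```python
from itertools import product

# Each child is either "M" or "F"; repeat=3 follows every branch of the tree
# for the 1st, 2nd and 3rd child, so order is preserved.
sample_space = ["".join(path) for path in product("MF", repeat=3)]

print(sample_space)
print(len(sample_space))  # 2 * 2 * 2 = 8 ordered outcomes
```

Because order matters here, "MMF", "MFM" and "FMM" are counted as three separate outcomes.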

When we roll a die twice, 36 equally likely outcomes are possible. Note that here the order of the outcomes matters. The probability that the sum is 11, denoted P(sum=11), can be arrived at in two ways, 5+6 and 6+5: P(sum=11) = (ways in which 11 can happen)/N = 2/36 = 0.056 = 5.6%

Image by author

When we roll two dice at once, what is the sample space?

Image by author

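When order is ignored, the sample space contains 21 outcomes. In this case, (5,6) is the same as (6,5), so it is only counted once, and counting over this space would give P(sum=11) = 1/21. Notice how the sample space differs when order matters and when it doesn’t: 36 versus 21. Keep in mind that the 21 unordered outcomes are not equally likely ((5,6) can be rolled two ways, (6,6) only one), so the count-based formula applies to the 36 ordered outcomes, where P(sum=11) = 2/36.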
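Both sample spaces are small enough to enumerate directly. The following is a rough sketch using Python's standard library (the variable names are illustrative); it counts the ordered and unordered outcomes and computes P(sum=11) over the equally likely ordered rolls.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product

faces = range(1, 7)

# Order matters: (5, 6) and (6, 5) are distinct outcomes.
ordered = list(product(faces, repeat=2))

# Order ignored: (5, 6) and (6, 5) collapse into a single outcome.
unordered = list(combinations_with_replacement(faces, 2))

# Count the ordered rolls whose faces add up to 11.
ways_to_11 = [roll for roll in ordered if sum(roll) == 11]
p_sum_11 = Fraction(len(ways_to_11), len(ordered))

print(len(ordered), len(unordered))  # 36 21
print(ways_to_11)                    # [(5, 6), (6, 5)]
print(p_sum_11)                      # 1/18
```

`Fraction` keeps the result exact; 1/18 is the reduced form of 2/36.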

Counting Rules

The fundamental counting rule: to find the total number of outcomes in a sequence of events, multiply the number of outcomes for each event.

Example. Two exercise programs and three diet plans are available for subjects with diabetes.

  • exercise programs = 2 outcomes
  • diet plans = 3 outcomes
  • total number of possible strategies = 2 × 3 = 6

The fundamental counting rule applies when repetitions are permitted, i.e. the number of outcomes per event does not change.

Example. The numbers 1–9 are to be used in a 6 digit student ID card. How many unique cards are possible if repetitions of the same individual number are permitted? 9 × 9 × 9 × 9 × 9 × 9 = 9⁶ = 531,441, where each ‘9’ is the number of options (digits) we have to pick from.

If repetitions are not allowed, then the number of outcomes is reduced by 1 per event. Thus, if we redid the example above, but repetition was not allowed:

Example. Repetitions not permitted: 9 × 8 × 7 × 6 × 5 × 4 = 60,480
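The two ID-card counts can be reproduced in a couple of lines. This is only a sketch of the arithmetic, with `digits` and `length` as illustrative names for the 9 available digits and the 6 positions.

```python
import math

digits = 9   # the digits 1-9
length = 6   # a 6 digit ID

# Repetition allowed: 9 options at every position.
with_repetition = digits ** length

# Repetition not allowed: one fewer option at each position, 9 * 8 * 7 * 6 * 5 * 4.
without_repetition = math.prod(range(digits - length + 1, digits + 1))

print(with_repetition)     # 531441
print(without_repetition)  # 60480
```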

Let’s cover some formulas.

Factorial formula:

n! = n × (n – 1) × (n – 2) × … × 2 × 1, with 0! = 1

Permutations

A permutation is an arrangement of objects in a specific order. An arrangement of n objects in a specific order, using r objects at a time, is called a permutation of n objects taking r objects at a time. Think – ‘n Permute r’. It is written as:

nPr = n! / (n – r)!

Permutation Examples

Example. If 3 ladies (Jackie, Maria and Jane) entered a race, how many different finishing orders are possible?

3P3 = 3! / (3 – 3)! = 3! / 0! = 6

6 different finishing orders are possible.

Example. 5 people entered the previous race (Jackie, Maria, Carolyn, Carly, Jane) but only the first two get prizes and the prizes are different. How many different orderings of those first two options are possible?

5P2 = 5! / (5 – 2)! = 5! / 3! = 5 × 4 = 20

20 different options are possible for the first two prizes.
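Both race examples can be checked by listing the arrangements explicitly. The sketch below uses `itertools.permutations` to enumerate them and `math.perm` (Python 3.8+) to confirm the counts.

```python
import math
from itertools import permutations

# All 3 racers are ranked: permutations of 3 objects taken 3 at a time.
racers = ["Jackie", "Maria", "Jane"]
finishing_orders = list(permutations(racers))

# Only 1st and 2nd place matter: permutations of 5 objects taken 2 at a time.
field = ["Jackie", "Maria", "Carolyn", "Carly", "Jane"]
prize_orders = list(permutations(field, 2))

print(len(finishing_orders), math.perm(3, 3))  # 6 6
print(len(prize_orders), math.perm(5, 2))      # 20 20
```

Note that ("Jackie", "Maria") and ("Maria", "Jackie") both appear in `prize_orders`, since the two prizes are different.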

Combinations

A combination is a selection of distinct objects regardless of order. The combination rule gives the number of ways in which r distinct objects can be selected from n objects. Think – ‘n Choose r’. It is written as:

nCr = n! / ((n – r)! × r!)

Combination Examples

Example. There are 7 female and 5 male graduate students in sociology. How many ways can a committee of 4 people be selected?

12C4 = 12! / ((12 – 4)! × 4!) = 495 ways

How many ways can this committee be selected if there must be 2 men and 2 women on the committee?

7C2 × 5C2 = 21 × 10 = 210 ways

How many ways can this committee be selected if there must be at least 2 female students on the committee?

Possible ways to construct the committee:

  • exactly 2 female (210 ways)
  • exactly 3 female (175 ways)
  • exactly 4 female (35 ways)

total: 420 ways
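The committee counts follow from combining ‘n Choose r’ with the fundamental counting rule. Here is a minimal sketch using `math.comb` (Python 3.8+); the variable names are illustrative.

```python
from math import comb

females, males, committee_size = 7, 5, 4

# Any 4 of the 12 students.
any_committee = comb(females + males, committee_size)

# Exactly 2 women and 2 men: choose each group separately, then multiply.
two_and_two = comb(females, 2) * comb(males, 2)

# At least 2 women: add up the cases with exactly 2, 3 and 4 women.
at_least_two_women = sum(
    comb(females, k) * comb(males, committee_size - k)
    for k in range(2, committee_size + 1)
)

print(any_committee)       # 495
print(two_and_two)         # 210
print(at_least_two_women)  # 420
```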

Complementation Rule

The complementation rule can be formalized as follows: the probability of an event E is 1 (100%) minus the probability of the event not occurring.

In other words, you either get the event or you don’t get the event. With the complement of E denoted (not E):

P(E) = 1 – P(not E)

Example. If there is a 60% chance (probability) the White Sox will win the World Series, there is a 40% chance they won’t.

Relationships Among Events

  • (not E): The event "E does not occur"
  • (A & B): The event "A and B both occur"
  • (A or B): The event "either A or B occurs"

The probability of our sample space must always sum to 1, P(S)=1.

Image by author

Mutually Exclusive Events

Events are said to be mutually exclusive if they cannot occur at the same time. This means they have no outcomes in common, so P(A & B) = 0.

Image by author

Suppose you draw a single card from a deck. Are getting a queen and getting a heart mutually exclusive events? No, because the queen of hearts is both. What about getting a club and getting a heart? Are these mutually exclusive? Yes they are, since no card is both a club and a heart.

Probability Calculation of Events

We will cover the rules and types of probabilities when calculating events in this article. I will cover Bayes, with respect to calculation of events, in the subsequent bootcamp.

Addition Rule

The general addition rule states that if A and B are any two events, then:

P(A or B) = P(A) + P(B) – P(A and B)

If A and B are mutually exclusive then we apply the special addition rule:

P(A or B) = P(A) + P(B)

Written more generally, for mutually exclusive events A, B, C, …:

P(A or B or C or …) = P(A) + P(B) + P(C) + …

Let’s see what happens if we apply the general addition rule when the events are mutually exclusive.

Example. What if you were asked the probability of getting a club or a heart when drawing a single card from the deck, what would you say? P(club or heart) = P(club) + P(heart) – P(club and heart) = 13/52 + 13/52 – 0 = 2/4 = 0.5 = 50%
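The card examples can be verified by building a deck and treating events as sets of cards. This is a small sketch, assuming a standard 52-card deck; `prob` is a helper defined here for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """Probability of drawing a card that belongs to the given set."""
    return Fraction(len(event), len(deck))

clubs = {card for card in deck if card[1] == "clubs"}
hearts = {card for card in deck if card[1] == "hearts"}
queens = {card for card in deck if card[0] == "Q"}

# Mutually exclusive: the intersection is empty, so the special rule applies.
print(prob(clubs & hearts))  # 0
print(prob(clubs | hearts))  # 1/2

# Not mutually exclusive: the queen of hearts must not be counted twice.
print(prob(queens & hearts))  # 1/52
print(prob(queens | hearts))  # 4/13
```

Set union (`|`) plays the role of "or" and set intersection (`&`) the role of "and".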

We get a zero for the intersection P(club and heart). What happens if you have 3 events that are not mutually exclusive? You can use the Venn diagram to help answer this question.

Image by author

P(A or B or C) = P(A) + P(B) + P(C) – P(A and B) – P(B and C) – P(A and C) + P(A and B and C)
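The three-event addition rule (inclusion–exclusion) can be sanity-checked by brute force. The example below is hypothetical and not from the article: one number is drawn uniformly from 1 to 30, with A, B and C the events that it is a multiple of 2, 3 and 5. The triple overlap is added back because subtracting the three pairwise overlaps removes it once too often.

```python
from fractions import Fraction

# Hypothetical experiment: draw one number uniformly from 1..30.
sample_space = set(range(1, 31))
A = {n for n in sample_space if n % 2 == 0}  # multiples of 2
B = {n for n in sample_space if n % 3 == 0}  # multiples of 3
C = {n for n in sample_space if n % 5 == 0}  # multiples of 5

def prob(event):
    return Fraction(len(event), len(sample_space))

# Direct count of the union.
direct = prob(A | B | C)

# Inclusion-exclusion: singles, minus pairwise overlaps, plus the triple overlap.
by_rule = (
    prob(A) + prob(B) + prob(C)
    - prob(A & B) - prob(B & C) - prob(A & C)
    + prob(A & B & C)
)

print(direct, by_rule)  # 11/15 11/15
```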

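Example. At a particular school with 200 female students, 58 play volleyball, 40 play basketball, and 8 play both. What is the probability that a randomly selected female student plays neither sport?

  • V – volleyball
  • B – basketball

First, ascertain the probabilities we’ve been provided with: P(V) = 58/200 = 0.29, P(B) = 40/200 = 0.20, and P(V and B) = 8/200 = 0.04. By the general addition rule, P(V or B) = 0.29 + 0.20 – 0.04 = 0.45. Then, to find the complement: P(neither) = 1 – P(V or B) = 1 – 0.45 = 0.55 = 55%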

A contingency plan is a nice thing to have in the event of an apocalypse, but it is not to be confused with a contingency table. Unless your contingency plan for knowing you did your stats homework right is a contingency table. 😉

A contingency table illustrates the frequency distribution for more than one categorical variable. Let’s say we have a soccer player, Waleed, who suffered an injury. He subsequently obtained and read information on pain medications, specifically he stumbled on a study comparing two different medications and their side effects, seen here in the contingency table below.

Marginal Probability

Suppose a patient in this study was randomly selected. What’s the probability that they were on drug A, P(A)? The marginal probability is the probability of a single event occurring. To gain some intuition, you can think about it as ‘the base rate/prevalence of drug A is…’. You can see the marginal probabilities of side effects on the right, and those of placebo, drug A (= 181/429) and drug B along the bottom.

Joint Probability

Joint probability is the probability of more than one event occurring at the same time (concurrently).

Example. For a randomly selected patient in this study, what is the chance of having a sinus headache and being on drug B, P(sinus and drug B)?

Using the same table shown previously, we can arrive at our answer by following the specific row and column to their intersection: 0.0746 = 7.46%

Conditional Probability

Conditional probability is the probability that event H (the patient had a headache) occurs, provided that event A (the patient was on drug A) has occurred. We denote this using the bar notation. P(H|A), where ‘|’ can be interpreted as ‘given that’.

Example. For a randomly selected patient using drug A, what is the chance they suffer from a sinus headache?

So the probability that a randomly selected patient has a sinus headache, given that (‘|’) they are prescribed drug A, is: P(H|A) = 25/181 = 0.1381.
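The marginal, joint and conditional probabilities for drug A can be tied together numerically. The sketch below uses only the counts quoted in the text (429 patients, 181 on drug A, 25 of those with a sinus headache) and shows that the conditional probability is the joint probability divided by the marginal.

```python
from fractions import Fraction

n_patients = 429       # all patients in the study
n_drug_a = 181         # patients on drug A
n_headache_and_a = 25  # patients on drug A who had a sinus headache

p_a = Fraction(n_drug_a, n_patients)                # marginal: P(A)
p_h_and_a = Fraction(n_headache_and_a, n_patients)  # joint: P(H and A)
p_h_given_a = p_h_and_a / p_a                       # conditional: P(H|A)

print(round(float(p_a), 4))          # 0.4219
print(round(float(p_h_given_a), 4))  # 0.1381
```

Dividing by P(A) restricts attention to the 181 patients on drug A, which is why the result equals 25/181.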

Conditional probability is then defined as:

P(B|A) = P(A and B) / P(A), provided P(A) > 0

You’ve likely heard, and used, the terms sensitivity and specificity in data science, stats articles and the news. It is important to recognize these are both examples of conditional probabilities in our everyday life. 🙂

  • Sensitivity: P(test is positive|disease is present)
  • Specificity: P(test is negative|disease is absent)

Multiplication Rules

The general multiplication rule states that the probability of two events both occurring (the joint probability) can be written as:

P(A and B) = P(A)P(B|A) = P(B)P(A|B)

note: P(A and B) is the same as P(B and A)

The joint probability of three events occurring is:

P(A and B and C) = P(A|(B and C))P(B and C)

note: P(B and C) = P(B|C)P(C)

P(A and B and C) = P(A|(B and C))P(B|C)P(C)

note: These rules apply to any events. No assumptions are required for these rules.
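The three-event multiplication rule is easiest to see with dependent events. The example below is hypothetical and not from the article: three cards are drawn without replacement, with C = "first card is an ace", B = "second card is an ace" and A = "third card is an ace". The chained product is compared against a brute-force count of all ordered draws.

```python
from fractions import Fraction
from itertools import permutations

deck = range(52)
aces = {0, 1, 2, 3}  # arbitrary labels standing in for the four aces

# Chain rule: P(C) * P(B|C) * P(A|(B and C)).
p_first = Fraction(4, 52)               # 4 aces among 52 cards
p_second_given_first = Fraction(3, 51)  # 3 aces left among 51 cards
p_third_given_both = Fraction(2, 50)    # 2 aces left among 50 cards
p_chain = p_first * p_second_given_first * p_third_given_both

# Brute force: count the ordered 3-card draws that consist only of aces.
total_draws = 0
all_ace_draws = 0
for draw in permutations(deck, 3):
    total_draws += 1
    if set(draw) <= aces:
        all_ace_draws += 1
p_enumerated = Fraction(all_ace_draws, total_draws)

print(p_chain, p_enumerated)  # 1/5525 1/5525
```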

Let’s think about this in the context of medicine. Your patient has a cough that has lasted 3 weeks and a rash that started 1 week ago. What is the diagnosis? It is important to consider that two unrelated causes co-occurring is typically far less likely than a single cause that explains both findings. Playing off Occam’s razor, simple theories are preferred to more complex ones.

Independent Events

Two events, A and B are independent if the fact that A occurs does not affect the probability of B occurring, i.e., P(B|A)=P(B).

Examples of independent events:

  1. toss a coin several times
  2. disease status in unrelated people (non-infectious) 😉
  3. test results from studies conducted in different states

Examples of dependent events:

  1. injuries among players on the same hockey team
  2. gene sequencing from individuals in the same family

Multiplication Rule for Independent Events

For sake of review, the multiplication rule is used to find the probability of two independent events that occur in sequence. We’ll step through it again:

  1. Find the probability of each event occurring separately
  2. Multiply the answers – because of the fundamental counting rule
  3. P(A and B) = P(A)P(B|A), if independent, then P(B|A)=P(B), therefore, P(A and B) = P(A)P(B)

Example. A coin is tossed twice. The probability of getting heads on the first and second tosses is: 1/2 × 1/2 = 1/4. There are 4 possible outcomes: HH, TT, HT, TH. So, P(HH) = 1/4 = 0.25

Example. U.S. Data and Statistics reports that 22% of physiotherapists are men. If 3 physiotherapists (A, B, C) are randomly selected, find the probability that they are all female.

P(a randomly selected physiotherapist is female) = 1 – P(physiotherapist is male) = 1 – 0.22 = 0.78 = 78%

P(A is female and B is female and C is female) = P(A)P(B)P(C) = 0.78 × 0.78 × 0.78 = 0.47 = 47%
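Both independent-event examples reduce to multiplying the separate probabilities. A short sketch, with the coin tosses enumerated explicitly and the 22% figure taken from the example above:

```python
from fractions import Fraction
from itertools import product

# Two coin tosses: enumerate the 4 equally likely ordered outcomes.
tosses = list(product("HT", repeat=2))
p_hh = Fraction(tosses.count(("H", "H")), len(tosses))
print(p_hh)  # 1/4

# Three independently selected physiotherapists, all female.
p_male = 0.22
p_female = 1 - p_male         # complementation rule
p_all_female = p_female ** 3  # P(A)P(B)P(C) for independent events
print(round(p_all_female, 2))  # 0.47
```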

Exhaustive Events

Events are exhaustive if at least one of them must occur, so that together they cover the entire sample space. If events in the sample space are both exhaustive and mutually exclusive, then exactly one of them must occur. For example, with three such events E1, E2, and E3:

  1. E1, E2, and E3 do not overlap (mutually exclusive).
  2. E1, E2, and E3 fill out the entire sample space, S (exhaustive).

Rule of total probability

What if we introduce another event in the same sample space?

P(B) = total region within the ellipse

The events (E1 & B), (E2 & B), and (E3 & B) are mutually exclusive, so:

P(B) = P(E1 & B) + P(E2 & B) + P(E3 & B), by the addition rule

= P(E1)P(B|E1) + P(E2)P(B|E2) + P(E3)P(B|E3), by the general multiplication rule

Image by author

The rule of total probability states that if events E1, E2, …, Ek are mutually exclusive and exhaustive, then for any event B in the same sample space:

P(B) = P(E1)P(B|E1) + P(E2)P(B|E2) + … + P(Ek)P(B|Ek)

Example. A group of individuals aged 50 were enrolled in a study. Each subject was classified as low-risk, medium-risk or high-risk of stroke. Of these, 60% are low-risk, 30% are medium-risk, and 10% are high-risk. After a 5-year follow-up, the study found that 1% of the low-risk subjects, 5% of the medium-risk subjects and 9% of the high-risk subjects suffered a stroke. If an individual aged 50 is selected at random, find the probability that he/she will have a stroke. Let’s use our handy tree diagram from earlier.

Image by author

Let’s build out the math now:

P(low) = 0.6, P(medium) = 0.3, P(high) = 0.1

P(stroke|low) = 0.01, P(stroke|medium) = 0.05, P(stroke|high) = 0.09

To calculate the risk of stroke regardless of risk level, we add the joint probabilities (stroke and risk level) of the 3 groups together:

P(stroke) = P(low and stroke) + P(medium and stroke) + P(high and stroke)

To get each joint probability we use the general multiplication rule, P(A and B) = P(A)P(B|A):

P(stroke) = P(low)P(stroke|low) + P(medium)P(stroke|medium) + P(high)P(stroke|high) = (0.6 × 0.01) + (0.3 × 0.05) + (0.1 × 0.09) = 0.006 + 0.015 + 0.009 = 0.03 or 3%

Therefore, the probability that a randomly selected individual aged 50 will have a stroke is 3%. We can refer to this as the base rate.
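The stroke calculation is a weighted sum, which maps naturally onto two dictionaries keyed by risk level. A minimal sketch using the figures from the example (the dictionary names are illustrative):

```python
# P(risk level): mutually exclusive and exhaustive, so these must sum to 1.
p_level = {"low": 0.60, "medium": 0.30, "high": 0.10}

# P(stroke | risk level) after the 5-year follow-up.
p_stroke_given_level = {"low": 0.01, "medium": 0.05, "high": 0.09}

assert abs(sum(p_level.values()) - 1.0) < 1e-12

# General multiplication rule: P(level and stroke) = P(level) * P(stroke | level).
p_joint = {level: p_level[level] * p_stroke_given_level[level] for level in p_level}

# Rule of total probability: add the joint probabilities across the levels.
p_stroke = sum(p_joint.values())

print(round(p_stroke, 4))  # 0.03
```

Each entry of `p_joint` corresponds to one branch of the tree diagram ending in "stroke".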

Wrap-Up

In this bootcamp, we have introduced the concept of probability and the formalism of the notation. You’ve learned how to ‘count’ 😉 in a probabilistic way and the difference between marginal, joint and conditional probabilities. We’ve worked through some examples of how to calculate these outcomes, supplemented with visualizations to facilitate understanding.

Previous boot camps in the series:

  • #1 Laying the Foundations
  • #2 Center, Variation and Position

All images unless otherwise stated are created by the author.

