
Statistics Bootcamp 8: A Tale of Two Means

Learn the math and methods behind the libraries you use daily as a data scientist

Statistics Bootcamp

Image by Author

This article is part of a larger Bootcamp series (see the kicker for the full list!). This installment is dedicated to learning how to compare two populations, building on our earlier work comparing a sample mean to a population mean.

We may want to investigate whether two populations (or two samples) differ on some facet. For example, is there a difference in the age of those attending medical school at Northwestern University versus the University of Illinois Chicago (UIC)? Here we seek to compare two means to each other, whereas in the previous bootcamp we compared the mean of a sample to the population mean.

We would outline our question regarding age and medical students as such:

  1. Populations: population 1 = all medical students enrolled at UIC; population 2 = all medical students enrolled at Northwestern

  2. Hypotheses (in the case of a two-tailed test): H₀: μ₁ = μ₂ and H₁: μ₁ ≠ μ₂

Comparing Two Populations

If we want to compare two populations, building on what we learned in the previous bootcamp, we return to our beloved z-test. Our assumptions for this test are that the populations are independent of one another, both population standard deviations are known, and the samples are simple random samples that are either normally distributed or of size ≥ 30.

We can formulate our hypothesis comparison as follows:

H₀: μ₁ = μ₂ (equivalently, μ₁ − μ₂ = 0), with H₁: μ₁ ≠ μ₂, μ₁ < μ₂ or μ₁ > μ₂ depending on the question.

With two independent populations, the respective sample means are x̄₁ and x̄₂, and the populations have associated variances σ₁² and σ₂². For the difference of the sample means, x̄₁ − x̄₂:

Mean difference: μ₁ − μ₂ (estimated by x̄₁ − x̄₂)

Variance: σ₁²/n₁ + σ₂²/n₂

Standard deviation: √(σ₁²/n₁ + σ₂²/n₂)

Confidence Interval for Difference of Two Population Means

Expanding on calculating confidence intervals (CIs) for a single mean (bootcamp 6), we take the difference of the means as our estimate and combine the standard error contribution from each population:

(x̄₁ − x̄₂) ± z_{α/2} · √(σ₁²/n₁ + σ₂²/n₂)

Example. The mean weight of a sample of USC Trojans football players is 120 kg, with a population variance of 49 kg². The mean weight of 100 Fighting Irish from Notre Dame is 99 kg, with a population variance of 50 kg². Calculate the 95% confidence interval for the difference of the means.
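
If you want to compute this kind of interval in Python, here is a minimal sketch using scipy; since the USC sample size isn't stated above, n1=100 below is only a placeholder assumption, and two_mean_z_ci is a name chosen for illustration.

```python
import math
from scipy import stats

def two_mean_z_ci(xbar1, xbar2, var1, var2, n1, n2, conf=0.95):
    """CI for mu1 - mu2 when both population variances are known."""
    z_crit = stats.norm.ppf(1 - (1 - conf) / 2)   # ~1.96 for a 95% CI
    se = math.sqrt(var1 / n1 + var2 / n2)         # standard error of the difference
    diff = xbar1 - xbar2
    return diff - z_crit * se, diff + z_crit * se

# Trojans vs. Fighting Irish example; the USC sample size is not stated in the
# text, so n1=100 is a placeholder assumption here.
print(two_mean_z_ci(xbar1=120, xbar2=99, var1=49, var2=50, n1=100, n2=100))
```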

Two Sample z-test

We can test a hypothesis comparing two population means using a two sample z-test. Our assumptions include: 1) populations are independent of one another, 2) the populations are large or normally distributed, 3) simple random sample (SRS) and 4) standard deviations of each population are known. To carry this out, we will follow the steps below:

  1. Write out our hypothesis statement comparing the means: H₀: μ₁ = μ₂, where our alternative hypothesis H₁ can be: μ₁ ≠ μ₂ (two-tailed), μ₁ < μ₂ (left-tailed) or μ₁ > μ₂ (right-tailed).
  2. Set the significance level, α.
  3. Calculate the z-statistic: z = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂)

To create some intuition about this equation, think of the numerator as the observed difference between the sample means (x̄₁ − x̄₂) versus the expected difference (μ₁ − μ₂), which we set to 0 under the null hypothesis of no difference.

Example. Suppose we are deciding whether to take a vacation to Disneyland or Disney World, and we are tracking flight prices to decide where to go. There's a sale on: a round-trip ticket costs $88.42 to Disneyland and $80.61 to Disney World. Assume these numbers were generated from two samples of 50 flight options each, with standard deviations of $5.62 and $4.83, respectively. At α = 0.05, can you conclude there is a difference in cost?

  1. State hypotheses: H₀: μ₁ = μ₂, H₁: μ₁ ≠ μ₂
  2. α = 0.05
  3. Our corresponding critical value for α = 0.05 is ±1.96
  4. Our calculated z-score is: z = (88.42 − 80.61) / √(5.62²/50 + 4.83²/50) ≈ 7.45
  5. Compare the calculated z value to z critical: 7.45 > 1.96
  6. Based on the comparison, we reject the null hypothesis
  7. Interpret our findings. There is sufficient evidence to suggest the means are not equal, i.e. there is a difference in ticket prices between Disneyland and Disney World. (A quick Python check follows below.)
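
As a sanity check, here is a short Python sketch of the same two-sample z-test. scipy.stats does not ship a dedicated two-sample z-test, so we compute the statistic directly and use the normal distribution for the p-value.

```python
import math
from scipy import stats

# Flight-price example above: two samples of 50 with the stated standard deviations.
x1, x2 = 88.42, 80.61      # sample means (Disneyland, Disney World)
s1, s2 = 5.62, 4.83        # standard deviations
n1 = n2 = 50

se = math.sqrt(s1**2 / n1 + s2**2 / n2)
z = (x1 - x2) / se                            # hypothesized mu1 - mu2 = 0
p_value = 2 * (1 - stats.norm.cdf(abs(z)))    # two-tailed p-value

print(f"z = {z:.2f}, p = {p_value:.4f}")      # z lands near 7.45
```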

Two Sample t-test

So far, you have learned that we can employ a z-test to investigate the difference between two population means when each population's variance is known. When we do not have information about the population variances, we turn to a t-test (also called Student's t-test) to compare two means.

There are two categories of t-tests: non-paired and paired. When you see paired versus non-paired, equate these with dependent and independent samples, respectively. A paired test is used when we examine the same population before and after an intervention, or with matched pairs as in randomized control trials (RCTs). Independent populations are just that: they are examined on some facet for differences, but the cohorts are otherwise established independently of one another. A non-paired t-test (often referred to simply as a t-test) can be further divided into two subcategories: the pooled t-test and the non-pooled t-test.

A pooled t-test is used when we have EQUAL variances, or they are assumed as such. A non-pooled t-test is used when the variances are UNEQUAL, or they are assumed as such. We can check (informally) using the ratio of the larger (sₗ) and smaller (sₛ) standard deviations, or of their squares.

If:

sₗ²/sₛ² < 4, or equivalently sₗ/sₛ < 2: use a pooled t-test

otherwise: use a non-pooled t-test
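
As a tiny illustration, this rule of thumb can be wrapped in a helper function (suggest_t_test is just a name chosen here for illustration):

```python
def suggest_t_test(s1, s2):
    """Informal check from above: pooled if the larger/smaller sd ratio is < 2."""
    s_large, s_small = max(s1, s2), min(s1, s2)
    return "pooled" if s_large / s_small < 2 else "non-pooled"

print(suggest_t_test(5.62, 4.83))   # ratio ~1.2 -> "pooled"
print(suggest_t_test(1.85, 0.61))   # ratio ~3   -> "non-pooled"
```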

Pooled t-test

When performing a pooled t-test to compare two population means, we will have our usual assumptions: simple random sample, normal/large population, independently established populations/samples. But we will also add the assumption that the standard deviations of the populations are equal.

  1. State our alternative and null hypotheses: H₀ (μ₁ = μ₂) and H₁
  2. Determine from our hypotheses if the test constitutes a right (μ₁ > μ₂), left (μ₁ < μ₂) or two-tailed test (μ₁ ≠ μ₂)
  3. Decide on a significance level α
  4. Compute the test statistic: t = (x̄₁ − x̄₂) / (sp · √(1/n₁ + 1/n₂))

where sp is:

sp = √[ ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) ]

And n₁ and n₂ are the number of entities in each sample. Note that sp is a 'pooled' standard deviation, and is what makes this test a pooled t-test. In calculating sp we are creating a weighted estimate of σ according to the respective sample sizes. We can proceed from here using either a critical-value or p-value approach:

  1. The critical values are: ±t_{α/2} (two-tailed), −t_{α} (left-tailed) or t_{α} (right-tailed) with DOF = n₁ + n₂ − 2 = (n₁ − 1) + (n₂ − 1). These can be ascertained using a t-table, which can also provide an estimate of the p-value.
  2. If the test statistic falls in the rejection region OR the p-value ≤ α, we reject H₀; otherwise we fail to reject H₀.
  3. Interpret and summarize the results of the statistical test, always noting that a hypothesis test is exact for normal populations and approximate for non-normal but large populations. (A code sketch of this procedure follows below.)
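
Here is a minimal sketch of the pooled procedure in Python, using made-up sample values purely for illustration; it computes sp and t by hand and cross-checks against scipy's equal-variance ttest_ind.

```python
import numpy as np
from scipy import stats

# Made-up samples purely for illustration.
sample_1 = np.array([2.1, 1.8, 2.4, 2.0, 1.9, 2.6])
sample_2 = np.array([2.5, 2.9, 2.2, 3.1, 2.7, 2.4])

n1, n2 = len(sample_1), len(sample_2)
s1, s2 = sample_1.std(ddof=1), sample_2.std(ddof=1)

# Pooled standard deviation: a sample-size-weighted estimate of sigma.
sp = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t_manual = (sample_1.mean() - sample_2.mean()) / (sp * np.sqrt(1 / n1 + 1 / n2))

# scipy's equal-variance two-sample t-test gives the same statistic.
t_scipy, p_value = stats.ttest_ind(sample_1, sample_2, equal_var=True)
print(t_manual, t_scipy, p_value)
```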

Example. A football coach wants to determine whether electrolyte fluid consumption, in liters/week (L/week), differs between his football team and the varsity track team. A random sample of 10 athletes was selected from each team:

track: 1.65, 3.24, 1.87, 1.64, 2.35, 3.39, 0.56, 1.93, 2.21

football: 2.23, 2.78, 1.89, 3.67, 1.45, 1.28, 3.57, 2.63, 2.78
  1. H₀: μ₁ = μ₂ and H₁: μ₁ ≠ μ₂ (two-tailed); population 1: track, population 2: football
  2. α = 0.05
  3. Sample means and standard deviations:
  4. Pooled standard deviation sp:
  5. Test statistic:

6. DOF = 10 + 10 − 2 = 18, t_crit = ±2.101. Since −1.89 does not fall below −2.101 (or above 2.101), we fail to reject H₀.

7. We fail to reject H₀, as there is insufficient evidence to suggest that the consumption of electrolyte fluids differs between the two populations.

CI for Two Means When σ is Equal

Assumptions when calculating a confidence interval between two independent sample means when σ is equal include: normal/large populations, equal population σ’s (standard deviations), SRS and independent populations.

  1. Select a confidence level based on α, CI = 1 − α
  2. Ascertain DOF = n₁ + n₂ − 2
  3. Using a t-table, find ±t_{α/2} using the DOF
  4. The range of the CI, written in the form (lower bound, upper bound), is determined by the formula: (x̄₁ − x̄₂) ± t_{α/2} · sp · √(1/n₁ + 1/n₂)
  5. We can now interpret and make recommendations based on the CI. (A helper function for this calculation is sketched below.)
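
A small helper function along these lines (pooled_t_ci is a hypothetical name, not a library function) might look like:

```python
import numpy as np
from scipy import stats

def pooled_t_ci(sample_1, sample_2, conf=0.95):
    """CI for mu1 - mu2 assuming equal population standard deviations."""
    x1, x2 = np.asarray(sample_1, dtype=float), np.asarray(sample_2, dtype=float)
    n1, n2 = len(x1), len(x2)
    sp = np.sqrt(((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1))
                 / (n1 + n2 - 2))
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n1 + n2 - 2)
    diff = x1.mean() - x2.mean()
    half_width = t_crit * sp * np.sqrt(1 / n1 + 1 / n2)
    return diff - half_width, diff + half_width
```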

Non-pooled t-test

A non-pooled t-test runs nearly identically to a pooled t-test. Where it differs is in the assumption that the standard deviations of the populations are unequal. This ultimately gives rise to differences in the calculation of the test statistic t and of the DOF, which is often symbolized using the delta (Δ) notation. Note that we need to round the DOF down to the nearest integer if a floating-point value is returned. We follow the same steps 1–3 as outlined above for the pooled t-test, so we will jump to step 4:

  4. Determine the DOF (Δ): Δ = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ], rounded down to the nearest integer
  5. Determine the t-statistic: t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)
  6. Compare the t-statistic to the critical value or p-value using a t-table, as per the hypothesis test: ±t_{α/2} (two-tailed), −t_{α} (left-tailed) or t_{α} (right-tailed)
  7. If p ≤ α or the test statistic falls in the rejection region, reject H₀; otherwise fail to reject H₀. We can now interpret our results and make recommendations based on our data. (A code sketch follows below.)
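
In Python, the non-pooled test is what scipy.stats.ttest_ind performs when equal_var=False (Welch's t-test); a minimal sketch with made-up data, including the rounded-down Δ, could look like this. Note that scipy uses the unrounded DOF internally, so its p-value can differ slightly from a table-based calculation.

```python
import numpy as np
from scipy import stats

def welch_dof(x1, x2):
    """Welch-Satterthwaite degrees of freedom, rounded down as described above."""
    v1, v2 = x1.var(ddof=1) / len(x1), x2.var(ddof=1) / len(x2)
    delta = (v1 + v2) ** 2 / (v1**2 / (len(x1) - 1) + v2**2 / (len(x2) - 1))
    return int(np.floor(delta))

# Made-up samples purely for illustration.
x1 = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3])
x2 = np.array([6.8, 5.2, 7.9, 4.4, 8.1, 6.0])

t_stat, p_value = stats.ttest_ind(x1, x2, equal_var=False)  # non-pooled (Welch) t-test
print(t_stat, p_value, welch_dof(x1, x2))
```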

CI for Two Means When σ is Unequal

Assumptions when calculating a confidence interval between two independent sample means when σ is unequal include: normal/large populations, unequal population σ’s (standard deviations), SRS and independent populations.

  1. Select a confidence level based on α, CI = 1 − α
  2. Ascertain the DOF using the Δ formula above (remember to round down to the nearest whole integer)
  3. Using a t-table, find ±t_{α/2} using the DOF
  4. The range of the CI, written in the form (lower bound, upper bound), is determined by the formula: (x̄₁ − x̄₂) ± t_{α/2} · √(s₁²/n₁ + s₂²/n₂)
  5. We can now interpret and make recommendations based on the CI. (A helper function follows below.)
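
A helper for this interval (welch_t_ci is a hypothetical name) might look like:

```python
import numpy as np
from scipy import stats

def welch_t_ci(sample_1, sample_2, conf=0.95):
    """CI for mu1 - mu2 without assuming equal population standard deviations."""
    x1, x2 = np.asarray(sample_1, dtype=float), np.asarray(sample_2, dtype=float)
    v1, v2 = x1.var(ddof=1) / len(x1), x2.var(ddof=1) / len(x2)
    dof = int(np.floor((v1 + v2) ** 2
              / (v1**2 / (len(x1) - 1) + v2**2 / (len(x2) - 1))))  # round down
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=dof)
    diff = x1.mean() - x2.mean()
    half_width = t_crit * np.sqrt(v1 + v2)
    return diff - half_width, diff + half_width
```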

Example. A high school track and field coach wants to see whether spikes on his short-distance runners' shoes decrease their mean times. At a significance level of 0.01, is there sufficient evidence to conclude that the mean 100 m times of his athletes who use spikes are lower than those of athletes who don't?

spikes: 11.2, 12.1, 10.9, 11.2, 12.1, 10.9, 11.8, 12.1, 10.9, 11.7, 12.5, 10.9, 12.1, 10.6, 11.1

no spikes: 13.6, 14.1, 12.3, 11.5, 15.8, 13.6, 14.1, 12.3, 10.5, 15.8, 13.6, 16.2, 12.7, 10.8, 15.8
  1. Compute the mean and standard deviation for each of the samples:
  2. sₗ/sₛ ≈ 3, which is > 2, therefore we assume unequal variances and proceed with a non-pooled t-test.
  3. H₀: μ₁ = μ₂ and H₁: μ₁ < μ₂ (left-tailed); population 1: spikes, population 2: no spikes
  4. α = 0.01
  5. Compute the t-statistic:
  6. DOF (Δ, rounded down): 18
  7. The critical value t_{df,α} = t_{18,0.01} = 2.552. Since this is a left-tailed test, the critical value is −2.552 (a standard t-table doesn't always provide directionality). Comparing this to our calculated value of −4.20, we can see it falls in the rejection region of our hypothesis test.
  8. We reject H₀ and conclude that there is sufficient evidence in the sample data to indicate that wearing spikes decreases the sprinters' 100 m times. (A quick Python cross-check follows below.)
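
To cross-check this example in Python, we can run the non-pooled (Welch) test on the values listed above with a left-tailed alternative; small differences from the hand-computed statistic can arise from rounding at intermediate steps.

```python
import numpy as np
from scipy import stats

spikes = np.array([11.2, 12.1, 10.9, 11.2, 12.1, 10.9, 11.8, 12.1,
                   10.9, 11.7, 12.5, 10.9, 12.1, 10.6, 11.1])
no_spikes = np.array([13.6, 14.1, 12.3, 11.5, 15.8, 13.6, 14.1, 12.3,
                      10.5, 15.8, 13.6, 16.2, 12.7, 10.8, 15.8])

# Left-tailed Welch test: H1 is that times with spikes are lower.
t_stat, p_value = stats.ttest_ind(spikes, no_spikes, equal_var=False,
                                  alternative="less")
print(f"t = {t_stat:.2f}, p = {p_value:.5f}")
```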

Paired Sample Testing

In paired hypothesis testing, we want to control for the effect of individual differences on the outcome measure. This often looks like measuring an outcome for an individual, performing an intervention, and then re-measuring the outcome for that same individual. We then do this for many people and aggregate the results to see if there is an overall effect. For example, do students do better on their math test when they've slept 5 hours versus 8 hours? Generalizing, we call this a repeated-measures, matched design. Randomized control trials (RCTs) try to do the same thing, except they have to match each patient who received the therapy with a patient as similar as possible who did not. Both of these examples no longer represent independent samples from two populations.

Paired t-Test

A paired t-test is used when we want to compare 2 means in a paired sample. The assumptions are the same as for our independent t-test – samples are obtained randomly, and they are either large samples or normally distributed. However, we are no longer making the assumption they are INDEPENDENT of one another.

Our assumptions are as follows: simple random sample (SRS) and a normal/large population/sample.

  1. State our alternative and null hypotheses: H₀ (μ₁ = μ₂) and H₁
  2. Determine from our hypotheses if the test constitutes a right (μ₁ > μ₂), left (μ₁ < μ₂) or two-tailed test (μ₁ ≠ μ₂)
  3. Decide on a significance level α
  4. Calculate the differences (e.g. before vs. after) from each instance in the sample
  5. Compute the test statistic: t = d̄ / (s_d / √n)

where d̄ is the mean of the differences, s_d is the standard deviation of the differences, and n is the number of pairs in the sample.

  6. The critical values are: ±t_{α/2} (two-tailed), −t_{α} (left-tailed) or t_{α} (right-tailed) with DOF = n − 1. These can be ascertained using a t-table, which can also provide an estimate of the p-value.
  7. If the test statistic falls in the rejection region OR the p-value ≤ α, we reject H₀; otherwise we fail to reject H₀.
  8. Interpret and summarize the results of the statistical test, always noting that a hypothesis test is exact for normal populations and approximate for non-normal but large populations. (A code sketch follows below.)
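
A minimal sketch of a paired t-test in Python, using hypothetical before/after data, with scipy.stats.ttest_rel alongside the manual calculation on the differences:

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements for the same ten subjects.
before = np.array([210, 195, 188, 230, 201, 178, 192, 205, 215, 199])
after  = np.array([205, 196, 183, 222, 198, 176, 190, 200, 210, 195])

# Left-tailed paired test: H1 is that the 'after' mean is lower than 'before'.
t_stat, p_value = stats.ttest_rel(after, before, alternative="less")

# Equivalent manual calculation on the differences.
d = after - before
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
print(t_stat, t_manual, p_value)
```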

Example. A physician claims that a particular drug assists overweight patients in shedding a few pounds. A sample of 10 patients was weighed before starting the treatment and again 1 month later. Do the data support the physician's claim at a significance level of 0.05?

  1. State the hypotheses: H₀: μ₂ = μ₁, H₁: μ₂ < μ₁. This constitutes a left-tailed test
  2. α = 0.05
  3. Determine the average difference in the paired data (after − before)
  4. Determine the standard deviation of the paired differences (n = 10 here)
  5. Calculating the t-statistic we get:
  6. Using our t-table, we determine our critical value based on our degrees of freedom (DOF = n − 1) and α = 0.05. Here, DOF = 10 − 1 = 9 and t_crit = −1.833
  7. Because −3.06 < −1.833, we reject H₀
  8. Our results are that we possess sufficient evidence to support the claim that the weight-loss drug causes a statistically significant net weight decrease.

Paired t-interval Calculation

We may also wish to determine a confidence interval (CI) for the difference of paired sample means, µ1 and µ2. Remember that a confidence interval calculation is only approximate for large samples whose differences are not normally distributed. Our assumptions for calculating a t-interval are a large sample (or differences that are normally distributed) and a simple random sample (SRS).

  1. Determine the confidence level based on α, CI = 1 − α
  2. Determine DOF = n − 1
  3. Use a t-table to find t_{α/2}
  4. The boundaries of the CI for µ1 − µ2 are calculated as: d̄ ± t_{α/2} · s_d / √n
  5. We can now interpret our confidence interval and make inferences from it. (A helper function is sketched below.)
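
A small helper for the paired t-interval (paired_t_ci is a hypothetical name) could look like:

```python
import numpy as np
from scipy import stats

def paired_t_ci(before, after, conf=0.95):
    """CI for the mean paired difference (after - before)."""
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    n = len(d)
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    half_width = t_crit * d.std(ddof=1) / np.sqrt(n)
    return d.mean() - half_width, d.mean() + half_width
```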

Summary

In this bootcamp we've covered t-testing extensively, specifically when to use paired versus non-paired t-tests. After reading this article, you should be able to recognize scenarios where the data violate the assumption of independence, and know how to perform the necessary calculations for those that retain their independence. We also explored the subcategories of independent t-tests, i.e. pooled and non-pooled, so always remember to check your standard deviations. Whenever we make inferences from data it is ALWAYS important to acknowledge any assumptions we made in performing the analysis.

All images unless otherwise stated are created by the author.

