Making Inferences about a Single Population Variance

Hypothesis Tests and Confidence Intervals of the Variance Using the Chi-Square Distribution

Joshua Kartzman
Towards Data Science

Variance is a crucial component when analyzing risk and uncertainty. The sample variance, an unbiased estimator of the population variance, is given by the following formula:

Formula for Sample Variance (Image by Author)

Here n is the sample size and Y-bar is the sample mean. The sample variance is the core statistic used throughout this article.
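
For reference, the sample variance described above can be written out in LaTeX in its standard form. The index i and the observation labels Y_1, ..., Y_n are notation assumed here for illustration.

```latex
S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2
```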

When the observations are independent draws from a normal distribution, the following statistic

Test Statistic (Image by Author)

has a central chi-square distribution with n-1 degrees of freedom, represented as

Chi-Square with n-1 degrees of freedom (Image by Author)

The chi-square distribution can therefore be used to make inferences about the population variance, σ², using the sample variance, S². As the following examples show, this is a powerful tool for performing hypothesis tests on, or constructing confidence intervals for, the population variance.
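
In symbols, the distributional result used throughout the examples can be sketched as follows. It assumes that Y_1, ..., Y_n are independent draws from N(μ,σ²), which is the setting of every example in this article.

```latex
\frac{(n-1)\,S^2}{\sigma^2} \sim \chi^2_{n-1}
```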

Example 1: Right-Tailed Hypothesis Test of Population Variance

A research team collected a sample of 10 observations from the random variable Y, which had a normal distribution N(μ,σ²). They found that Y-bar=57.9, where Y-bar is the mean of the 10 observations, and S²=485.2, where S² is the sample variance. Test the null hypothesis H0: σ²≤375 against the alternative hypothesis H1: σ²>375 at the 0.10, 0.05, and 0.01 levels of significance.

Solution:

The test statistic, (n-1)S²/σ², has a chi-square distribution with 10-1=9 degrees of freedom, which can be written as

Chi-Square with 9 degrees of freedom (Image by Author)

The problem indicates that a right-tailed hypothesis test should be used since the alternative hypothesis uses a greater-than sign. The critical values for each level of significance are 14.68 (for level of significance 0.10), 16.92 (for level of significance 0.05), and 21.67 (for level of significance 0.01). These can be verified using R as shown below:

Calculating Critical Values (Image by Author)
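
A minimal sketch of such a check, using base R's quantile function `qchisq`, might look like the following. The variable names (`n`, `alpha`, `crit_right`) are illustrative assumptions, not the author's original code.

```r
# Right-tailed critical values for a chi-square distribution with n - 1 df.
n <- 10                        # sample size from Example 1
alpha <- c(0.10, 0.05, 0.01)   # levels of significance

# lower.tail = FALSE returns the value with area alpha to its right.
crit_right <- qchisq(alpha, df = n - 1, lower.tail = FALSE)

print(round(crit_right, 2))
```

The printed values should agree with the critical values 14.68, 16.92, and 21.67 quoted above.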

The value of the chi-square test statistic under the null hypothesis is

Calculating Test Statistic (Image by Author)

Note that 375 is the value of the population variance assuming the null hypothesis is true, and that 485.2 is the sample variance calculated from the 10 observations. The test statistic, 9×485.2/375 ≈ 11.64, is less than all of the critical values. As a result, we fail to reject the null hypothesis at significance levels 0.10, 0.05, and 0.01. Note also that the sample mean was not required to perform the hypothesis test of the population variance.
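
The test statistic and the decision at each level can be reproduced with a short R sketch. The variable names are hypothetical, and the p-value line is an addition for illustration that leads to the same decisions as the critical-value comparison.

```r
# Example 1: right-tailed test of H0: sigma^2 <= 375.
n <- 10           # sample size
s2 <- 485.2       # sample variance
sigma2_0 <- 375   # population variance under the null hypothesis
alpha <- c(0.10, 0.05, 0.01)

# Test statistic (n - 1) * S^2 / sigma^2; the sample mean is not needed.
stat <- (n - 1) * s2 / sigma2_0

# Reject H0 only when the statistic exceeds the right-tailed critical value.
crit <- qchisq(alpha, df = n - 1, lower.tail = FALSE)
reject <- stat > crit

# Upper-tail p-value: an equivalent route to the same decisions.
p_value <- pchisq(stat, df = n - 1, lower.tail = FALSE)

print(stat)
print(data.frame(alpha = alpha, critical = round(crit, 2), reject = reject))
```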

Example 2: Left-Tailed Hypothesis Test of Population Variance

A research team collected a sample of 15 observations from the random variable Y, which had a normal distribution N(μ,σ²). They found that Y-bar=802.3 and S²=9.2. Test the null hypothesis H0: σ²≥25 against the alternative hypothesis H1: σ²<25 at the 0.10, 0.05, and 0.01 levels of significance.

Solution:

The test statistic, (n-1)S²/σ², has a chi-square distribution with 15-1=14 degrees of freedom, which can be written as

Chi-Square with 14 degrees of freedom (Image by Author)

The problem implies that a left-tailed hypothesis test is appropriate since the alternative hypothesis uses a less-than sign. The critical values for each level of significance are 7.79 (for level of significance 0.10), 6.57 (for level of significance 0.05), and 4.66 (for level of significance 0.01), as seen below:

Calculating Critical Values (Image by Author)
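
As a sketch of how these left-tailed critical values might be obtained in R, `qchisq` can be called with its default lower-tail behaviour. The variable names are assumptions made for illustration.

```r
# Left-tailed critical values for a chi-square distribution with n - 1 df.
n <- 15                        # sample size from Example 2
alpha <- c(0.10, 0.05, 0.01)   # levels of significance

# By default qchisq returns the value with area alpha to its left.
crit_left <- qchisq(alpha, df = n - 1)

print(round(crit_left, 2))
```

The printed values should agree with 7.79, 6.57, and 4.66 quoted above.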

Since this is a left-tailed test, the test statistic must be less than the critical value in order to reject the null hypothesis. The value of the chi-square test statistic under the null hypothesis is

Calculating Test Statistic (Image by Author)

Since the test statistic, 14×9.2/25 ≈ 5.15, is less than the critical value at the 0.10 significance level, 7.79, we can reject the null hypothesis at the 0.10 significance level. Likewise, the test statistic is less than the critical value at the 0.05 significance level, 6.57, so we can also reject the null hypothesis at that level. However, since the test statistic is greater than the critical value at the 0.01 significance level, 4.66, we fail to reject the null hypothesis at that significance level. Once again, notice that the sample mean was not used to perform the hypothesis test.
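
The left-tailed decision rule described above can be sketched in R as follows. The variable names are hypothetical, and the lower-tail p-value is included only as an equivalent cross-check.

```r
# Example 2: left-tailed test of H0: sigma^2 >= 25.
n <- 15          # sample size
s2 <- 9.2        # sample variance
sigma2_0 <- 25   # population variance under the null hypothesis
alpha <- c(0.10, 0.05, 0.01)

# Test statistic (n - 1) * S^2 / sigma^2.
stat <- (n - 1) * s2 / sigma2_0

# Reject H0 only when the statistic falls below the left-tailed critical value.
crit <- qchisq(alpha, df = n - 1)
reject <- stat < crit

# Lower-tail p-value: reject whenever it is smaller than alpha.
p_value <- pchisq(stat, df = n - 1)

print(stat)
print(data.frame(alpha = alpha, critical = round(crit, 2), reject = reject))
```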

Example 3: Two-Tailed Hypothesis Test of Population Variance

A research team collected a sample of 18 observations from the random variable Y, which had a normal distribution N(μ,σ²). They found that Y-bar=31.8 and S²=19.2. Test the null hypothesis H0: σ²=10 against the alternative hypothesis H1: σ²≠10 at the 0.10, 0.05, and 0.01 levels of significance.

Solution:

Two-tailed hypothesis tests for a single population variance differ from one-tailed tests in that the test statistic must be compared against two critical values, each of which cuts off a tail area equal to the level of significance divided by 2. The test statistic, (n-1)S²/σ², has a chi-square distribution with 18-1=17 degrees of freedom, which can be written as

Chi-Square with 17 degrees of freedom (Image by Author)

The problem implies that a two-tailed hypothesis test is appropriate since the alternative hypothesis uses a not-equal sign (≠). As mentioned before, two critical values have to be calculated, one for the upper tail

Upper Critical Value (Image by Author)

and one for the lower tail of the distribution

Lower Critical Value (Image by Author)

Once again, the area of each of these tails is the level of significance divided by 2, so that the combined area of the two tails equals the level of significance. The critical values for the 0.1 level of significance are

Critical Values for 0.1 Level of Significance (Image by Author)

The critical values for the 0.05 level of significance are

Critical Values for 0.05 Level of Significance (Image by Author)

The critical values for the 0.01 level of significance are

Critical Values for 0.01 Level of Significance (Image by Author)

These critical values can be calculated using R as shown below:

Calculating Critical Values (Image by Author)
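
One way such a calculation might look in R is sketched below, with each tail receiving half of the level of significance. The variable names and the `crit_table` data frame are illustrative assumptions.

```r
# Two-tailed critical values for a chi-square distribution with n - 1 df.
n <- 18                        # sample size from Example 3
alpha <- c(0.10, 0.05, 0.01)   # levels of significance

# Each tail has area alpha / 2.
lower <- qchisq(alpha / 2, df = n - 1)       # lower critical values
upper <- qchisq(1 - alpha / 2, df = n - 1)   # upper critical values

crit_table <- data.frame(alpha = alpha,
                         lower = round(lower, 2),
                         upper = round(upper, 2))
print(crit_table)
```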

In order to reject the null hypothesis, the test statistic must be either greater than the upper critical value,

Upper Critical Value (Image by Author)

or less than the lower critical value,

Lower Critical Value (Image by Author)

If not, then we fail to reject the null hypothesis. The value of the chi-square test statistic under the null hypothesis is

Calculating Test Statistic (Image by Author)

At the 0.1 and 0.05 significance levels, the test statistic, 17×19.2/10 = 32.64, is greater than the upper critical values, 27.59 and 30.19, so the null hypothesis can be rejected at those significance levels. At the 0.01 significance level, however, the test statistic falls between the lower and upper critical values, 5.70 and 35.72. As a result, there is insufficient evidence to reject the null hypothesis at the 0.01 significance level.
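
The two-tailed decision rule can be sketched in R as follows, rejecting whenever the statistic lies outside the interval formed by the two critical values. The variable names are hypothetical.

```r
# Example 3: two-tailed test of H0: sigma^2 = 10.
n <- 18          # sample size
s2 <- 19.2       # sample variance
sigma2_0 <- 10   # population variance under the null hypothesis
alpha <- c(0.10, 0.05, 0.01)

# Test statistic (n - 1) * S^2 / sigma^2.
stat <- (n - 1) * s2 / sigma2_0

# Lower and upper critical values, each cutting off a tail of area alpha / 2.
lower <- qchisq(alpha / 2, df = n - 1)
upper <- qchisq(1 - alpha / 2, df = n - 1)

# Reject H0 when the statistic is below the lower or above the upper value.
reject <- (stat < lower) | (stat > upper)

print(stat)
print(data.frame(alpha = alpha, lower = round(lower, 2),
                 upper = round(upper, 2), reject = reject))
```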

Example 4: 99% Confidence Interval for Population Variance

A research team took a sample of 21 observations from the random variable Y, which had a normal distribution N(μ,σ²). They found that Y-bar=178.2 and S²=98.2. Find the 99% confidence interval for σ².

Solution:

Since the sample variance has 21-1=20 degrees of freedom, the probability that the test statistic 20S²/σ² is greater than

Chi-Square Statistic with 0.005 Left Tail Area (Image by Author)

and less than

Chi-Square Statistic with 0.995 Left Tail Area (Image by Author)

is 0.99. Written more concisely:

Interval for Test Statistic (Image by Author)

This can be used to derive the confidence interval for the population variance:

Deriving Confidence Interval (Image by Author)
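
A sketch of the algebra behind this step is given below in LaTeX. The subscript convention is an assumption made here: \chi^2_{p,\,20} denotes the value of a chi-square distribution with 20 degrees of freedom that has area p to its left. Inverting the inequalities reverses their direction, which is why the larger quantile ends up in the lower limit.

```latex
\begin{aligned}
P\left( \chi^2_{0.005,\,20} < \frac{20\,S^2}{\sigma^2} < \chi^2_{0.995,\,20} \right) &= 0.99 \\
P\left( \frac{20\,S^2}{\chi^2_{0.995,\,20}} < \sigma^2 < \frac{20\,S^2}{\chi^2_{0.005,\,20}} \right) &= 0.99 \\
\text{99\% CI for } \sigma^2 &= \left( \frac{20 \times 98.2}{\chi^2_{0.995,\,20}},\ \frac{20 \times 98.2}{\chi^2_{0.005,\,20}} \right)
\end{aligned}
```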

The confidence interval, as can be seen above, was found to be (49.104, 264.199). Once again, the sample mean is not needed to calculate the solution.
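
The interval can also be computed directly in R, as in the minimal sketch below. The variable names are assumptions, and small differences from the limits quoted above in the last decimal places can arise from rounding the chi-square quantiles.

```r
# Example 4: 99% confidence interval for the population variance.
n <- 21              # sample size
s2 <- 98.2           # sample variance
conf_level <- 0.99   # confidence level
alpha <- 1 - conf_level
df <- n - 1

# Dividing by the larger quantile gives the lower limit, and vice versa.
ci_lower <- df * s2 / qchisq(1 - alpha / 2, df = df)
ci_upper <- df * s2 / qchisq(alpha / 2, df = df)

print(round(c(lower = ci_lower, upper = ci_upper), 3))
```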

I hope you enjoyed this article. This is my first time writing for TDS, and there were definitely some bumps along the way. If I made any mistakes or arithmetic errors, please let me know and I’ll correct them. If all goes well, I plan on writing an article on making inferences about two population variances using the F-distribution.
