Hypothesis Testing — 2-tailed test

Tanwir Khan
Towards Data Science
Nov 27, 2019


In this post, we will discuss how to carry out a 2-tailed hypothesis test. In my previous post, I discussed hypothesis testing in detail, with examples, including how to set up and validate the null (H0) and alternate (H1) hypotheses. So, in this post I won't go over the what and how of hypothesis testing again. Instead, we will look directly at how a 2-tailed test is carried out and at the conditions and criteria for using this approach.

So, let's dive right into a question and see how to solve it. Along the way, I will explain the concepts needed to make the approach clearer.

Problem Statement: The average height of students in a batch is 100 cm, with a standard deviation of 15 cm. However, Tedd believes that this has changed, so he decides to measure the heights of 75 randomly chosen students in the batch. The average height of the sample comes out to 105 cm. Is there enough evidence to suggest that the average height has changed?

Let's recap the steps for performing a hypothesis test:

  1. Specify the null (H0) and alternate (H1) hypotheses
  2. Choose the level of significance (α)
  3. Find the critical values
  4. Find the test statistic
  5. Draw your conclusion

So let's perform step 1 of hypothesis testing:

  1. Specify the null (H0) and alternate (H1) hypotheses

Null hypothesis (H0): The null hypothesis is what is currently stated to be true about the population. In our case, it is that the average height of students in the batch is 100 cm.

H0 : μ = 100

Alternate hypothesis (H1): The alternate hypothesis is what is being claimed. In our case, Tedd believes (claims) that the actual value has changed. He doesn't know whether the average has gone up or down, but he believes that it has changed and is no longer 100 cm.

H1 : μ ≠ 100

Remember that the alternate hypothesis is always written with a ≠, <, or > sign. Please refer to the table below for more clarity.

If the alternate hypothesis is written with a ≠ sign, we perform a 2-tailed test: the true mean could be either more than 100 or less than 100, so evidence against H0 can show up in either tail of the distribution.

After stating the null and alternate hypotheses, it's time to move on to step 2:

2. Choose the level of significance (α)

The level of significance (α) is the probability of rejecting the null hypothesis when it is actually true. Graphically, it is the total area in the tails of the curve. Usually the level of significance is given, but if it is not, we need to choose one.

If the level of significance is not provided, we take it to be 0.05, as that is the most common choice. Let's see how it is represented in a 2-tailed test:

In the curve above, you can see that the level of significance is 0.05 and that the two tails are symmetrical, meaning they have the same area. So each tail has an area of 0.05 ÷ 2 = 0.025.

Finally, note that the total area under the curve is 1 (100%). Since the two tails together have an area of 0.05, the middle of the curve has an area of 1 - 0.05 = 0.95 (95%).
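The arithmetic of splitting α between the two tails can be sketched in a few lines of Python; this assumes the default α = 0.05 used above.

```python
alpha = 0.05             # level of significance (the default assumed above)
tail_area = alpha / 2    # 2-tailed test: alpha is split equally between the tails
middle_area = 1 - alpha  # area between the two critical values (the confidence level)

print(tail_area, middle_area)
```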

We have now stated our level of significance. Let's move on to step 3:

3. Find Critical Values

Critical values are the z-values or t-values that separate the area shaded in red (the area in the tails) from the middle area of the curve.

The critical value could be either a z-value or a t-value. Let's see which one applies in our example.

Keep in mind that we use the z-value when the population standard deviation (σ) is provided.

We use the t-value when:

  1. The population standard deviation (σ) is not given in the problem statement.
  2. The sample size is less than 30 (a common rule of thumb).
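The z-or-t rule above can be expressed as a tiny helper. This is a sketch that encodes only the two conditions listed in this post; the function name `choose_statistic` is hypothetical.

```python
def choose_statistic(sigma_known: bool, n: int) -> str:
    """Pick 'z' or 't' using the two rules of thumb listed above (hypothetical helper)."""
    # Use t when the population sigma is unknown, or when the sample is small (n < 30)
    if not sigma_known or n < 30:
        return "t"
    return "z"

print(choose_statistic(sigma_known=True, n=75))  # z
```

With σ = 15 given and n = 75, the helper picks the z-value, matching the choice made in the next paragraph.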

In our case, we use the z-value because the population standard deviation is provided. To find the critical z-value, we will use the z-table given below.

There are a couple of ways to look up the critical z-value:

  1. The area of the curve excluding the tails is 0.95 (95%), which means the confidence level is 95%. So we can use the confidence level to look up the z-value.
  2. We can also use the area in one of the tails, 0.025, to look up the z-value.
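Instead of a printed z-table, both approaches can be reproduced with the inverse CDF of the standard normal distribution. This sketch uses Python's standard-library `statistics.NormalDist` (Python 3.8+); it is an alternative to the table lookup, not the method used in this post.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
alpha = 0.05

# Approach 1: start from the confidence level (area of the middle of the curve).
# Everything to the left of the right critical value is 0.5 + confidence / 2.
confidence = 1 - alpha
z_from_confidence = std_normal.inv_cdf(0.5 + confidence / 2)

# Approach 2: start from the area in one tail (alpha / 2 = 0.025).
tail_area = alpha / 2
z_from_tail = std_normal.inv_cdf(1 - tail_area)

print(round(z_from_confidence, 2), round(z_from_tail, 2))  # 1.96 1.96
```

Both routes ask for the same cumulative area (0.975), so they give the same critical value.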

Notice that the column "Area between 0 and z-score" in the table above is just one-half of the confidence level (0.4750 in our case). If your confidence level is not listed in that table, divide it by 2, find that area in the body of the z-table, and then read off the corresponding z-score from the row and column headers.

In our case, dividing the confidence level by 2 gives 0.95 ÷ 2 = 0.4750. If you look up 0.4750 in the table below and add the row value (1.9) to the column value (0.06) for z, you get 1.96.
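To see where the table's 0.4750 comes from, the following sketch computes the "area between 0 and z" from the normal CDF and scans z-values in steps of 0.01, mimicking the rows and columns of a z-table. The helper name `area_between_0_and` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()

def area_between_0_and(z: float) -> float:
    """Area under the standard normal curve from 0 to z (the body of the z-table)."""
    return std_normal.cdf(z) - 0.5

target = 0.95 / 2  # half the confidence level: 0.4750

# Table grid: rows 0.0 to 3.9 plus columns 0.00 to 0.09, i.e. z in steps of 0.01
grid = [i / 100 for i in range(0, 400)]
z_star = min(grid, key=lambda z: abs(area_between_0_and(z) - target))

print(z_star, round(area_between_0_and(z_star), 4))  # 1.96 0.475
```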

So the critical value in the right tail of the curve is 1.96, and the critical value in the left tail (below the mean) is -1.96.

Rejection Region

These critical values matter because they separate the area in red from the middle of the curve. The area in red is called the rejection region.

It is called the rejection region because, in the next step, we will compute a z-value for our sample. If that z-value falls in either of the rejection regions (the areas in red), we can reject the null hypothesis. Now, let's move on to step 4.
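The rejection-region check for a 2-tailed test comes down to one comparison on each side. Here `in_rejection_region` is a hypothetical helper, and the sample values are only illustrative.

```python
def in_rejection_region(z: float, z_crit: float) -> bool:
    """True if a 2-tailed test statistic falls in either red tail."""
    # Left tail: z < -z_crit; right tail: z > z_crit
    return z < -z_crit or z > z_crit

print(in_rejection_region(2.89, 1.96))   # True  (right tail)
print(in_rejection_region(-2.10, 1.96))  # True  (left tail)
print(in_rejection_region(1.50, 1.96))   # False (middle area)
```

This is equivalent to checking whether |z| > z_crit, which is the form often used in practice.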

4. Find the test statistic

This means that we are going to find the z-value for the sample.

$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

where x̄ (x-bar) = average of the sample = 105

μ = average of the population (under H0) = 100

σ = standard deviation of the population = 15

n = sample size = 75

$$ z = \frac{105 - 100}{15 / \sqrt{75}} = \frac{5}{1.732} \approx 2.89 $$
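The same calculation in Python, as a quick check of the arithmetic (variable names are illustrative):

```python
import math

x_bar = 105  # sample mean
mu = 100     # population mean under H0
sigma = 15   # population standard deviation
n = 75       # sample size

standard_error = sigma / math.sqrt(n)  # 15 / sqrt(75), about 1.732
z = (x_bar - mu) / standard_error      # test statistic

print(round(z, 2))  # 2.89
```

Note that the denominator uses the square root of the sample size, √75, not √7.5.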

This value 2.89 is called the test statistic.

This takes us to our last step.

5. Draw a conclusion

If you look at the curve, the value 2.89 lies in the red area in the right tail, because it is greater than the critical value of 1.96. Since the test statistic lies in the rejection region, we reject the null hypothesis.
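A complementary check that this post does not use is the 2-tailed p-value: the probability, under H0, of a test statistic at least as extreme as the one observed. Rejecting H0 when p < α gives the same decision as comparing |z| with the critical value. A minimal sketch with `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

z = (105 - 100) / (15 / math.sqrt(75))  # test statistic from step 4
alpha = 0.05

# 2-tailed p-value: area in both tails beyond |z| under the standard normal curve
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```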

Conclusion:

Reject H0: μ = 100

Support H1: μ ≠ 100

So, for the problem statement, there is enough evidence at the 0.05 significance level to suggest that the average height has changed: we rejected the null hypothesis in favor of the alternate hypothesis, which says that the average height is not equal to 100 cm.
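Putting the five steps together, a minimal sketch of the whole 2-tailed z-test might look like the following. It assumes σ is known (as in this problem) and uses the hypothetical function name `two_tailed_z_test`; it is an illustration, not code from the original post.

```python
import math
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Five-step 2-tailed z-test (sketch assuming the population sigma is known)."""
    # Steps 1-2: H0: mu = mu0 vs H1: mu != mu0, at significance level alpha
    # Step 3: critical value leaving alpha / 2 in each tail
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: test statistic
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Step 5: reject H0 if z lands in either rejection region
    reject = abs(z) > z_crit
    return z, z_crit, reject

z, z_crit, reject = two_tailed_z_test(x_bar=105, mu0=100, sigma=15, n=75)
print(f"z = {z:.2f}, critical = ±{z_crit:.2f}, reject H0: {reject}")
# z = 2.89, critical = ±1.96, reject H0: True
```

Keeping α as a parameter makes it easy to rerun the same test at a different significance level.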
