Surely in some part of your training or even in your work, you have heard about hypothesis tests, but do you know what they are for or how they are implemented? If the answer is no, I invite you to stay because we will talk about the famous hypothesis tests in this blog.
Get comfortable, go get your favorite drink and enjoy.
Hypothesis Tests
Imagine that in an article we read the statement: "… all adults on average sleep 7 hours a day". How can we check whether that assertion holds? With hypothesis tests.
Hypothesis tests let us check a property that is assumed about a population against the same property measured on a representative sample. In our example, "all adults" is the population, and the property we want to check is that they "sleep 7 hours a day" on average.
To check the claim that all adults sleep 7 hours a day, we would need to collect a representative sample and compare it with what is claimed about the population.
Imagine that we ask 100 random adults how many hours they sleep a day and find that, on average, they sleep 7.5 hours. Our sample mean is 7.5 hours, which is 0.5 hours higher than the claimed population mean.
Given that sample mean, we may wonder: since the difference is only 0.5 hours, do we accept the claim as correct? Or, since the values are not the same, do we say the claim is incorrect? To answer these questions we need something called z-scores, which we will see in detail later.
Everything is simple so far, right? Before continuing, let's formalize a few things.
To perform a hypothesis test, we need to determine 2 hypotheses: the null hypothesis (or H0) and the alternative hypothesis (or H1). The null hypothesis is the formal statement of the statistical property of the population that we want to verify. The alternative hypothesis is the antagonist of the null hypothesis, that is, it is the assertion that refutes the null hypothesis.
Now let's go back to the previous problem: how can we know whether the statement about the population is correct when our sample mean turned out to be 0.5 hours higher? To address this question, we first need to know a few more things.
To check whether the statement in the null hypothesis is correct, we need to define a significance level. Commonly, the significance level is 5% and is interpreted as follows: if the probability of obtaining a sample mean like ours, assuming the null hypothesis is true, is less than or equal to 5%, the null hypothesis is rejected. If that probability is greater than 5%, the null hypothesis is retained, that is, we fail to reject it.
But why 5%? Let’s proceed to understand how this works.
When working with a normal distribution, it is usually transformed, for practicality, into a standard normal distribution. This process is called standardization. The standard normal distribution is a symmetric curve with a mean of 0 and a standard deviation of 1 whose area under the curve is 1 or 100% (as shown in figure 2). Standardization lets us express how many standard deviations a sample mean lies from the population mean. That number of standard deviations is the z-score. Boom!
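To make standardization concrete, here is a minimal sketch. The function name `z_score` and the numbers fed to it are assumptions made up for illustration, not values from the article.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations `value` lies from `mean`."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (value - mean) / std_dev


# Hypothetical distribution: mean of 7 hours, standard deviation of 1 hour.
for hours in (5.0, 7.0, 8.5):
    print(hours, "hours ->", z_score(hours, mean=7.0, std_dev=1.0), "standard deviations")
```

A positive result means the value sits above the mean, a negative one below it, and 0 means it is exactly at the mean.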

Since sample means are normally distributed around the population mean, we can read their probabilities from the standard normal distribution: at least 95% of all sample means fall within 2 standard deviations of the population mean. In other words, there is less than a 5% probability of obtaining a sample mean more than 2 standard deviations away from the population mean.
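The 95% figure can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module. The helper name `area_within` is an assumption introduced here for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k):
    """Probability that a standard normal value lies within k standard deviations of 0."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1.0, 1.96, 2.0):
    inside = area_within(k)
    print(f"within {k} standard deviations: {inside:.4f}, beyond: {1 - inside:.4f}")
```

The area within 2 standard deviations comes out slightly above 95%, and the area within 1.96 standard deviations is almost exactly 95%, which is why 1.96 shows up as a critical value in z-tables.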

Before continuing, let’s formalize a few things:
The null hypothesis is rejected when the sample mean is associated with a low probability of occurrence. The null hypothesis is retained when the sample mean is associated with a high probability of occurrence.
This probability of occurrence is better known as the p-value: the probability of obtaining a sample mean at least as extreme as the one observed if the null hypothesis were true.
So, if the p-value is less than or equal to 5%, the null hypothesis is rejected. If the p-value is greater than 5%, the null hypothesis is retained.
In summary, the Hypothesis Testing methodology is described as the sequence of the following steps:
- State the hypothesis: In this phase, the statement to be tested is declared, that is, the null hypothesis. Consequently, the alternative hypothesis is generated. The alternative hypothesis can determine 3 types of cases: that the population mean is greater than (>), less than (<), or not equal (≠) to the value defined in the null hypothesis.
- Set the criteria for a decision: The criterion for the decision is defined through the level of significance. It is usually set at 5%, although 10% and 1% are also common.
- Compute the test statistic: In this phase, the statistical test is applied to determine how close or far our sample mean is from the population mean. There are several statistical tests for different types of distributions. For a normal distribution, the z-test (the test based on z-scores) is usually used.
- Make a decision: Given the result of the statistical test and the decision criteria defined in step 2, we determine whether the null hypothesis is rejected or retained.
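The four steps above can be sketched as one small function for the non-directional (two-tailed) case. This is an illustrative sketch under assumptions, not a definitive implementation: the function name, its parameters, and the standard deviation of 2 hours in the usage lines are made up here, since the sleep example so far gives no standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Run a two-tailed z-test; return (z, critical_value, reject_null)."""
    # Step 1: H0 says the population mean equals mu0; H1 says it differs.
    # Step 2: the significance level alpha fixes the critical value.
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: z statistic, using the standard error of the mean sigma / sqrt(n).
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: reject H0 when z falls in either tail beyond the critical value.
    reject_null = abs(z) > critical_value
    return z, critical_value, reject_null


# 100 adults with a sample mean of 7.5 hours against a claimed mean of 7 hours.
# sigma=2.0 is an assumed value chosen only for illustration.
z, critical_value, reject_null = two_tailed_z_test(7.5, 7.0, sigma=2.0, n=100)
print(f"z = {z:.2f}, critical value = {critical_value:.2f}, reject H0: {reject_null}")
```

With a different assumed standard deviation the decision could change, which is why the test needs that value to be known.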
Awesome! Now that we understand what a hypothesis test is and have the intuition behind it, along with some formalities, let's see how to apply the z-test for each of the variants that the alternative hypothesis can take. Let's go to the next section!
Directional & Non-directional Hypothesis Tests
The test we can apply depends on what we know about the population. If we know the population mean and variance, the z-test is the most appropriate option.
As we saw in the previous section, the alternative hypothesis is the one that refutes or contradicts the null hypothesis. It states that the population mean is greater than (>), less than (<), or different (≠) from the value established in the null hypothesis. In other words, the alternative hypothesis can be directional or non-directional.

The hypothesis is directional when the alternative hypothesis defines an orientation, either greater than (>) or less than (<) that established in the null hypothesis. For instance:
- Null hypothesis: All adults sleep 7 hours a day
- Alternative hypothesis: All adults sleep more than 7 hours a day
In the same way, the hypothesis can be oriented as follows:
- Null hypothesis: All adults sleep 7 hours a day
- Alternative hypothesis: All adults sleep less than 7 hours a day
In both cases, the hypothesis is directional.
On the other hand, the hypothesis is non-directional when the alternative hypothesis does not define an orientation explicitly, that is, it only states that the population mean differs from the value in the null hypothesis. For instance:
- Null hypothesis: All adults sleep 7 hours a day
- Alternative hypothesis: All adults do not sleep 7 hours a day
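The direction of the alternative hypothesis decides which tail of the standard normal distribution counts toward the p-value. Here is a minimal sketch of that idea. The function name and the labels "greater", "less", and "two-sided" are assumptions chosen for this illustration, and z = 1.5 is a made-up value.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """p-value of a z statistic for a directional or non-directional alternative."""
    cdf = NormalDist().cdf
    if alternative == "greater":    # H1: population mean > value in H0 (upper tail)
        return 1 - cdf(z)
    if alternative == "less":       # H1: population mean < value in H0 (lower tail)
        return cdf(z)
    if alternative == "two-sided":  # H1: population mean != value in H0 (both tails)
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


z = 1.5  # hypothetical z statistic
for alternative in ("greater", "less", "two-sided"):
    print(f"{alternative}: p = {p_value(z, alternative):.4f}")
```

For the same z statistic, the non-directional p-value is twice the smaller of the two directional ones, because it counts both tails.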
Great! Now that we know what hypothesis testing is, when to apply the z-test, and how the alternative hypothesis sets the orientation of the test, it's time to work through an example. Let's go for it!
Example
Let’s take the following statement: "In a study, it was found that all adults sleep 7 hours a day with a standard deviation of 1 hour. Suppose we take a sample from which we obtain a sample mean of 8 hours. Perform a hypothesis test to verify the population mean"
To perform the hypothesis test, we go through the following 4 steps:
Step 1: State the hypothesis. Imagine that we want to address a non-directional hypothesis. Our null hypothesis and alternative hypothesis would be as follows:
- Null hypothesis: Adults sleep 7 hours a day
- Alternative hypothesis: Adults do not sleep 7 hours a day
Step 2: Set the criteria for a decision.
The level of significance will be 0.05 or 5%, so alpha = 0.05. Since we are running a non-directional, two-tailed test, we divide the alpha value in half so that an equal proportion of the area is placed in the upper and lower tails, as shown in Figure 3. The calculation is shown in equation 1.

alpha / 2 = 0.05 / 2 = 0.025    (1)
With the alpha value calculated for a two-tailed test, we can determine the critical values, that is, the values that mark the boundaries of the rejection zone in the standard normal distribution.
To find the critical values, we look up in the z-table the value of z that leaves an area under the curve of 0.0250 in the tail. In this case, the value is 1.96, that is, if the value of our test statistic is greater than 1.96 standard deviations or less than -1.96 standard deviations, we would be in the rejection zone.
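Instead of reading a z-table, the critical value can be computed with the inverse of the standard normal cumulative distribution. The sketch below uses `NormalDist.inv_cdf` from Python's standard library. The helper name `two_tailed_critical_value` is an assumption made for this illustration.

```python
from statistics import NormalDist


def two_tailed_critical_value(alpha):
    """z value that leaves an area of alpha / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# The three significance levels mentioned earlier: 10%, 5% and 1%.
for alpha in (0.10, 0.05, 0.01):
    critical = two_tailed_critical_value(alpha)
    print(f"alpha = {alpha:.2f}: reject H0 if z < {-critical:.3f} or z > {critical:.3f}")
```

For alpha = 0.05 this gives approximately 1.96, matching the table lookup.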

Step 3: Compute the test statistic. Once we have defined the hypotheses as well as the level of significance, we proceed to calculate the z statistic.
The z statistic gives us the number of standard deviations that a sample mean deviates from the population mean in a standard normal distribution. It is calculated by taking the sample mean (M) minus the population mean (μ, defined in the null hypothesis), divided by the standard deviation of the sample mean (σ_M), as shown in equation 2. For a sample of n observations, σ_M is the standard error σ / √n.

z = (M − μ) / σ_M    (2)
The statement gives no sample size, so we divide by the stated standard deviation of 1 hour directly, and we obtain z = (8 − 7) / 1 = 1. Finally, we must make the decision, which we will do in the next step.
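The calculation of the z statistic can be sketched as follows. The function name `z_statistic` and its parameters are assumptions made for this illustration. In particular, the example states no sample size, so `n=1` is an assumed default that reproduces z = 1. With a larger sample, the standard error shrinks and z grows.

```python
from math import sqrt


def z_statistic(sample_mean, population_mean, sigma, n=1):
    """z statistic of a sample mean, using the standard error sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - population_mean) / standard_error


# Values from the example: sample mean 8 hours, population mean 7 hours,
# standard deviation 1 hour. n=1 is an assumption (no sample size is given).
z = z_statistic(sample_mean=8.0, population_mean=7.0, sigma=1.0)
print(z)  # 1.0
```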
Step 4: Make a decision. To make a decision, we look at the critical value obtained in step 2 based on the level of significance. Since the value of the z statistic obtained in step 3 (z = 1) is less than the critical value (1.96), we decide to retain the null hypothesis.

Finally, the probability of obtaining a z statistic at least as extreme as z = 1 is given by the p-value. To find this value, we use the unit normal table. For a z-score equal to 1, the area in the tail is 0.15866. Since our alternative hypothesis is two-tailed, we multiply that value by 2, which gives p = (0.15866) * 2 = 0.31732.
In step 2 we set the level of significance at 5%, that is, if the p-value is less than or equal to 5%, the null hypothesis is rejected. Otherwise, if the p-value is greater than 5%, the null hypothesis is retained. In this case, p = 31.7%, which is greater than 5%, so the null hypothesis is retained.
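The table lookup for the p-value can also be reproduced in a few lines. This is a minimal sketch using `NormalDist` from Python's standard library, and the variable names are chosen here for illustration. The last digit can differ slightly from the table-based result because the table value is rounded before being doubled.

```python
from statistics import NormalDist

z = 1.0       # z statistic from step 3
alpha = 0.05  # significance level from step 2

# Area in one tail beyond |z|, then doubled for the two-tailed alternative.
one_tail = 1 - NormalDist().cdf(abs(z))
p = 2 * one_tail

# Reject the null hypothesis only when p is at most alpha.
reject_null = p <= alpha

print(f"one-tail area: {one_tail:.5f}")  # 0.15866
print(f"p-value: {p:.5f}")               # 0.31731
print("reject H0:", reject_null)         # False
```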