Getting Started
Statistical inference is the process of making reasonable guesses about a population's distribution and parameters given the observed data. Conducting hypothesis tests and constructing confidence intervals are two examples of statistical inference. Hypothesis testing is the process of calculating the probability of observing the sample statistic, assuming the null hypothesis is true. By comparing that probability (the p-value) with the significance level ɑ, we make reasonable guesses about the parameters of the population from which the sample is taken. Through a similar process, we can construct a confidence interval with a certain confidence level (1-ɑ). A confidence interval is an interval estimate for a population parameter: the point estimate plus and minus the critical value times the standard error. This article will discuss the standard procedure for conducting hypothesis tests and estimating confidence intervals in the following scenarios:

- One-Sample Mean
- Two Samples Mean: Independent Samples
- Two Samples Mean: Paired Samples
- One-Sample Proportion
- Two Samples Proportion
This article serves both as a tutorial on statistical inference and as a cheat sheet for your reference. The sections below discuss the procedures in detail, and at the end of the article, I summarize the discussion in two tables for convenience.
1, Statistical Inference for Mean
1.1 Distribution Assumptions
We need to make assumptions about the underlying distributions when using statistical inference techniques. According to the central limit theorem, the distribution of sample means approaches a normal distribution as the sample size increases, no matter what the population distribution is. The sample mean thus follows a normal distribution if the sample size is large enough.
The test we usually use here is either the student t-test or the z-test. The z-test is based on the normal distribution, while the student t-test is based on a distribution similar to the normal distribution but with fatter tails. When the sample size is lower than 30 (the standard cut-off) or the population standard deviation is unknown, we use the student t-test. Otherwise, we use the z-test.
1.2 One-Sample Mean
For a sample with n observations X1, X2, …, Xn, we observe the sample mean X̄. We can test whether this sample is drawn from a population whose mean μ equals a hypothesized value μ_0 by checking whether X̄ differs significantly from μ_0. We can also estimate, for example, a 95% confidence interval for the mean of the population from which this sample is drawn.
Hypothesis Testing
Here are the steps for conducting hypothesis testing:
- Step 1: Set up the null hypothesis:
Two-tailed:
H0: μ = μ_0
H1: μ ≠ μ_0
One-tailed:
H0: μ ≥ μ_0
H1: μ < μ_0
or:
H0: μ ≤ μ_0
H1: μ > μ_0
Note that the hypotheses are statements about the population mean μ, not the sample mean X̄. The alternative hypothesis H1 is the hypothesis we want to test. For example, if we want to test whether the population mean is larger than μ_0, we set H1 as μ > μ_0.
- Step 2: Calculate the test statistic:
For the student t-test, we need to use the sample standard deviation s to estimate the population standard deviation σ:

s = √( Σ(Xi − X̄)² / (n − 1) )

and the t statistic is:

t = (X̄ − μ_0) / (s / √n)
Keep in mind that for the student t-test, since the number of observations in the sample is relatively small, we need to specify the degrees of freedom to find the right value. The degrees of freedom are defined as n − 1, where n is the sample size.
If we know the population standard deviation σ, and the sample size n is greater than 30, we can use the z-test and calculate the z statistic:

z = (X̄ − μ_0) / (σ / √n)
- Step 3: Compare the test statistic to the critical value
To get the critical value, we need to specify the significance level ɑ and refer to the t or z tables. For example, for a two-tailed t-test with a sample size of 10 (9 degrees of freedom) and ɑ = 5%, the critical value is 2.262, as highlighted below:

For a two-tailed z-test with ɑ = 5%, the critical value is 1.96, as highlighted below:

The graph below shows the meaning of the critical value. The z-test is based on the standard normal distribution N(0,1). The nature of this distribution is that for a random variable x that follows N(0,1), there is only a 5% chance that |x| ≥ 1.96. The critical value of 1.96 is therefore associated with a 95% (1 − 5%) confidence level. If the absolute value of the z statistic calculated above is larger than 1.96, the probability of observing such a sample statistic under the null hypothesis (the p-value) is less than 5%. Thus we can reject the null hypothesis at the 5% significance level.

Note that at the same confidence level, 95%, the critical value for the t-test is larger than that for the z-test, which corresponds with the fact that the t distribution has fatter tails.
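If you prefer code to table lookups, here is a minimal sketch of how to obtain these same critical values with scipy (assuming scipy is installed):

```python
from scipy import stats

alpha = 0.05  # significance level for a 95% confidence level

# Two-tailed t critical value for a sample of size 10 (df = n - 1 = 9)
t_crit = stats.t.ppf(1 - alpha / 2, df=9)
print(f"t critical value (df=9): {t_crit:.3f}")  # 2.262

# Two-tailed z critical value from the standard normal distribution
z_crit = stats.norm.ppf(1 - alpha / 2)
print(f"z critical value: {z_crit:.3f}")  # 1.960
```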
Confidence Interval
The confidence interval is an interval estimate with a certain confidence level for a parameter. It is calculated as the point estimate plus or minus the margin of error (ME):

CI = point estimate ± ME

The point estimate is just the mean of the sample, and ME is calculated by:

ME = critical value × SE

The distribution and the confidence level define the critical value, and the standard error (SE) is calculated from the sample or population standard deviation. For a one-sample mean's confidence interval, if we do not know the population variance, or when the sample size is too small, we calculate it by:

CI = X̄ ± t × s / √n
where X̄ is the sample mean, s is the sample standard deviation, and t can be found in the t table above based on the confidence level and the degrees of freedom. For example, for a sample with 10 observations, the t value for the 95% confidence interval is 2.262.
Otherwise, we use the z table to calculate the confidence interval:

CI = X̄ ± z × σ / √n
The z value can be found in the z table. The z value for the 95% confidence interval is 1.96.
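To make the one-sample procedure concrete, here is a minimal sketch in Python; the data values and the hypothesized mean μ_0 = 5.0 are made up for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 10 observations
sample = np.array([5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.4, 4.7, 5.0, 5.1])
mu_0 = 5.0  # hypothesized population mean

# Two-tailed one-sample t-test of H0: μ = μ_0
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")

# 95% confidence interval: X̄ ± t × s/√n
n = len(sample)
se = sample.std(ddof=1) / np.sqrt(n)  # SE from the sample standard deviation
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-tailed critical value
lower, upper = sample.mean() - t_crit * se, sample.mean() + t_crit * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```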
1.3 Two Samples Mean: Independent Samples
When we observe two samples, we may wonder whether their means differ significantly from each other. If we have reason to believe that the two samples are uncorrelated with each other, we can either conduct a hypothesis test with the null hypothesis stating that the means are equal, or construct a confidence interval for the difference of the means and check whether zero is inside the interval. The procedures are quite similar to the one-sample case, with slight differences in calculating the test statistic and the standard error.
Hypothesis Testing
For two samples with means X̄1 and X̄2, drawn from populations with means μ1 and μ2, we can set up the null and alternative hypotheses for a two-tailed test like this:
H0: μ1 = μ2
H1: μ1 ≠ μ2
We can also set up a one-tailed test (e.g., H0: μ1 ≤ μ2 with H1: μ1 > μ2) if we want to check whether one of the means is significantly larger than the other. If both samples are not large enough, we can use the t table assuming a t distribution, and calculate the t statistic as follows:

t = (X̄1 − X̄2) / √( s1²/n1 + s2²/n2 )
where s1² and s2² are the variances of the two samples, each calculated by:

s² = Σ(Xi − X̄)² / (n − 1)
Depending on the practical situation, we can also set the null hypothesis to check whether the difference between the two means is greater than a certain number larger than 0, which is sometimes referred to as the effect size. A larger effect size makes it easier to reject the null hypothesis since the difference is bigger, thus increasing the statistical power. For more details, you can check out my article here:
How is Sample Size Related to Standard Error, Power, Confidence Level, and Effect Size?
Note that when we calculate the combined standard error as above, we assume that the two samples come from populations that have different variances (σ1² ≠ σ2²). When we believe σ1² = σ2², we can calculate the pooled standard deviation:

s_p = √( ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) )

and calculate the standard error for the test statistic as follows:

SE = s_p × √( 1/n1 + 1/n2 )

The test statistic for the null hypothesis becomes:

t = (X̄1 − X̄2) / ( s_p × √( 1/n1 + 1/n2 ) )
Confidence Interval
The confidence interval for two sample means is used to describe the difference of the two means. Using the t critical value, we can calculate the confidence interval as follows:

CI = (X̄1 − X̄2) ± t × √( s1²/n1 + s2²/n2 )
Note that, similar to the discussion above, with different assumptions about the population variances, we calculate the standard error in the margin-of-error term differently.
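Here is a minimal sketch of the two-sample test in Python; the samples are made up, and the equal_var flag switches between the unpooled (Welch) and pooled standard errors:

```python
import numpy as np
from scipy import stats

# Two hypothetical independent samples
sample_1 = np.array([5.1, 4.9, 5.3, 5.0, 4.8, 5.2])
sample_2 = np.array([4.6, 4.8, 4.5, 4.9, 4.7, 4.4])

# Welch's t-test: assumes σ1² ≠ σ2² (unpooled standard error)
t_welch, p_welch = stats.ttest_ind(sample_1, sample_2, equal_var=False)
print(f"Welch:  t = {t_welch:.3f}, p = {p_welch:.3f}")

# Pooled t-test: assumes σ1² = σ2² (pooled standard error)
t_pooled, p_pooled = stats.ttest_ind(sample_1, sample_2, equal_var=True)
print(f"Pooled: t = {t_pooled:.3f}, p = {p_pooled:.3f}")
```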
1.4 Two Samples Mean: Paired Samples
In the previous section, we discussed the situation where the two samples are independent of each other. What about the situation where the two samples are correlated in some way? For example, the two samples may come from the same subjects before and after a treatment, or they may be taken from different people in the same household, etc. We usually have n1 = n2 in these cases. For example, if we want to test whether there is a treatment effect in the treatment group, we can collect samples before and after the treatment:
We need to calculate the difference before and after the treatment for each individual, to get the sample of observed differences d1, d2, …, dn, where:

di = Xi(after) − Xi(before)

In such a way, we have transformed a two-samples case into a one-sample case. Following the procedure discussed above, we first calculate the mean d̄ and the standard deviation s_d of the sample of differences:

d̄ = Σ di / n
s_d = √( Σ(di − d̄)² / (n − 1) )
Hypothesis Testing
We can set up the null hypothesis based on the practical situation. A typical null and alternative hypothesis pair for a two-tailed test is:
H0: μ_d = μ_0
H1: μ_d ≠ μ_0
where μ_d is the population mean of the differences, and μ_0 can be any number (typically 0, meaning no treatment effect). The test statistic is calculated as follows:

t = (d̄ − μ_0) / ( s_d / √n )
Depending on the samples, we can choose to conduct a t-test or a z-test.
Confidence Interval
We can also construct a confidence interval for the sample differences. We only need the mean and the standard deviation of the differences to construct the interval. A confidence interval based on the student t distribution is:

CI = d̄ ± t × s_d / √n
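A minimal sketch of the paired test in Python (the before/after measurements are made up); note that the paired test is exactly a one-sample t-test on the differences:

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements for the same 8 subjects
before = np.array([72, 75, 80, 68, 74, 77, 70, 73])
after = np.array([70, 72, 78, 67, 71, 75, 70, 70])

# Paired t-test of H0: mean difference = 0
t_stat, p_value = stats.ttest_rel(after, before)
print(f"paired: t = {t_stat:.3f}, p = {p_value:.3f}")

# Equivalent one-sample t-test on the differences
d = after - before
t_d, p_d = stats.ttest_1samp(d, popmean=0)
print(f"one-sample on d: t = {t_d:.3f}, p = {p_d:.3f}")  # identical results
```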
2, Statistical Inference for Proportion
2.1 Distribution Assumptions
The mean measures the central tendency of continuous variables, but it cannot be used for categorical variables. For categorical variables, we can instead use the proportion of each category's count in statistical analysis. The proportion of category i in a sample with n categories is calculated by:

P_i = C_i / N

where C_i is the number of observations in category i, and N is the sample size (the total observations across all n categories).
Here I will use a simple example to illustrate the process. When tossing a coin, we can either get "Head" or "Tail". Rather than the normal distribution we assumed for inference about means, we use the binomial distribution for binary proportions. According to the properties of the binomial distribution, as the sample size gets larger, the binomial distribution approaches a normal distribution. The standard definition of "a large sample" in statistical inference is that np and n(1 − p) are both larger than 10. If not, we use the student t distribution for the inference.
2.2 One-Sample Proportion
A one-sample proportion is the proportion of a category in a single sample. As discussed above, a use case of the one-sample proportion is to test whether a coin is unbiased. With a large enough number of tosses, the proportion of "Head" should equal the proportion of "Tail" at 0.5 if the coin is unbiased (Law of Large Numbers).
Hypothesis Testing
Hypothesis testing for a one-sample proportion follows a similar set-up procedure. Using the coin-tossing example above, testing whether a coin is unbiased, given a sample of coin-tossing results, is the same as testing:
H0: P_H = P_0
H1: P_H ≠ P_0
To test whether the coin is unbiased, we set P_0 = 0.5. Note this is a two-tailed test. We can rewrite the alternative hypothesis as P_H > 0.5 to test whether the coin is biased towards "Head".
We first need to count how many "Head"s are in the sample to calculate P_H. After that, supposing the sample size is large enough, we can calculate the z statistic:

z = (P_H − P_0) / √( P_0(1 − P_0) / n )

P_H is calculated from the sample. P_0 is set at 0.5 in this example. The denominator is the standard error for this sample under the null hypothesis (derived from the binomial distribution). Following the same procedure described above, we can use the z table or the t table to find the critical value. By comparing the calculated statistic with the critical value, we can decide whether to reject the null hypothesis.
Confidence Interval
The confidence interval for a proportion follows the same pattern as statistical inference for the mean, using the point estimate plus or minus the margin of error, except that the standard error is calculated differently here:

CI = P_H ± z × √( P_H(1 − P_H) / n )

Note that the standard error for hypothesis testing is different from that for the confidence interval. The former uses P_0 while the latter uses P_H.
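A minimal sketch of this test with statsmodels (assuming it is installed); the 60 heads in 100 tosses are made up for illustration:

```python
from statsmodels.stats.proportion import proportion_confint, proportions_ztest

count, nobs = 60, 100  # hypothetical: 60 "Head"s in 100 tosses

# z-test of H0: P_H = 0.5; prop_var=0.5 uses P_0 in the standard error
z_stat, p_value = proportions_ztest(count, nobs, value=0.5, prop_var=0.5)
print(f"z = {z_stat:.3f}, p-value = {p_value:.3f}")

# 95% confidence interval, using the observed P_H in the standard error
ci_low, ci_high = proportion_confint(count, nobs, alpha=0.05, method="normal")
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```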
2.3 Two Samples Proportion
The two-samples proportion test compares the proportions of two samples, and it is widely used in A/B testing. For example, when we compare the conversion rates of the treatment and control groups to see whether there is a significant treatment effect, we need to test whether the difference in conversion rates is significant. We can use hypothesis testing to test whether the two proportions differ, or construct a confidence interval for the difference.
Hypothesis Testing
Based on the two samples we have, we can calculate the two proportions P1 and P2. To test whether the two proportions differ significantly from each other (the null hypothesis states that the two samples could be drawn from the same population), the null and alternative hypotheses are:
H0: P1 = P2
H1: P1 ≠ P2
Note that this is a two-tailed test. For a one-tailed test, we can state in the alternative hypothesis whether P1 is greater than or less than P2.
An important quantity we need to calculate for the two-samples proportion test is P_pool:

P_pool = (C1 + C2) / (n1 + n2) = (n1·P1 + n2·P2) / (n1 + n2)

You can understand it as pooling the two samples together and calculating the proportion of the category in the pooled sample.
Similarly, if n1P1, n1(1 − P1), n2P2, and n2(1 − P2) are all greater than 10, we can use the z statistic, as the distribution approximately follows a normal distribution. If not, we need to calculate the t statistic. The statistic is calculated by:

z = (P1 − P2) / √( P_pool(1 − P_pool)(1/n1 + 1/n2) )
Note that we are using P_pool to calculate the standard error.
Confidence Interval
Like the two-samples mean's confidence interval, the two-samples proportion's confidence interval is also used to make inferences about the difference between the two proportions. If both sample sizes are large enough, we can use the critical value z from the z table and calculate the confidence interval as:

CI = (P1 − P2) ± z × √( P1(1 − P1)/n1 + P2(1 − P2)/n2 )
The only difference between the confidence interval and hypothesis testing is the calculation of the standard error. Instead of using the pooled proportion, the confidence interval uses each sample's standard error individually.
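A minimal sketch of an A/B-test-style comparison with statsmodels (the conversion counts are made up):

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical conversions: 120/1000 in treatment, 100/1000 in control
counts = np.array([120, 100])
nobs = np.array([1000, 1000])

# Two-samples z-test of H0: P1 = P2 (uses P_pool in the standard error)
z_stat, p_value = proportions_ztest(counts, nobs)
print(f"z = {z_stat:.3f}, p-value = {p_value:.3f}")

# 95% CI for P1 - P2, using each sample's own standard error
p1, p2 = counts / nobs
se = np.sqrt(p1 * (1 - p1) / nobs[0] + p2 * (1 - p2) / nobs[1])
diff = p1 - p2
print(f"95% CI: ({diff - 1.96 * se:.4f}, {diff + 1.96 * se:.4f})")
```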
That’s a lot of information to digest. Here I use two tables to summarize the main takeaways of this article:
- For one-sample mean and proportion:

| Parameter | Test statistic | Confidence interval |
| --- | --- | --- |
| Mean, t-test (σ unknown or n < 30) | t = (X̄ − μ_0) / (s/√n), df = n − 1 | X̄ ± t × s/√n |
| Mean, z-test (σ known and n ≥ 30) | z = (X̄ − μ_0) / (σ/√n) | X̄ ± z × σ/√n |
| Proportion (large sample) | z = (P_H − P_0) / √(P_0(1 − P_0)/n) | P_H ± z × √(P_H(1 − P_H)/n) |

- For two-samples mean and proportion:

| Parameter | Test statistic | Confidence interval |
| --- | --- | --- |
| Independent means, σ1² ≠ σ2² | t = (X̄1 − X̄2) / √(s1²/n1 + s2²/n2) | (X̄1 − X̄2) ± t × √(s1²/n1 + s2²/n2) |
| Independent means, σ1² = σ2² | t = (X̄1 − X̄2) / (s_p√(1/n1 + 1/n2)) | (X̄1 − X̄2) ± t × s_p√(1/n1 + 1/n2) |
| Paired means | t = (d̄ − μ_0) / (s_d/√n) | d̄ ± t × s_d/√n |
| Proportions (large samples) | z = (P1 − P2) / √(P_pool(1 − P_pool)(1/n1 + 1/n2)) | (P1 − P2) ± z × √(P1(1 − P1)/n1 + P2(1 − P2)/n2) |
Thank you for reading. Here is the list of all my blog posts. Check them out if you are interested!