
YouTube link for this blog: T-test implementation in Python
Statistics play a vital role not only in data analysis but also in decision making. Statisticians use a range of tests to extract the most accurate information from data. Here, I will discuss two basic statistical tests: the t-test and analysis of variance (ANOVA).
T-test:
A t-test compares the means of two groups of interest. If we ask an ordinary person to compare two groups, he or she will simply calculate the mean of the first group and then the mean of the second group and compare them, but that is not what a statistician will do. Once the difference between the means is calculated, a simple question arises: is this difference statistically significant enough to conclude that the groups are different? In other words, if we proceed that way, there is no reference against which to judge the difference between the means. The t-test is the solution to that problem.
Let’s say we have two sections of students, section A and section B, whose mean scores in mathematics are 95 and 90 respectively. The difference is therefore 5. The question is: does this difference of 5 provide enough evidence that the mean scores of the two sections are different? Let’s dive into the t-test.
The t-test got its name because it was introduced by the statistician William Sealy Gosset, who published under the pen name “Student” and first proposed the concept of the t-statistic. When he ran a hypothesis test, he generated a test statistic, called the t-statistic, which was used as a cut-off point for comparison. The t-statistic is calculated as below:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
where $t$ is the t-statistic, $\bar{x}$ the sample mean, $\mu$ the hypothesized mean, $s$ the sample standard deviation, and $n$ the total number of samples. This value is compared against the critical t-value of the rejection region. Many libraries in Python can perform this test and report the t-statistic as well as the p-value. The p-value tells us the probability of observing a result at least as extreme as the sample data purely by chance, assuming the null hypothesis is true. Therefore, if the p-value is small (<5%), it is unlikely that the observed difference arose by chance alone, and the result needs more investigation. If the p-value is high (>5%), we conclude that chance alone can explain the data. The null hypothesis treats the usual occurrence as random, stating that there is no difference between the means; the alternative hypothesis is the opposite, stating that there is a difference between the means of the groups. Every t-value comes with an associated p-value.
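As a quick, self-contained illustration (the scores below are made up, not from any real dataset), the one-sample t-statistic and its p-value can be computed with scipy and verified against the formula above:

```python
import numpy as np
from scipy import stats

# Hypothetical math scores; test against a hypothesized mean of 90
scores = np.array([95, 88, 92, 99, 91, 94, 90, 97])
t_stat, p_value = stats.ttest_1samp(scores, popmean=90)

# Manual computation of t = (x_bar - mu) / (s / sqrt(n)) for comparison
x_bar = scores.mean()
s = scores.std(ddof=1)          # sample standard deviation
n = len(scores)
t_manual = (x_bar - 90) / (s / np.sqrt(n))

print(t_stat, p_value)          # scipy's result
print(t_manual)                 # matches t_stat
```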
Let’s go back to the example of the two sections to compare their mean math scores. The null hypothesis, in this case, states that there is no difference between the sections’ scores; the alternative states that there is a difference. This is a two-sample t-test, and its t-statistic (assuming equal variances) is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the two sample means, $n_1$ and $n_2$ the sample sizes, and $s_p^2$ the pooled sample variance.
One fundamental assumption of the t-test is that both samples are normally distributed. A t-test can also be performed on measurements taken before and after a specific treatment; this is called a “paired t-test” and compares the means of the same group with and without the effect of the treatment. For example, it can be used to quantify the effect of a math crash course on the students of section A only: collect sample data from section A before and after the crash course and then perform the test, as in the sketch below.
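Here is a minimal sketch of that paired design, using scipy with made-up before/after scores for section A:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for section A before and after a crash course
before = np.array([78, 85, 80, 90, 72, 88])
after = np.array([82, 88, 85, 93, 75, 90])

# Paired t-test: the same students measured twice
t_stat, p_value = stats.ttest_rel(before, after)
print(t_stat, p_value)
```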
ANOVA:
ANOVA stands for analysis of variance. Despite the name, the purpose of ANOVA is to compare the means of two or more samples. When we dive into the mechanism, we will see that the inference about the means comes from the variances, which is why it is called analysis of variance rather than analysis of means. It can be considered an extension of the t-test.
The essence of ANOVA lies in comparing the variation among the groups with the variation within each group. These two quantities are called the “sum of squares for treatments” (SST) and the “sum of squares for error” (SSE) respectively.
$$SST = \sum_{j=1}^{k} n_j \left( \bar{x}_j - \bar{\bar{x}} \right)^2$$
where $k$ is the number of groups, $n_j$ the number of data points in group $j$, $\bar{x}_j$ the mean of group $j$, and $\bar{\bar{x}}$ the grand mean.
$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( x_{ij} - \bar{x}_j \right)^2$$
This expression can be further simplified to incorporate the sample variances [1].
$$SSE = \sum_{j=1}^{k} (n_j - 1) s_j^2$$

where $s_j^2$ is the sample variance of group $j$.
The mean square of treatment,
$$MST = \frac{SST}{k - 1}$$
and mean square of error,
$$MSE = \frac{SSE}{n - k}$$

where $n$ is the total number of observations across all groups.
The test statistic,
$$F = \frac{MST}{MSE}$$
If MST is high, the F-value will also be high, and consequently there will be evidence to reject the null hypothesis, which states that the means are equal across the groups. The mathematics shows that if the variance between the groups is higher than the variance within the groups, then SST, MST, and finally the F-value will all be high.
Let’s assume that our calculated test statistic is $F_c$. The p-value tells us the probability that, under the null hypothesis, the test statistic would exceed $F_c$. In other words, the p-value shows how likely the observed differences are to have occurred by chance. Therefore, if the p-value is small (<5%), the possibility that the differences come purely from chance is small and the result deserves more attention. If the p-value is high (>5%), chance alone is a plausible explanation, and we conclude that there is not enough evidence to reject the null hypothesis.
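Before moving to the dataset, here is a small numeric sketch with made-up groups, showing how SST, SSE, MST, MSE, and the F-statistic fit together; the result is cross-checked against scipy’s one-way ANOVA:

```python
import numpy as np
from scipy import stats

# Three hypothetical groups of observations
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([5.0, 7.0, 9.0]),
          np.array([8.0, 9.0, 10.0])]

k = len(groups)                              # number of groups
n = sum(len(g) for g in groups)              # total observations
grand_mean = np.concatenate(groups).mean()   # the grand mean

# SST: between-group variation; SSE: within-group variation
sst = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

mst = sst / (k - 1)      # mean square for treatments
mse = sse / (n - k)      # mean square for error
f_value = mst / mse

# Cross-check against scipy's one-way ANOVA
f_scipy, p_value = stats.f_oneway(*groups)
print(f_value, f_scipy, p_value)   # f_value matches f_scipy
```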
Code for t-test:
I took a simple dataset that provides the revenue of different products across different employees and supervisors. Let’s import the required libraries first.
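The original code listing is embedded in the blog; a minimal sketch of the imports, assuming pandas for the data handling and pingouin/statsmodels for the tests, would look like this:

```python
import time                              # to time the two ANOVA runs later
import pandas as pd                      # data frames
import pingouin as pg                    # t-test and one-way ANOVA
import statsmodels.api as sm             # anova_lm for the statsmodels ANOVA
from statsmodels.formula.api import ols  # formula-based OLS model
```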
These data are placed in separate data frames, one per worksheet, and then the revenue for each individual product is calculated.
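A sketch of this step, assuming the workbook and column names below (the file name, “Units Sold”, and “Unit Price” are illustrative, since the actual worksheet layout is not shown here):

```python
# Read every worksheet into a dict of DataFrames, then stack them
sheets = pd.read_excel("revenue_data.xlsx", sheet_name=None)
df = pd.concat(sheets.values(), ignore_index=True)

# Revenue for each individual product
df["Revenue"] = df["Units Sold"] * df["Unit Price"]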
The final data frame will look like this:

The sum and mean of revenue from Supervisor-1 and Supervisor-2 are calculated below.
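A sketch of that aggregation, assuming “Supervisor” and “Revenue” columns:

```python
# Total and average revenue per supervisor
summary = df.groupby("Supervisor")["Revenue"].agg(["sum", "mean"])
print(summary)
```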
The means are 15137.47 and 14711.63 respectively, and we would like to perform a t-test on this revenue data [2].
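A sketch of the test with pingouin, assuming the supervisor labels below (pingouin’s ttest reports the t-statistic, degrees of freedom, p-value, confidence interval, Cohen’s d, Bayes factor, and power in one table):

```python
# Revenue series for the two supervisors (labels are illustrative)
rev_1 = df.loc[df["Supervisor"] == "Supervisor-1", "Revenue"]
rev_2 = df.loc[df["Supervisor"] == "Supervisor-2", "Revenue"]

# Independent two-sample t-test
result = pg.ttest(rev_1, rev_2, paired=False)
print(result)
```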
Once run, the following table is generated, showing several statistics including the t-statistic, degrees of freedom, and p-value.

The t-statistic is 1.283 with a p-value of 0.199. The high p-value (>5%) tells us that there is not enough evidence to reject the null hypothesis; alternatively, we can say that the evidence is not enough to show that the group means are different. The 95% confidence interval is also shown. Cohen’s d is the standardized difference between the two means, i.e., in this case the two means differ by 0.045 standard deviations. BF10 represents the Bayes factor, the ratio of the likelihood of the data under H1 (the alternative hypothesis) to that under H0 (the null hypothesis). In this case, the data are only about 9% as likely under the alternative as under the null.
Code for ANOVA:
I have used two different libraries to perform ANOVA. First, I used the pingouin library [2] with the following code, which includes a time tracker to check the duration.
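A sketch of that call, assuming the same “Revenue” and “Supervisor” columns:

```python
start = time.time()
aov = pg.anova(data=df, dv="Revenue", between="Supervisor", detailed=True)
print(aov)
print(f"pingouin ANOVA took {time.time() - start:.4f} s")
```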

A similar analysis can also be performed with statsmodels, which uses the anova_lm method. We get the same information from both libraries, but the duration with statsmodels is almost four times that of pingouin. If the data size is big and we are interested in similar output, pingouin is the better choice.
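A sketch of the statsmodels version, which fits an OLS model first and then builds the ANOVA table from it:

```python
start = time.time()
model = ols("Revenue ~ C(Supervisor)", data=df).fit()  # one-way design
aov_table = sm.stats.anova_lm(model, typ=2)
print(aov_table)
print(f"statsmodels ANOVA took {time.time() - start:.4f} s")
```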

Let’s interpret the results from the pingouin library.

SS represents the sum of squares and DF the degrees of freedom. Since there are four supervisors, the degrees of freedom for the Supervisor factor is 3. MS is the mean square, which is SS divided by DF. The calculated F-statistic is 0.867 with a p-value of 0.456, which means there is not enough evidence to reject the null hypothesis. So we cannot say that the revenues earned by different supervisors are different.

Statsmodels also generates the same p-value for these four groups.
If one is interested in including another factor alongside Supervisor, the analysis becomes a two-way ANOVA. Let’s include ‘Product ID’ as another determining factor for revenue. I needed to rename the ‘Product ID’ column, since the formula interface cannot handle the space in the name.
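A sketch of the two-way version, including the rename; the interaction term C(Supervisor):C(Product_ID) captures the combined effect of the two factors:

```python
# Rename so the formula interface can parse the column name
df = df.rename(columns={"Product ID": "Product_ID"})

# Two-way ANOVA with both main effects and their interaction
model = ols("Revenue ~ C(Supervisor) + C(Product_ID) + C(Supervisor):C(Product_ID)",
            data=df).fit()
two_way = sm.stats.anova_lm(model, typ=2)
print(two_way)
```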

The p-value for Product ID is 0, which means the result is significant for that factor. For the interaction between the two factors, the p-value is 0.762, which is not significant. Therefore, the Supervisor-by-Product-ID interaction has no statistically significant effect on mean revenue in this two-way ANOVA.
Conclusion:
This article demonstrates the implementation of the t-test and ANOVA in a Python environment. Statistical tests like ANOVA are crucial when building a machine learning model: for a good model, it is important to select the best features for training, and ANOVA can help find them. If we compare multiple groups and end up with a small p-value, we can conclude that there is significant variation between the groups, and that feature is a strong candidate for training the model.
References:
[1] Gerald Keller, Statistics for Management and Economics, 10th edition.
[2] Pingouin: statistics in Python, https://pingouin-stats.org/