
ANOVA and Kruskal-Wallis Tests, Explained

Learn what I think are the two most essential tests for researchers in the field of social sciences

Source: Pexels (Free for Use)

Introduction

There are hundreds of statistical tests that help us test hypotheses and validate assumptions. There is even a book called 100 Statistical Tests, which I recommend keeping as an encyclopedia of statistical tests. Among all these tests, I personally think the Analysis of Variance (ANOVA) test and the Kruskal-Wallis test are the two most important tests that researchers and PhD students in the field of social sciences must know and learn. Why do I think so?

It is because comparing numerical values across different subsets of populations or groups is one of the most frequently performed comparisons in social science research. Think about the most recent social science paper you read. Think about any analytical news article or editorial. What do they do? For instance, if we are interested in whether there are statistically significant differences in income among people with different political views (e.g. supports the Democratic Party, supports the Republican Party, neutral), we need to use the ANOVA test. If we are interested in how children from various household income groups differ in educational attainment levels measured in some numerical format, we use the ANOVA test. As these examples illustrate, ANOVA and its non-parametric counterpart, the Kruskal-Wallis test, are what scholars use and consider using all the time in their research.

In this article, I introduce what the ANOVA test is, what it does, what types exist, the assumptions that need to be met, and some code examples. Let us dive in!

What is an ANOVA test?

The ANOVA test is an extension of the t-test, which compares the means of two groups and tests whether the difference is statistically significant. The t-test, however, cannot be performed on three or more groups, and this is when the ANOVA test comes into play. Some might say we can carry out a separate t-test for each pair of groups, but there is a downside to this approach: conducting numerous t-tests increases the likelihood of false positives.
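To make that downside concrete, with k independent comparisons each run at significance level α, the probability of at least one false positive is 1 − (1 − α)^k. A minimal sketch:

```python
# Familywise error rate: probability of at least one false positive
# across k independent comparisons, each run at significance level alpha.
def familywise_error(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

# Three groups need 3 pairwise t-tests; six groups need 15.
print(round(familywise_error(3), 3))   # 0.143 instead of the nominal 0.05
print(round(familywise_error(15), 3))  # 0.537
```

So with just six groups, a naive series of pairwise t-tests is more likely than not to flag at least one spurious difference.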

ANOVA’s null and alternative hypotheses are the following:

H0 (Null): the means of the samples are equal.

H1 (Alternative): one or more of the means of the samples are unequal.

There are multiple types of ANOVA test but I will first introduce the One-Way ANOVA test which is the most frequently used kind of ANOVA test. The other types will be featured in a separate section towards the end of the post.

What are some assumptions that need to be met?

The ANOVA test, unfortunately, cannot just be run in any situation. It is a parametric test, meaning it is governed by a set of parameters and assumptions. Let us look at the assumptions that need to be met.

  • The dependent variable needs to be continuous. This one is pretty straightforward. Recall that the ANOVA test is an extension of the t-test, which compares means, inherently numerical values, across different groups.
  • Variances need to be equal across groups. We call this the "Homogeneity of Variances" assumption. Mathematically, for k groups, it can be denoted as the following: σ₁² = σ₂² = σ₃² = … = σₖ².
  • Samples are randomly selected from populations and assigned randomly to each group, allowing each instance sampled to be independent from one another.
  • Residuals approximately follow a normal distribution.

Methods on how to validate these assumptions will be explained within the code examples.

The ANOVA test: Code

Let us create a dummy dataset for the purpose of illustration. We create three variables, each corresponding to a different group. Each variable contains random integers whose range is specified by each np.random.randint( ) function.

Source: From the Author
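The original code here is only available as a screenshot, so the exact bounds are unknown; below is a minimal sketch in which the integer ranges are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Three groups of 50 random integers each; the bounds are assumed, since
# the original screenshot's exact np.random.randint() arguments are lost.
np.random.seed(42)
df = pd.DataFrame({
    'A': np.random.randint(25, 100, 50),
    'B': np.random.randint(30, 110, 50),
    'C': np.random.randint(20, 80, 50),
})
print(df.head())
```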

The first few rows of the dataset look like the following:

Source: From the Author

We also create a melted version of this data because various kinds of visualizations and statistical tests require different formats.

# pandas' melt function, as the name suggests, melts the specified columns
# into two variables: 'variable' stores the column labels as categories and
# 'value' stores the values that were in each of those columns
df_melt = pd.melt(df, value_vars=['A', 'B', 'C'])
df_melt.rename(columns={'variable':'group'}, inplace=True)
df_melt.head()
Source: From the Author

We do some simple exploration of the data through some visualizations.

import matplotlib.pyplot as plt
import seaborn as sns
# seaborn's boxenplot is similar to a box plot but displays more granular quantiles
ax = sns.boxenplot(x='group', y='value', data=df_melt, color='#99c2a2')
plt.show()
Source: From the Author

We also look at whether the distribution of each group is normally distributed via QQ plots.

import numpy as np 
import scipy.stats as stats
import matplotlib.pyplot as plt
for g in df_melt['group'].unique():
    stats.probplot(df_melt[df_melt['group'] == g]['value'], dist="norm", plot=plt)
    plt.title("Probability Plot - " +  g)
    plt.show()
Source: From the Author

The three QQ plots do not seem to follow a normal distribution very closely. Do not be confused, though! We are not validating the normality assumption for the ANOVA test here. That assumption concerns the "residuals", not the distribution of the data itself. Here, we are just exploring various aspects of the data in each group to understand it better.

There are four different ways to perform an ANOVA test using Python.

Method 1: scipy.stats

Source: From the Author
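The code for this method is only a screenshot; a minimal sketch using scipy.stats.f_oneway, rebuilding the dummy data from earlier (the integer ranges are assumed):

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Dummy data standing in for the df created earlier (ranges assumed)
np.random.seed(42)
df = pd.DataFrame({
    'A': np.random.randint(25, 100, 50),
    'B': np.random.randint(30, 110, 50),
    'C': np.random.randint(20, 80, 50),
})

# f_oneway takes each group's values as a separate positional argument
fvalue, pvalue = stats.f_oneway(df['A'], df['B'], df['C'])
print(fvalue, pvalue)
```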

Method 2: statsmodels

In this method, you use the Ordinary Least Squares (OLS) model from the statsmodels package, and its syntax resembles that of R for those who are more familiar with R than Python.

Method 3: pingouin

There is a package called pingouin that contains various mathematical operations and statistical tests. It has a very specific version of the one-way ANOVA called "Welch’s ANOVA". How is this different from the classic ANOVA test? Unlike the classic ANOVA test, Welch’s ANOVA can be used even if the homogeneity of variances assumption is not satisfied. In that case, Welch’s ANOVA will often have a lower type I error rate than the classic ANOVA. If all assumptions for the classic ANOVA are met, it is a safer bet to just use the classic ANOVA. Take a look at this article for further information about Welch’s ANOVA.

Source: From the Author

Method 4: bioinfokit

Similar to pingouin, the bioinfokit package has an "analys" module whose stat class offers various statistical test functionalities.

Source: From the Author

Setting aside Welch’s ANOVA, which adjusts for unequal variances, the F-statistic and p-value obtained will be the same regardless of which method you choose. In this example, the p-value from the ANOVA analysis is statistically significant (p < 0.05), and therefore we can conclude that there are significant differences in values among the groups.

Checking Assumptions

Can we simply draw that conclusion and move on? No! We need to make sure that the ANOVA test we ran was conducted under the correct assumptions.

Normality of Residuals

If you used method 4 (bioinfokit package) above to run the ANOVA test, you can simply grab the residuals from the ANOVA test and plot both the QQ plot and histogram.

## QQ-plot of residuals
import statsmodels.api as sm
import matplotlib.pyplot as plt
sm.qqplot(res.anova_std_residuals, line='45')
plt.xlabel("Quantiles")
plt.ylabel("Std Residuals")
plt.show()
## Histogram of residuals
plt.hist(res.anova_model_out.resid, bins='auto', histtype='bar', ec='k') 
plt.xlabel("Residuals")
plt.ylabel('Frequency')
plt.show()

If the visualizations above are not clear enough to tell us whether the normality assumption is met, we can use a statistical test for checking normality: the Shapiro-Wilk test.

Source: From the Author
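The screenshot here is lost; a minimal sketch of what it likely ran, where model is the fitted OLS model from method 2 (dummy data, ranges assumed):

```python
import numpy as np
import pandas as pd
import scipy.stats as stats
from statsmodels.formula.api import ols

# Dummy data standing in for the df created earlier (ranges assumed)
np.random.seed(42)
df = pd.DataFrame({
    'A': np.random.randint(25, 100, 50),
    'B': np.random.randint(30, 110, 50),
    'C': np.random.randint(20, 80, 50),
})
df_melt = pd.melt(df, value_vars=['A', 'B', 'C'])
df_melt.rename(columns={'variable': 'group'}, inplace=True)
model = ols('value ~ C(group)', data=df_melt).fit()

# Shapiro-Wilk test on the model residuals; H0: residuals are normal
w, pvalue = stats.shapiro(model.resid)
print(w, pvalue)
```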

The variable "model" in the Shapiro Wilk test above is the OLS model from method 2 that uses the statsmodel package to run the ANOVA test. The p-value is smaller than 0.05 and so we reject the null hypothesis and the normality assumption does not hold.

Homogeneity of Variances

Bartlett’s test is one test that allows us to check this homogeneity of variances assumption. It is part of the SciPy package’s stats module.

Source: From the Author
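A minimal sketch of the call behind the missing screenshot; note that with the assumed dummy data below, the exact numbers will differ from the original run:

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Dummy data standing in for the df created earlier (ranges assumed)
np.random.seed(42)
df = pd.DataFrame({
    'A': np.random.randint(25, 100, 50),
    'B': np.random.randint(30, 110, 50),
    'C': np.random.randint(20, 80, 50),
})

# Bartlett's test; H0: all groups have equal variances
w, pvalue = stats.bartlett(df['A'], df['B'], df['C'])
print(w, pvalue)
```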

The p-value is about 0.82, which is greater than 0.05, so we fail to reject the null hypothesis and assume that the different groups have equal variances.

There is another test, Levene’s test, that allows us to check the homogeneity of variances when the data did not pass the normality test in the previous part. This test is offered by the bioinfokit package, and you can find more information here.
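While bioinfokit wraps it, SciPy also exposes Levene’s test directly; a minimal sketch with scipy.stats.levene (dummy data, ranges assumed):

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Dummy data standing in for the df created earlier (ranges assumed)
np.random.seed(42)
df = pd.DataFrame({
    'A': np.random.randint(25, 100, 50),
    'B': np.random.randint(30, 110, 50),
    'C': np.random.randint(20, 80, 50),
})

# Levene's test is less sensitive to non-normality than Bartlett's test
w, pvalue = stats.levene(df['A'], df['B'], df['C'])
print(w, pvalue)
```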

Kruskal-Wallis Test

Since not all the assumptions required for the ANOVA test were satisfied, we turn our attention to the Kruskal-Wallis test, which is a non-parametric version of the one-way ANOVA test. The term non-parametric means the methodology is free from constraints on the parameters and the underlying distributional assumptions.

The null and alternative hypotheses of this test are:

The null hypothesis (H0): The median is equal across all groups.

The alternative hypothesis (Ha): The median is not equal across all groups.

Source: From the Author
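A minimal sketch of the Kruskal-Wallis call behind the missing screenshot, using scipy.stats.kruskal (dummy data, ranges assumed):

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Dummy data standing in for the df created earlier (ranges assumed)
np.random.seed(42)
df = pd.DataFrame({
    'A': np.random.randint(25, 100, 50),
    'B': np.random.randint(30, 110, 50),
    'C': np.random.randint(20, 80, 50),
})

# Kruskal-Wallis H-test: non-parametric analogue of one-way ANOVA
hstat, pvalue = stats.kruskal(df['A'], df['B'], df['C'])
print(hstat, pvalue)
```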

The p-value is smaller than 0.05, so we reject the null hypothesis: there exist statistically significant differences in medians between some pairs of groups. Just like the ANOVA test, the Kruskal-Wallis test does not tell us which pairs have these statistically significant differences. Therefore, we need to perform a post-hoc test with pairwise comparisons to identify which pairs drive this result. Tukey’s test is a statistical test that we use to make these pairwise comparisons. Read more about this test in this article.
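A minimal sketch of a Tukey HSD post-hoc comparison using statsmodels (dummy data, ranges assumed):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Dummy data standing in for the df created earlier (ranges assumed)
np.random.seed(42)
df = pd.DataFrame({
    'A': np.random.randint(25, 100, 50),
    'B': np.random.randint(30, 110, 50),
    'C': np.random.randint(20, 80, 50),
})
df_melt = pd.melt(df, value_vars=['A', 'B', 'C'])
df_melt.rename(columns={'variable': 'group'}, inplace=True)

# Pairwise comparisons of group means with familywise error control
tukey = pairwise_tukeyhsd(endog=df_melt['value'],
                          groups=df_melt['group'], alpha=0.05)
print(tukey.summary())
```

With three groups, the summary lists all three pairwise comparisons and flags which ones are significant.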

Conclusion

In this article, I introduced the ANOVA test and the Kruskal-Wallis test. They are two useful statistical tests that allow us to compare means or medians across different groups and see whether the differences are statistically significant. The next time you read an academic paper, you will not be intimidated by the methodology section, which will often include these statistical tests in the authors’ analyses.

If you found this post helpful, consider supporting me by signing up on medium via the following link : )

joshnjuny.medium.com

You will have access to so many useful and interesting articles and posts from not only me but also other authors!

About the Author

Data Scientist. 1st Year PhD student in Informatics at UC Irvine.

Former research area specialist at the Criminal Justice Administrative Records System (CJARS) economics lab at the University of Michigan, working on statistical report generation, automated data quality review, building data pipelines, and data standardization & harmonization. Former Data Science Intern at Spotify Inc. (NYC).

He loves sports, working-out, cooking good Asian food, watching kdramas and making / performing music and most importantly worshiping Jesus Christ, our Lord. Checkout his website!

