
Two-Way ANOVA Test, with Python

The Complete Beginner's Guide to Performing a Two-Way ANOVA Test (with code!)

Photo by Sergey Pesterev on Unsplash

ANOVA tests check for statistically significant differences between the means of three or more groups. Two types of ANOVA (analysis of variance) are commonly used: the one-way ANOVA test and the two-way ANOVA test. The only difference between them is the number of independent variables that affect the dependent variable: one for the one-way test, two for the two-way test.

Two-Way ANOVA

The two-way ANOVA is an extension of the one-way ANOVA that examines the effect of two categorical independent variables (two independent factors) on one continuous dependent variable.

The two-way ANOVA tests not only the main effect of each independent factor but also whether the two factors act together to influence the dependent variable, i.e., whether there is any interaction between the two independent factors. [2]

ANOVA uses the F test, a groupwise comparison test, to assess statistical significance. For each source of variation (factor A, factor B, and the interaction between factor A and factor B), it compares the variance explained by that source with the unexplained variance within the groups (the error). A conclusion is then drawn from each F-test statistic.

Sum of Squares (SS)

In the two-way ANOVA table, the total variability (SSTO) comes from four possible sources:

  1. Variation among the groups under factor A, called treatment (A)
  2. Variation among the groups under factor B, called treatment (B)
  3. Variation due to the interaction between factor A and factor B, called interaction (AB)
  4. Variation within the groups, called error (E)
Image 1. Illustration of SS and d.f. Image by Author

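Just as the sums of squares add up (SSTO = SSA + SSB + SSAB + SSE), so do the degrees of freedom: d.f. (SSTO) = d.f. (SSA) + d.f. (SSB) + d.f. (SSAB) + d.f. (SSE). With c levels of factor A, r levels of factor B and n observations per combination of levels, this is nrc − 1 = (c − 1) + (r − 1) + (c − 1)(r − 1) + rc(n − 1).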

Dividing each SS by its d.f. gives the corresponding mean square (MS), e.g. MSA = SSA / d.f. (SSA).
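To make the decomposition concrete, here is a minimal NumPy sketch for a balanced design, assuming the data sit in an array `y` of shape (c, r, n): c levels of factor A, r levels of factor B and n observations per combination. The values are randomly generated placeholders, not data from this article, and the variable names are illustrative.

```python
import numpy as np

# Placeholder balanced data (randomly generated, not the article's dataset):
# y[i, j, k] is observation k in level i of factor A and level j of factor B.
rng = np.random.default_rng(0)
c, r, n = 3, 2, 4  # c levels of A, r levels of B, n observations per cell
y = rng.normal(loc=50.0, scale=5.0, size=(c, r, n))

grand_mean = y.mean()
mean_a = y.mean(axis=(1, 2))  # level means of factor A, shape (c,)
mean_b = y.mean(axis=(0, 2))  # level means of factor B, shape (r,)
mean_ab = y.mean(axis=2)      # cell means, shape (c, r)

# Sum of squares for each source of variation
ssa = r * n * np.sum((mean_a - grand_mean) ** 2)
ssb = c * n * np.sum((mean_b - grand_mean) ** 2)
ssab = n * np.sum((mean_ab - mean_a[:, None] - mean_b[None, :] + grand_mean) ** 2)
sse = np.sum((y - mean_ab[:, :, None]) ** 2)
ssto = np.sum((y - grand_mean) ** 2)

# Degrees of freedom, which add up the same way as the SS
df_a, df_b, df_ab, df_e = c - 1, r - 1, (c - 1) * (r - 1), c * r * (n - 1)
df_to = c * r * n - 1

# Mean square = SS / d.f.
msa, msb, msab, mse = ssa / df_a, ssb / df_b, ssab / df_ab, sse / df_e

print(np.isclose(ssto, ssa + ssb + ssab + sse))  # True for a balanced design
print(df_to == df_a + df_b + df_ab + df_e)       # True
```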

The assumptions of the two-way ANOVA test are the same as those of the one-way ANOVA test, i.e. the usual assumptions of a parametric test: the sample data are random and independent, normally distributed, and have homogeneous variances. For more details, refer to the previous article. [3]

Outline of the two-way ANOVA test

A two-way ANOVA has three sets of hypotheses:

Set 1:
H₀: μₐ₁ = μₐ₂ = μₐ₃ = … = μₐ𝒸
H₁: Not all μₐᵢ’s are equal under factor A, where i = 1, 2, 3, …, c
Level of significance = α

Image 2. F-test statistic to test the main effect of factor A. Image by Author.

Set 2:
H₀: μᵦ₁ = μᵦ₂ = μᵦ₃ = … = μᵦᵣ
H₁: Not all μᵦᵢ’s are equal under factor B, where i = 1, 2, 3, …, r
Level of significance = α

Image 3. F-test statistic to test the main effect of factor B. Image by Author.

Set 3:
H₀: The effect of one independent variable does not depend on the level of the other independent variable, i.e., there is no interaction between factor A and factor B
H₁: There is an interaction between factor A and factor B
Level of significance = α

Image 4. F-test statistic to test if there is an interaction between two independent factors. Image by Author.

If you perform a two-way ANOVA test with interaction, you need to test all three sets of hypotheses above. If you perform the test without interaction, you only need to test the Set 1 and Set 2 hypotheses.
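The F statistics in Images 2 to 4 divide each mean square by the error mean square (MSE), and the p-value is the right-tail area of the F distribution. The sketch below, assuming NumPy and SciPy and a balanced array `y` of shape (c, r, n) filled with randomly generated placeholder values, computes the three tests with interaction and the two tests without it. `f_tests` is a hypothetical helper name. For a balanced design, dropping the interaction term folds SS(AB) and its d.f. into the error term, which is what the no-interaction branch does.

```python
import numpy as np
from scipy import stats

# Placeholder balanced data (randomly generated): shape (c, r, n)
rng = np.random.default_rng(1)
c, r, n = 3, 2, 4
y = rng.normal(50.0, 5.0, size=(c, r, n))
y[0] += 4.0  # shift level 1 of factor A so there is a main effect to detect

gm = y.mean()
ma, mb, mab = y.mean(axis=(1, 2)), y.mean(axis=(0, 2)), y.mean(axis=2)
ss = {
    "A": r * n * np.sum((ma - gm) ** 2),
    "B": c * n * np.sum((mb - gm) ** 2),
    "AB": n * np.sum((mab - ma[:, None] - mb[None, :] + gm) ** 2),
    "E": np.sum((y - mab[:, :, None]) ** 2),
}
dof = {"A": c - 1, "B": r - 1, "AB": (c - 1) * (r - 1), "E": c * r * (n - 1)}


def f_tests(ss, dof, sources, alpha=0.05):
    """Return {source: (F, p-value, F critical)} with F = MS(source) / MS(E)."""
    mse = ss["E"] / dof["E"]
    results = {}
    for s in sources:
        f = (ss[s] / dof[s]) / mse
        p = stats.f.sf(f, dof[s], dof["E"])               # right-tail p-value
        f_crit = stats.f.ppf(1 - alpha, dof[s], dof["E"])  # F critical at level alpha
        results[s] = (f, p, f_crit)
    return results


# With interaction: Sets 1, 2 and 3
with_ab = f_tests(ss, dof, ["A", "B", "AB"])

# Without interaction: SS(AB) and its d.f. join the error term; Sets 1 and 2 only
ss_add = {"A": ss["A"], "B": ss["B"], "E": ss["AB"] + ss["E"]}
dof_add = {"A": dof["A"], "B": dof["B"], "E": dof["AB"] + dof["E"]}
without_ab = f_tests(ss_add, dof_add, ["A", "B"])

for s, (f, p, f_crit) in with_ab.items():
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{s}: F = {f:.2f}, F crit = {f_crit:.2f}, p = {p:.4f} -> {decision}")
```

Comparing F with F critical and comparing the p-value with α always lead to the same decision.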

Finally, the two-way ANOVA table with interaction is shown below:

Table 1. Sample two-way ANOVA table with interaction. Image by Author.

The two-way ANOVA table without interaction is shown below:

Table 2. Sample two-way ANOVA table without interaction. Image by Author.

Balanced Design vs Unbalanced Design

A balanced design is one in which every combination of factor levels has the same sample size. In an unbalanced design, the sample sizes of the groups differ. In a two-way ANOVA with unequal group sizes, the effects of the two factors are no longer orthogonal, so the sums of squares do not separate cleanly and the normal approach of variance analysis is not adequate. For an unbalanced design, the regression approach should be used instead. The alternative is to make extensive efforts to ensure a balanced design.
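Checking for balance comes down to counting the observations in each combination of factor levels. A minimal pandas sketch; `is_balanced` and the columns `A`, `B` and `y` are hypothetical names, and the tiny DataFrame is made up for illustration.

```python
import pandas as pd


def is_balanced(data: pd.DataFrame, factor_a: str, factor_b: str) -> bool:
    """True if every (factor_a, factor_b) combination has the same number of rows."""
    counts = pd.crosstab(data[factor_a], data[factor_b])
    # An empty combination appears as a 0 count, which also makes the design unbalanced
    return bool(counts.to_numpy().min() == counts.to_numpy().max())


balanced = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a2"] * 2,
    "B": ["b1", "b2"] * 4,
    "y": [5.1, 4.8, 6.0, 6.3, 5.4, 4.9, 6.1, 5.8],
})
unbalanced = balanced.iloc[:-1]  # drop one row so one cell has fewer observations

print(pd.crosstab(unbalanced["A"], unbalanced["B"]))
print(is_balanced(balanced, "A", "B"), is_balanced(unbalanced, "A", "B"))  # True False
```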


The dataset students.csv contains 8,239 rows of student data, where each row represents a unique student. It has 16 features related to each student; we will focus on only three of them: major, gender and salary [1].

Based on the two factors, major and gender, is there a significant difference in average annual salary between graduates of different majors and of different genders, and is there any interaction between gender and major, at the 5% significance level?

Data Processing

From the given dataset, we first keep only the students who graduated and then draw a random sample. Here, 40 students are sampled at random from each group, i.e. each (major, gender) combination, to make the design balanced. Finally, we select the three variables of interest: the categorical variables major and gender, and the numeric variable salary.

Image 5. Data processing to make a balanced design. Image by Author.
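The processing step can be sketched with pandas as follows. The column names (`major`, `gender`, `graduated`, `salary`) and the coding of `graduated` as 1 for graduates are assumptions about students.csv, and the DataFrame is a randomly generated stand-in so the code runs without the file; with the real data you would start from `pd.read_csv("students.csv")`. The stand-in uses three majors instead of the six in the real data. `DataFrameGroupBy.sample` requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv("students.csv"); columns and values are hypothetical
rng = np.random.default_rng(42)
size = 2000
students = pd.DataFrame({
    "major": rng.choice(["Biology", "Economics", "Mathematics"], size=size),
    "gender": rng.choice(["Female", "Male"], size=size),
    "graduated": rng.integers(0, 2, size=size),  # assumed coding: 1 = graduated
    "salary": rng.normal(40000, 8000, size=size),
})

# 1. Keep only the students who graduated
graduates = students[students["graduated"] == 1]

# 2. Randomly sample 40 students from each (major, gender) group for a balanced design
sample = graduates.groupby(["major", "gender"]).sample(n=40, random_state=1)

# 3. Keep only the three variables of interest
data = sample[["major", "gender", "salary"]].reset_index(drop=True)

print(data.groupby(["major", "gender"]).size())  # 40 in every cell
```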

Hypothesis Testing

Following the five-step process of hypothesis testing:

Set 1:
H₀: μₐ₁ = μₐ₂ = μₐ₃ = … = μₐ₆
H₁: Not all salary means are equal across the different majors

Set 2:
H₀: μᵦ₁ = μᵦ₂
H₁: The salary means differ between the two genders

Set 3:
H₀: There is no interaction between major and gender
H₁: There is an interaction between major and gender

Level of significance: α = 0.05

The F test statistics give the following ANOVA table:

Image 6. ANOVA table with interaction: normal approach of variance analysis. Image by Author.
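The normal approach of variance analysis can be reproduced directly from a long-format DataFrame. A minimal sketch, assuming pandas and SciPy and a balanced DataFrame with columns `major`, `gender` and `salary`; `two_way_anova_table` is a hypothetical helper, and because the salaries are randomly generated placeholders, the numbers will not match Image 6.

```python
import numpy as np
import pandas as pd
from scipy import stats


def two_way_anova_table(data, a, b, y):
    """Classical two-way ANOVA table with interaction for a balanced design."""
    gm = data[y].mean()
    n_a, n_b = data[a].nunique(), data[b].nunique()
    # Group means broadcast back to every row, so summing over rows weights them correctly
    mean_a = data.groupby(a)[y].transform("mean")
    mean_b = data.groupby(b)[y].transform("mean")
    mean_ab = data.groupby([a, b])[y].transform("mean")
    ss = {
        a: ((mean_a - gm) ** 2).sum(),
        b: ((mean_b - gm) ** 2).sum(),
        f"{a}:{b}": ((mean_ab - mean_a - mean_b + gm) ** 2).sum(),
        "Residual": ((data[y] - mean_ab) ** 2).sum(),
    }
    dof = [n_a - 1, n_b - 1, (n_a - 1) * (n_b - 1), len(data) - n_a * n_b]
    table = pd.DataFrame({"SS": list(ss.values()), "df": dof}, index=list(ss))
    table["MS"] = table["SS"] / table["df"]
    table["F"] = table["MS"] / table.loc["Residual", "MS"]
    table["p-value"] = stats.f.sf(table["F"], table["df"], table.loc["Residual", "df"])
    table.loc["Residual", ["F", "p-value"]] = np.nan  # no test for the error row
    return table


# Placeholder balanced data: 3 majors x 2 genders x 40 graduates
rng = np.random.default_rng(7)
majors, genders, n = ["Biology", "Economics", "Mathematics"], ["Female", "Male"], 40
data = pd.DataFrame(
    [(m, g) for m in majors for g in genders for _ in range(n)],
    columns=["major", "gender"],
)
data["salary"] = rng.normal(40000, 8000, size=len(data))
print(two_way_anova_table(data, "major", "gender", "salary").round(4))
```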

We can obtain the same result with the statsmodels package, which uses the regression approach. Because it uses the regression approach, statsmodels is also suitable for unbalanced designs, so you do not need to make extensive efforts to ensure a balanced design.

Image 7. ANOVA table with interaction: regression approach. Image by Author.

The interaction plot of major and gender on salary is shown below:

Image 8. Interaction plot of major and gender on salary. Image by Author
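An interaction plot shows the mean of the dependent variable for each combination of the two factors, with one line per level of the second factor. Roughly parallel lines suggest no interaction; lines that cross or diverge suggest one. A minimal matplotlib sketch on randomly generated placeholder data; the output file name is hypothetical, and the `Agg` backend is only there so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove it to plot inline in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder balanced data (randomly generated, not students.csv)
rng = np.random.default_rng(7)
majors, genders, n = ["Biology", "Economics", "Mathematics"], ["Female", "Male"], 40
data = pd.DataFrame(
    [(m, g) for m in majors for g in genders for _ in range(n)],
    columns=["major", "gender"],
)
data["salary"] = rng.normal(40000, 8000, size=len(data))

# Cell means: one row per major, one column per gender
cell_means = data.pivot_table(index="major", columns="gender", values="salary", aggfunc="mean")

fig, ax = plt.subplots(figsize=(7, 4))
for gender in cell_means.columns:
    # One line per gender across the majors
    ax.plot(cell_means.index, cell_means[gender], marker="o", label=gender)
ax.set_xlabel("major")
ax.set_ylabel("mean salary")
ax.legend(title="gender")
fig.tight_layout()
fig.savefig("interaction_plot.png", dpi=150)  # hypothetical output path
```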

Conclusion

For Set 1 & Set 2: Both null hypotheses are rejected, since F > F critical (equivalently, p-value < 0.05). ∴ We have enough evidence that average salaries are not all the same across majors and differ between genders, at the 5% significance level.

For Set 3: We fail to reject the null hypothesis. ∴ We do not have enough evidence of an interaction between study subject and gender, at the 5% significance level. The interaction plot [4] also suggests no interaction, while both main effects, major and gender, are significant. For example, the plot shows a higher average salary for males who graduated in Biology; note that the ANOVA tests each main effect as a whole, not individual group comparisons.


Recommended Reading

ANOVA Test, with Python

Chi-Square Test, with Python

McNemar’s Test, with Python

One-Sample Hypothesis Tests, with Python

Two-Sample Hypothesis Tests, with Python


References

[1] "One-way ANOVA Hypothesis Test," SOGA, Department of Earth Sciences. [Online]. Available: https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/ANOVA/One-way-ANOVA-Hypothesis-Test/index.html

[2] "Two-way analysis of variance," Wikipedia.

[3] D. Kiernan, "Chapter 6: Two-way Analysis of Variance," Open SUNY Textbooks, 2014.

[4] "Chapter 7 ANOVA with Interaction," STA 265 Notes (Methods of Statistics and Data Science). [Online]. Available: http://campus.murraystate.edu/academic/faculty/cmecklin/STA265/_book/anova-with-interaction.html#the-interactive-two-way-anova-model (accessed Jan. 2, 2023).

