
ANOVA (analysis of variance) tests are designed to detect statistically significant differences between the means of three or more groups. Two types are commonly used: the one-way ANOVA test and the two-way ANOVA test. The only difference between them is the number of independent variables that affect the dependent variable.
Two-Way ANOVA
The two-way ANOVA is an extension of the one-way ANOVA that examines the effect of two categorical independent variables (two independent factors) on one continuous dependent variable.
The two-way ANOVA tests not only the main effect of each independent factor but also whether the two factors act together to influence the dependent variable, i.e., whether there is any interaction between the two independent factors. [2]
ANOVA uses the F test, a groupwise comparison test, for statistical significance. For each source (factor A, factor B, and the interaction between factor A and factor B), it compares the variance among the group means with the unexplained within-group (error) variance. Finally, a conclusion is drawn from each F-test statistic.
Sum of Squares (SS)
Inside the Two-Way ANOVA Table: The total amount of variability comes from four possible sources, namely:
- Variation among the groups under factor A, called treatment (A)
- Variation among the groups under factor B, called treatment (B)
- Variation due to the interaction between factor A and factor B, called interaction (AB)
- Variation within the groups, called error (E)

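The sums of squares add up, SSTO = SSA + SSB + SSAB + SSE, and the degrees of freedom (d.f.) partition in the same way: d.f. (SSTO) = d.f. (SSA) + d.f. (SSB) + d.f. (SSAB) + d.f. (SSE)
Dividing each SS by its d.f. gives the corresponding mean square (MS).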
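To make the partition concrete, here is a minimal sketch that computes each sum of squares, its d.f., the mean squares and the F statistics by hand. It assumes a balanced design, and the data are synthetic (3 levels of factor A, 2 levels of factor B, 5 observations per cell); none of these numbers come from the article.

```python
import numpy as np

# Synthetic balanced data (an assumption for illustration only):
# y[i, j, k] is the k-th observation in the cell with level i of
# factor A and level j of factor B.
rng = np.random.default_rng(0)
a_levels, b_levels, n_per_cell = 3, 2, 5
y = rng.normal(loc=50.0, scale=5.0, size=(a_levels, b_levels, n_per_cell))

grand_mean = y.mean()
mean_a = y.mean(axis=(1, 2))   # one mean per level of factor A
mean_b = y.mean(axis=(0, 2))   # one mean per level of factor B
mean_cell = y.mean(axis=2)     # one mean per (A, B) cell

# Sums of squares for a balanced design.
ss_a = b_levels * n_per_cell * np.sum((mean_a - grand_mean) ** 2)
ss_b = a_levels * n_per_cell * np.sum((mean_b - grand_mean) ** 2)
ss_ab = n_per_cell * np.sum(
    (mean_cell - mean_a[:, None] - mean_b[None, :] + grand_mean) ** 2
)
ss_e = np.sum((y - mean_cell[:, :, None]) ** 2)
ss_total = np.sum((y - grand_mean) ** 2)

# Degrees of freedom for each source.
df_a = a_levels - 1
df_b = b_levels - 1
df_ab = df_a * df_b
df_e = a_levels * b_levels * (n_per_cell - 1)
df_total = y.size - 1

# Mean squares (SS / d.f.) and F statistics (MS / MSE).
ms_a, ms_b, ms_ab, ms_e = ss_a / df_a, ss_b / df_b, ss_ab / df_ab, ss_e / df_e
f_a, f_b, f_ab = ms_a / ms_e, ms_b / ms_e, ms_ab / ms_e

print(f"SSTO = {ss_total:.3f}, SSA + SSB + SSAB + SSE = {ss_a + ss_b + ss_ab + ss_e:.3f}")
print(f"d.f.: total = {df_total}, sum of parts = {df_a + df_b + df_ab + df_e}")
print(f"F(A) = {f_a:.3f}, F(B) = {f_b:.3f}, F(AB) = {f_ab:.3f}")
```

The two printed totals should agree, which is exactly the additivity stated above. The closed-form SS expressions used here hold only for balanced data.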
The assumptions for the two-way ANOVA test are the same as those of the one-way ANOVA test, i.e., the usual assumptions of a parametric test: randomness and independence of the sample data, normality, and homogeneity of variance. If you want to read more details, you can refer back to the previous article. [3]
The simple outline of the two-way ANOVA test
A two-way ANOVA has three sets of hypotheses:
Set 1:
H₀: μₐ₁ = μₐ₂ = μₐ₃ = … = μₐ𝒸
H₁: Not all μₐᵢ’s are equal under factor A, where i = 1, 2, 3, …, c.
Level of significance = α

Set 2:
H₀: μᵦ₁ = μᵦ₂ = μᵦ₃ = … = μᵦᵣ
H₁: Not all μᵦᵢ’s are equal under factor B, where i = 1, 2, 3, …, r.
Level of significance = α

Set 3:
H₀: The effect of one independent variable does not depend on the effect of the other independent variable, i.e., there is no interaction between factor A and factor B
H₁: There is an interaction between factor A and factor B
Level of significance = α

If you perform a two-way ANOVA test with interaction, you need to test all 3 sets of hypotheses mentioned above. But if you perform the test without interaction, you only need to test the Set 1 and Set 2 hypotheses.
Finally, the two-way ANOVA table with interaction is shown below:
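A standard layout of this table can be sketched as follows. It uses c levels of factor A and r levels of factor B as in the hypotheses above, and assumes a balanced design with n observations in each of the rc cells (the symbol n is introduced here for illustration).

```latex
\begin{array}{l|c|c|c|c}
\text{Source} & \text{SS} & \text{d.f.} & \text{MS} & F \\ \hline
\text{Factor A} & SSA & c-1 & MSA = \dfrac{SSA}{c-1} & \dfrac{MSA}{MSE} \\
\text{Factor B} & SSB & r-1 & MSB = \dfrac{SSB}{r-1} & \dfrac{MSB}{MSE} \\
\text{Interaction AB} & SSAB & (c-1)(r-1) & MSAB = \dfrac{SSAB}{(c-1)(r-1)} & \dfrac{MSAB}{MSE} \\
\text{Error} & SSE & rc(n-1) & MSE = \dfrac{SSE}{rc(n-1)} & \\ \hline
\text{Total} & SSTO & rcn-1 & &
\end{array}
```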

The two-way ANOVA table without interaction is shown below:
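A standard layout for the model without interaction can be sketched in the same notation, where N denotes the total number of observations (N = rcn in a balanced design with n observations per cell; both symbols are introduced here for illustration). The interaction row disappears and its sum of squares and d.f. are absorbed into the error row.

```latex
\begin{array}{l|c|c|c|c}
\text{Source} & \text{SS} & \text{d.f.} & \text{MS} & F \\ \hline
\text{Factor A} & SSA & c-1 & MSA = \dfrac{SSA}{c-1} & \dfrac{MSA}{MSE} \\
\text{Factor B} & SSB & r-1 & MSB = \dfrac{SSB}{r-1} & \dfrac{MSB}{MSE} \\
\text{Error} & SSE & N-c-r+1 & MSE = \dfrac{SSE}{N-c-r+1} & \\ \hline
\text{Total} & SSTO & N-1 & &
\end{array}
```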

Balanced Design vs Unbalanced Design
A balanced design is one in which the sample sizes for all combinations of groups are equal. In an unbalanced design, the sample sizes of the groups are unequal. In two-way ANOVA, if the group sample sizes differ too much, the usual analysis-of-variance calculation is not adequate, and the regression approach should be used instead. The alternative is to make extensive efforts to ensure a balanced design.
A dataset, students.csv, contains 8239 rows of student data. Each row represents a unique student and consists of 16 features related to that student. We will focus on only 3 of them: major, gender and salary [1].
Given the two factors, major and gender, is there a significant difference in the average annual salary of graduates of different majors and genders, and is there any interaction between gender and major, at the 5% significance level?
Data Processing
From the given dataset, we need to keep only the students who graduated and then perform random sampling. In this case, we randomly sample 40 students from each group, i.e., each combination of (major, gender), to make the design balanced. After that, we select the three variables of interest: the categorical variables major and gender, and the numeric variable salary.
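The processing steps above can be sketched with pandas roughly as follows. To keep the snippet runnable on its own, it builds a synthetic stand-in for students.csv; the `graduated` column name, the `major_1` … `major_6` labels, the gender labels and the salary values are all assumptions for illustration, not the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for students.csv (assumption). With the real file
# you would start from something like: students = pd.read_csv("students.csv")
rng = np.random.default_rng(42)
majors = [f"major_{i}" for i in range(1, 7)]   # hypothetical labels
genders = ["Female", "Male"]                   # hypothetical labels
rows = []
for major in majors:
    for gender in genders:
        for k in range(80):
            graduated = 1 if k < 60 else 0
            # Only graduates have a salary in this toy data.
            salary = rng.normal(45000, 8000) if graduated else np.nan
            rows.append(
                {"major": major, "gender": gender,
                 "graduated": graduated, "salary": salary}
            )
students = pd.DataFrame(rows)

# 1. Keep only the students who graduated.
graduates = students[students["graduated"] == 1]

# 2. Draw 40 students at random from every (major, gender) combination,
# 3. and keep only the three variables of interest.
sample = (
    graduates.groupby(["major", "gender"])
    .sample(n=40, random_state=1)
    .loc[:, ["major", "gender", "salary"]]
    .reset_index(drop=True)
)

# Every cell should now hold exactly 40 rows (balanced design).
cell_counts = sample.groupby(["major", "gender"]).size()
print(cell_counts)
```

Fixing `random_state` makes the sample reproducible. If any group had fewer than 40 graduates, `sample(n=40)` would raise an error, which is a useful early warning that a balanced design is not achievable at that size.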

Hypothesis Testing
Following the five-step process of hypothesis testing:
Set 1:
H₀: μₐ₁ = μₐ₂ = μₐ₃ = … = μₐ₆
H₁: Not all mean salaries are equal across majors
Set 2:
H₀: μᵦ₁ = μᵦ₂
H₁: Not all mean salaries are equal across genders
Set 3:
H₀: There is no interaction between major and gender
H₁: There is an interaction between major and gender
α = 0.05
According to the F-test statistics:
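For the decision rule, the critical F values can be looked up with SciPy. The sketch below only uses the design described above (6 majors, 2 genders, 40 students per group, α = 0.05); the helper name `p_value` is made up for this example.

```python
from scipy import stats

alpha = 0.05
n_major, n_gender, n_per_cell = 6, 2, 40

# Degrees of freedom of each source in the balanced two-way design.
df_major = n_major - 1
df_gender = n_gender - 1
df_inter = df_major * df_gender
df_error = n_major * n_gender * (n_per_cell - 1)

# Critical values: reject H0 for a source if its F statistic exceeds these.
crit = {
    "major": stats.f.ppf(1 - alpha, df_major, df_error),
    "gender": stats.f.ppf(1 - alpha, df_gender, df_error),
    "major:gender": stats.f.ppf(1 - alpha, df_inter, df_error),
}

def p_value(f_stat, df_num, df_den):
    """Upper-tail probability of an F statistic (hypothetical helper)."""
    return stats.f.sf(f_stat, df_num, df_den)

for source, value in crit.items():
    print(f"F critical ({source}): {value:.3f}")
```

Comparing an F statistic with its critical value and comparing its p-value with α always lead to the same decision, so either form of the rule can be used.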

We could also get the same result using the statsmodels package, which uses the regression approach. Because statsmodels uses the regression approach, it is also suitable for an unbalanced design, i.e., you won’t need to make extensive efforts to ensure a balanced design.

The interaction plot of major and gender on salary is shown below:
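One way to draw such a plot is to compute the mean salary of every (major, gender) cell and draw one line per gender. The sketch below does this with pandas and matplotlib on synthetic data (hypothetical labels and random salaries), so the picture it produces is not the article's figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic balanced data (assumption): 6 majors x 2 genders x 40 students.
rng = np.random.default_rng(7)
majors = [f"major_{i}" for i in range(1, 7)]   # hypothetical labels
genders = ["Female", "Male"]                   # hypothetical labels
df = pd.DataFrame(
    [(m, g) for m in majors for g in genders for _ in range(40)],
    columns=["major", "gender"],
)
df["salary"] = rng.normal(45000, 8000, size=len(df))

# Mean salary per cell: rows are majors, columns are genders.
cell_means = df.groupby(["major", "gender"])["salary"].mean().unstack("gender")

# One line per gender across the majors.
fig, ax = plt.subplots(figsize=(8, 4))
for gender in cell_means.columns:
    ax.plot(list(cell_means.index), cell_means[gender], marker="o", label=gender)
ax.set_xlabel("major")
ax.set_ylabel("mean salary")
ax.set_title("Interaction plot: major x gender")
ax.legend(title="gender")
fig.tight_layout()
# fig.savefig("interaction_plot.png")  # uncomment to write the figure to disk
```

Roughly parallel lines suggest no interaction, while lines that cross or diverge strongly suggest that the effect of one factor depends on the level of the other. statsmodels also ships an `interaction_plot` helper in `statsmodels.graphics.factorplots` that produces a similar figure.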

Conclusion
For Set 1 & Set 2: The null hypothesis is rejected, since the F statistic exceeds the critical F value (equivalently, the p-value is < 0.05). ∴ We have enough evidence that not all average salaries are the same for graduates of different majors or genders, at the 5% significance level.
For Set 3: We fail to reject the null hypothesis. ∴ We do not have enough evidence that major and gender interact, at the 5% significance level. Moreover, the interaction plot [4] suggests that there is no interaction, while both main effects, major and gender, are significant. For example, the average salary of graduates is significantly higher for males who graduated in Biology.
References
[1] "One-way ANOVA Hypothesis Test • SOGA • Department of Earth Sciences." [Online]. Available: https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/ANOVA/One-way-ANOVA-Hypothesis-Test/index.html
[2] Two-way analysis of variance – Wikipedia
[3] Kiernan, D. (2014). Chapter 6: Two-way Analysis of Variance. Open SUNY Textbooks.
[4] Chapter 7 ANOVA with Interaction | STA 265 Notes (Methods of Statistics and Data Science). (n.d.). Retrieved January 2, 2023, from http://campus.murraystate.edu/academic/faculty/cmecklin/STA265/_book/anova-with-interaction.html#the-interactive-two-way-anova-model