Feature interactions: An overview
Introduction
"When features interact with each other in a prediction model, the prediction cannot be expressed as the sum of the feature effects, because the effect of one feature depends on the value of the other feature." - Christoph Molnar
The complex, collaborative effect of features on the prediction of a variable is called feature interaction. Another aspect of feature interaction is how one feature varies with respect to another feature it interacts with; such features are often referred to as interaction variables.
Most commonly, we encounter pairwise feature interactions, where features interact in groups of two. For example, the risk of developing heart disease depends on your BMI, which is defined as weight / height². Here, { weight, height } is a pairwise interaction. Higher-order feature interactions, where more than two features interact, are less common in general but appear frequently in the sciences, where relationships can be very complex. For example, {x₁, x₂, x₃, x₄} is a 4th-order interaction in log(x₁² + x₂ + x₃*x₄²).
Identifying the feature interactions present in your dataset can be useful for various reasons including:
- Understanding the relationships between features in your dataset and their effect on the prediction, and avoiding the biases that come from interpreting models using only the main effects and not the interaction effects
- Using the information about interactions to explicitly build expressive models
- Engineering features to improve model performance
Analyzing feature interactions
A popular approach for visualizing interaction effects is the partial dependence plot (PDP). We can use a PDP to visualize how the predictions vary with a pair of features. Variants like ICE and Shapley values help you visualize interaction effects in a similar manner. Below we see an example (using pdpbox) with the California housing dataset, where the predicted house price increases as the median income and the average occupancy increase.

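As a rough sketch of how such a plot can be produced (shown here with scikit-learn's PartialDependenceDisplay rather than pdpbox, whose API varies across versions), assuming a gradient boosting model fitted on the California housing data:

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Load the California housing data as a DataFrame
data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target

# Fit any reasonably accurate model; gradient boosting is used here for illustration
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Two-way partial dependence of the prediction on the (MedInc, AveOccup) pair
PartialDependenceDisplay.from_estimator(model, X, features=[("MedInc", "AveOccup")])
plt.show()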
While these methods are great for interpreting interaction effects, they are tedious for automatically identifying all the feature interactions in the data, especially if you have a large feature space. One simple approach is to apply a two-way ANOVA to every combination of features and filter interactions based on the F-statistics, but this does not capture complex relationships very well. Friedman's H-statistic, on the other hand, uses the partial dependence discussed earlier to measure the strength of interaction for a given set of features. The plot below shows the H-statistics (calculated using sklearn_gbmi) of the top 10 interactions in the California housing dataset:

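A minimal sketch of how these values can be computed, assuming sklearn_gbmi's h_all_pairs helper and a fitted scikit-learn gradient boosting model (argument names may differ between versions):

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn_gbmi import h_all_pairs

data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target

# sklearn_gbmi expects a fitted scikit-learn gradient boosting model
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# H-statistic for every pair of features; larger values indicate stronger interactions
h_stats = h_all_pairs(model, X)
for pair, h in sorted(h_stats.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(pair, round(h, 3))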
While these methods work well for pairwise interactions, they are hard to use for identifying higher-order interactions. One interesting approach I came across is neural interaction detection, which analyzes the weights of a feed-forward neural network to identify interactions of arbitrary order. From their examples, given the following synthetic data
import numpy as np

# Synthetic data with 10 features; the uniform X here is an assumption, added so the snippet runs
X = np.random.uniform(-1, 1, size=(30000, 10))
X1, X2, X3, X4, X5, X6, X7, X8, X9, X10 = X.transpose()

interaction1 = np.exp(np.abs(X1 - X2))
interaction2 = np.abs(X2 * X3)
interaction3 = -1 * (X3**2) ** np.abs(X4)
interaction4 = (X1 * X4)**2
interaction5 = np.log(X4**2 + X5**2 + X7**2 + X8**2)
main_effects = X9 + 1 / (1 + X10**2)

Y = interaction1 + interaction2 + interaction3 + interaction4 + interaction5 + main_effects
ground_truth = [{1, 2}, {2, 3}, {3, 4}, {1, 4}, {4, 5, 7, 8}]
the method detects the following interactions
Pairwise interactions        Arbitrary-order interactions
(1, 2)    7.8430             (1, 2)          6.8951
(4, 8)    3.1959             (2, 3)          2.0953
(5, 8)    3.0521             (7, 8)          1.7971
(7, 8)    3.0290             (4, 5, 8)       1.6026
(4, 5)    2.8506             (1, 4)          1.5912
(2, 3)    2.6294             (5, 7)          1.5261
(1, 4)    2.5037             (3, 4)          1.3500
(5, 7)    2.4460             (4, 7)          1.0580
(4, 7)    2.2369             (4, 7, 8)       0.7727
(3, 4)    1.8870             (4, 5, 7, 8)    0.5467
While the algorithm is not perfect, it is certainly informative.
However, these methods still do not tell us anything about the nature of the interactions. Identifying the nature of an interaction is particularly useful if you are trying to engineer features that improve your model's performance. If we have initial guesses about the relationships, we can use simple heuristics to select the interactions. For example, the pycaret library takes the products (and optionally the ratios) of all possible pairs of features and keeps the most important ones, judged by methods like random forest, lightgbm or correlation. For the California housing dataset, pycaret selects the following interactions for an interaction threshold of 5%:
['Longitude_multiply_MedInc', 'Latitude_multiply_AveOccup',
'AveBedrms_multiply_MedInc', 'MedInc_multiply_Longitude',
'HouseAge_multiply_MedInc', 'Longitude_multiply_Latitude',
'Latitude_multiply_MedInc']
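As a rough sketch of how this selection can be reproduced, assuming pycaret 2.x's regression module (the feature_interaction, feature_ratio and interaction_threshold arguments belong to that version's setup; newer releases reworked the preprocessing API):

from pycaret.regression import setup, get_config
from sklearn.datasets import fetch_california_housing

df = fetch_california_housing(as_frame=True).frame  # includes the MedHouseVal target

# feature_interaction adds products of feature pairs, feature_ratio adds their ratios;
# interaction_threshold keeps only interactions whose estimated importance clears the 5% cutoff
setup(
    data=df,
    target="MedHouseVal",
    feature_interaction=True,
    feature_ratio=True,
    interaction_threshold=0.05,
)

# The surviving engineered columns, e.g. 'Longitude_multiply_MedInc'
print([c for c in get_config("X").columns if "_multiply_" in c])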
A more complex approach is to use genetic-algorithm-based symbolic regression methods. You can even use custom function sets with these algorithms. For the same dataset, gplearn's symbolic regressor generates the following relationship:

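A minimal sketch of running gplearn's SymbolicRegressor on the same data (the hyperparameters below are illustrative, not the ones behind the expression above):

from gplearn.genetic import SymbolicRegressor
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing(as_frame=True)
X, y = data.data, data.target

# function_set can be customized with built-ins like 'log' and 'sqrt' or user-defined functions
sr = SymbolicRegressor(
    population_size=2000,
    generations=20,
    function_set=("add", "sub", "mul", "div", "log", "sqrt"),
    parsimony_coefficient=0.001,
    feature_names=list(X.columns),
    random_state=0,
)
sr.fit(X, y)

# The evolved expression; features that appear together in it are candidate interactions
print(sr._program)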
The randomness of the approach and the complexity of the generated expressions are its major drawbacks. Similar methods like FEAT or RuleFit can also be used; they tend to produce more interpretable expressions.
Conclusion
Identifying and understanding feature interactions can be crucial in developing and interpreting your models. However, all the automated methods introduced above for identifying feature interactions are far from perfect and should be used with care and rigorous analysis. There are also plenty of other tools out there that directly or indirectly help detect feature interactions.