What’s in the Black Box?

Photo by Christian Fregnan on Unsplash

You may be familiar with the authors of “Causal Interpretations of Black-Box Models,” Qingyuan Zhao and Trevor Hastie. Hastie’s name in particular might ring a bell:

James, Gareth, et al. An Introduction to Statistical Learning: with Applications in R. Springer, 2017.

It certainly did for me, as this text was one of my first entry points to learning data science. You may also remember Hastie’s name from reading the docs of the glmnet package as you tried to figure out what the heck went wrong with your R code. Or that might have just been me.

That familiarity is what made me curious to read Zhao and Hastie’s paper. The following is what I learned!

Black Box

Machine learning (ML) models have surpassed parametric models in predictive performance, but they come with a downside: non-technical users tend to treat them as black boxes, and their relative opacity makes them hard to interpret. Consider the figure below:

Zhao & Hastie. 2018. Causal Interpretations of Black-Box Models. http://web.stanford.edu/~hastie/Papers/pdp_zhao_final.pdf

The ‘nature’ of the model is ‘boxed’ up, out of view of its users and customers. That’s important because a core question is: What features are important to the output(s) of the model? Figuring that out is hard when you can’t easily ‘see’ what’s happening in your model.

Zhao and Hastie talk about three ideas related to feature importance:

  1. We can treat the ML model as a function and ask which feature has the most impact on the output. Think beta coefficients of a regression model.
  2. If we don’t have coefficients, then perhaps we can measure the ‘importance’ of a feature by how much it ‘contributes’ to the model’s predictive accuracy.
  3. The third is what Zhao and Hastie focus on: what they refer to as causality. They describe it as follows:

If we are able to make an intervention on Xj (change the value of Xj from a to b with the other variables fixed), how much will the value of Y change?

The stated goal of their paper then is to:

…explain when and how we can make causal interpretations after fitting black-box models.

Causality

Zhao and Hastie use the example of grades and study hours to demonstrate the concept of causality. Let’s consider a formula along these lines:

Grade = [Some Constant Factor] 
+ (Some Multiplicative Factor) × (Hours Studied)
+ (Random Error)

This makes intuitive sense to us: generally, the more hours you study, the better grades you get. In other words, one might argue that studying more hours causes higher grades.

What about the other way around? What if all we had were the grades students received? With some manipulation of the above, we could arrive at something like:

Hours Studied = [Some Other Constant Factor] 
+ (Some Other Multiplicative Factor) × (Grade)
+ (Random Error)

Does this mean that if a teacher gives a student an ‘A’ instead of a ‘B’ the student will study more hours? Of course not!
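
To see why the reversed equation carries no causal meaning, here is a minimal simulation sketch (my own illustration in Python, not code from the paper). Grades are generated from hours studied, yet fitting a regression in either direction produces a nonzero slope: regression captures association, which is symmetric, while causation is not.

import numpy as np
from sklearn.linear_model import LinearRegression

# Grades are *caused* by hours studied in this toy world.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 20, size=500)                # hours studied
grade = 50 + 2.0 * hours + rng.normal(0, 5, 500)    # grade = constant + factor * hours + noise

# Both directions of regression produce a nonzero slope.
forward = LinearRegression().fit(hours.reshape(-1, 1), grade)
reverse = LinearRegression().fit(grade.reshape(-1, 1), hours)
print(f"grade ~ hours slope: {forward.coef_[0]:.2f}")   # close to 2.0, the causal effect
print(f"hours ~ grade slope: {reverse.coef_[0]:.2f}")   # nonzero, but not causal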

Now, suppose we add another feature to the grade formula above: hours worked (at a job outside of school). If we broadly assume that those working hours are necessary for financial support, then hours worked would likely have a causal effect on hours studied, and not vice versa. In that case, hours studied would be a causal descendant of hours worked.

Partial Dependence Plots

One tool that is useful for causal interpretation is the partial dependence plot (PDP). A PDP shows a feature’s marginal impact on the model’s predicted outcome. To massively simplify, we can think of it as plotting the average predicted outcome (vertical axis) for each potential value of a feature (horizontal axis). For a PDP to be useful for causal inference, none of the other features in the model should be causal descendants of the feature in question; that is, no other feature should sit on the causal path between it and the target. Otherwise, interactions with those descendants can cloud interpretation.
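
To make the averaging idea concrete, here is a hand-rolled, model-agnostic sketch of how a partial dependence curve can be computed (my own illustration, not code from the paper). `model` and `X` stand in for any fitted estimator with a `.predict()` method and its feature DataFrame.

import numpy as np
import pandas as pd

def partial_dependence(model, X: pd.DataFrame, feature: str, grid=None):
    """Average prediction at each grid value of `feature`, holding all
    other features at their observed values."""
    if grid is None:
        grid = np.linspace(X[feature].min(), X[feature].max(), num=50)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[feature] = value                        # set the feature to `value` for every row
        averages.append(model.predict(X_mod).mean())  # average prediction at that value
    return np.asarray(grid), np.asarray(averages)

Plotting the averages against the grid gives curves like the ones that follow.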

For example, the following PDP shows that average predicted bike rentals generally go up with higher temperatures.

I too don’t like to bike when it’s humid, or windy. https://christophm.github.io/interpretable-ml-book/pdp.html

Let’s dive into another example using that old chestnut, the Boston housing dataset. The dataset presents a target of median home value in Boston (MEDV) and several features, such as per capita crime rate by town (CRIM) and nitric oxides concentration (NOX, in parts per 10 million (pp10m)). It doesn’t take much domain expertise to inspect these and the rest of the available features to conclude that none could reasonably be a causal descendant of NOX. It’s more likely that NOX is affected by one or more of the other features — “the proportion of non-retail business acres per town” (INDUS), for one — than for NOX to affect INDUS. This assumption allows us to use a PDP for causal inference:

When nitric oxide concentration increased past 0.67 pp10m, Bostonians said NO to higher house prices. Groan.

Note that the vertical axis of this plot is centered at mean = 0. What we may infer from the above is that median home prices seem to be insensitive to NOX levels until around 0.67 pp10m, at which point they drop by about $2,000.
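
For what it’s worth, scikit-learn (1.0 or later) can draw this kind of plot directly. The sketch below assumes the Boston data have already been loaded into a DataFrame named `boston` with MEDV as the target (the dataset was removed from recent scikit-learn releases, so the loading step is omitted), and it uses a gradient boosting regressor as a stand-in black box.

import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Assumes `boston` is a DataFrame with the usual columns (CRIM, ..., NOX, ..., MEDV).
X = boston.drop(columns="MEDV")
y = boston["MEDV"]

black_box = GradientBoostingRegressor(random_state=0).fit(X, y)
PartialDependenceDisplay.from_estimator(black_box, X, features=["NOX"])
plt.show()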

Individual Conditional Expectation Plots

But what if we are unsure of the causal direction of our features? One tool that can help is an Individual Conditional Expectation (ICE) plot. Instead of plotting the average prediction based on the value of a feature, it plots a line for each observation across possible values of the feature. Let’s dig into ICE plots by revisiting our NOX example.
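
Mechanically, an ICE computation is just the PDP computation without the final averaging step. Here is a minimal sketch (again my own illustration, not the article’s work files):

import numpy as np

def ice_curves(model, X, feature, grid=None):
    """Return (grid, curves), where curves[i, j] is observation i's
    prediction with `feature` set to grid[j]: one plotted line per row."""
    if grid is None:
        grid = np.linspace(X[feature].min(), X[feature].max(), num=50)
    curves = np.empty((len(X), len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[feature] = value
        curves[:, j] = model.predict(X_mod)
    return np.asarray(grid), curves

# Usage sketch:
#   grid, curves = ice_curves(black_box, X, "NOX")
#   plt.plot(grid, curves.T, alpha=0.1)             # one line per observation
#   plt.plot(grid, curves.mean(axis=0), color="k")  # averaging recovers the PDP

With that in mind, back to our NOX example: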

Nice ICE, baby. Please stop.

This ICE plot seems to support what we saw with the PDP: the individual curves all appear to be similar in shape and direction, and like before, drop down in ‘level’ around NOX = 0.67.

But we already theorized earlier that NOX is a causal descendant of one or more of the other features in the dataset, so the ICE plot only serves to confirm what the PDP showed.

What if we explored a different feature, “weighted distances to five Boston employment centers” (DIS)? One may argue that a feature such as CRIM could be a causal descendant of DIS. If we look at an ICE plot for DIS:

We find a mix of patterns! At higher levels of MEDV, there is a downward trend as DIS increases. However! At lower levels of MEDV, some curves show DIS having a brief positive effect on MEDV, up to around DIS = 2 or so, before turning negative.

The takeaway is that the ICE plot helps us identify that this feature is likely indirectly affecting the target, due to interactions with one or more other features.

For another application of ICE plots, let us consider an example using the ubiquitous “auto mpg” dataset. The following plot shows that acceleration has some causal effect on MPG, but likely through interactions with other features.

Notice the difference in the behavior of the lines at the top (somewhat increasing in MPG), middle (decreasing), and lower third (increasing again) of the plot!

If we look at the other features in the dataset, we find one for origin, the geographical origin of the auto. This feature is arguably a causal ancestor of all the other features: you need to have a place to build the car before you can build it! (A gross oversimplification, I know, but still.) As such, its causal relationship with MPG likely involves many interactions.

Can ICE plots still be useful here, even though the feature is ‘far upstream’? You bet! Let’s start by looking at a trusty boxplot:

American cars guzzle gas. At least in this dataset.

This plot shows a very noticeable difference in MPG between autos from these three regions. But does this tell the whole story?

Consider the following two ICE plots. The first shows [US (1) or Europe (0)] versus MPG:

US (1) vs. Europe (0)

… and the second shows [Japan (1) or Europe (0)] versus MPG:

Japan (1) vs. Europe (0)

While the boxplot at first made it seem that there was a sizeable difference in MPG attributable to origin, the ICE plots suggest the pure impact might be a little smaller once interactions with other features are taken into account: the slopes of the majority of these lines are flatter than the boxplot would have us imagine.
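
For anyone who wants to reproduce that kind of pairwise comparison, here is roughly how it could be set up (a sketch under my own assumptions, not the exact code behind the plots above: it assumes the UCI auto-mpg encoding of origin, 1 = US, 2 = Europe, 3 = Japan, and a cleaned DataFrame named `autos` with an `mpg` column).

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Keep only US and European cars and encode origin as a binary feature.
pair = autos[autos["origin"].isin([1, 2])].copy()   # 1 = US, 2 = Europe (assumed encoding)
pair["is_us"] = (pair["origin"] == 1).astype(int)

X = pair.drop(columns=["mpg", "origin"])
y = pair["mpg"]
model = RandomForestRegressor(random_state=0).fit(X, y)

# Two counterfactual predictions per car: the "same" car flagged as European vs. American.
X_eur, X_us = X.copy(), X.copy()
X_eur["is_us"], X_us["is_us"] = 0, 1
delta = model.predict(X_us) - model.predict(X_eur)  # each car's ICE slope from 0 to 1

print(f"median per-car change in predicted MPG: {np.median(delta):.2f}")

If most of those per-car deltas hover near zero, flat ICE lines are exactly what you would expect.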

Boxing It Up

When dealing with so-called black-box algorithms, we need to use smart methods to interpret the results. One way to do that is by inferring causality. Some tools can help with that:

  • Partial Dependence Plots (PDPs)
  • Individual Conditional Expectation Plots (ICE Plots)
  • Your or your team’s own domain knowledge! While we increasingly have many fancy tools to employ in the name of data science, they can’t replace hard-earned domain knowledge and well-honed critical thinking skills.

Thanks for reading!

Work files here.

Please feel free to reach out! | LinkedIn | GitHub

Sources:

Breiman, L. Statistical Modeling: The Two Cultures. Statistical Science, 16(3):199–231, 2001.

James, Gareth, et al. An Introduction to Statistical Learning: with Applications in R. Springer, 2017.

Molnar, C. Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book. Accessed June 2019.

Pearl, J. Graphical Models, Causality and Intervention. Statistical Science, 8(3):266–269, 1993.

Zhao, Q. & Hastie, T. Causal Interpretations of Black-Box Models. 2018. http://web.stanford.edu/~hastie/Papers/pdp_zhao_final.pdf
