Model-Agnostic Local Explanations using Individual Conditional Expectation (ICE) Plots

How to Explain and Affect Individual Decisions with ICE Curves

Wai On
Towards Data Science


Understanding Machine Learning Decisions

If your loan application is declined, you would probably want to know two things:

  1. Why did I get rejected?
  2. What can I do to get approved in the future?

In other words, the explanation you want is one that is specific to you. In particular, you would want to be educated about how your situation contributed to the outcome, and if possible, do something about it. For example, if your chance of getting a loan greatly increases once you have lived in your current address for over a year, then you can confidently apply again in the future!

This need for an explanation that is meaningful to the individual, or “local explanation” as it’s known, applies whether the decision is made by a human or by a Machine Learning algorithm. In the field of Interpretable Machine Learning, there are a number of well-known techniques for explaining individual decisions (e.g., SHAP and LIME). These techniques not only provide innovative ideas for teasing out the reasons behind individual predictions of “black box” Machine Learning models, but also help us understand the results by visualizing them in interesting ways (e.g., see my article on the SHAP Summary Plot).

A popular visualization technique that is not typically associated with local explanations, but that I think can contribute in this context, is the Individual Conditional Expectation¹ (ICE) plot. Traditionally, ICE plots are primarily seen as visualizations for supporting global interpretation; that is, explaining what the model is doing at the population level. However, with some simple modifications in the way they are visualized, I think they can also be valuable for local explanation. The advantage of ICE plots is that they are a simple technique that is easy to understand. They also provide an abundance of information that the viewer can use to understand what is going on at the local level and to simulate what may happen if things change.

The goal of this article is to examine how ICE plots can help explain individual decisions and to help illustrate what is needed to affect them. I will first describe the ICE plot in detail (part 1) before using a modified version of it to explore its utility using a couple of examples (part 2).

This article is intended for anyone who has a basic understanding of statistics and Machine Learning (ML) and is interested in how visualizations can help explain and affect individual ML decisions.

What is an ICE Plot?

An ICE plot is an extension of a Partial Dependence Plot² (PDP). PDPs visualize how changes in the value of a feature impact the prediction of an ML model. They do this by plotting the average predicted outcome for different values of the feature you are interested in while holding all other feature values constant. This is useful because we can then see the relationship between the prediction and the features we are interested in (typically one or two at a time).

For example, we can answer questions such as:

  • How much inventory of ice cream is needed if the temperature goes up?
  • What are the average house prices in relation to the number of bedrooms?

The figure shown here is an example of a PDP showing the “RM” (average number of rooms per dwelling) feature from the popular Boston House Price dataset. The x-axis is the average number of rooms and the y-axis is the median home value in $1000s. The plot is created using the scikit-learn PDP function.
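For reference, here is a minimal sketch of how such a PDP could be produced with scikit-learn’s built-in tooling. The California housing dataset is used as a stand-in (the Boston dataset has been removed from recent scikit-learn releases), with “AveRooms” playing the role of “RM”; the model and settings are illustrative, not the ones behind the figure.

```python
# A minimal PDP sketch with scikit-learn, using the California housing data
# as a stand-in for the article's Boston House Price example.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor().fit(X, y)

# kind="average" gives the classic PDP: one curve averaged over all instances.
PartialDependenceDisplay.from_estimator(model, X, features=["AveRooms"],
                                        kind="average")
```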

ICE Plots Visualize Individual Differences

PDPs are simple and easy to understand. However, their simplicity hides potentially interesting relationships between individual instances. For example, if the predictions for one subset of instances trend upward as the feature value increases while those for another subset trend downward, the averaging process may cancel them out.

ICE plots solve this problem. An ICE plot unpacks the single curve produced by the PDP’s aggregation: instead of averaging the predictions, each ICE curve shows how the prediction for one instance changes as the feature value is varied. When the curves are presented together in a single plot, we can see the relationships between subsets of instances as well as differences in how individual instances behave.

As shown in the figure below, although the majority of the instances follow the shape of the curve in the PDP shown earlier, there is a small subset at the top of the plot that behaves contrary to the PDP curve; instead of increasing between 6 and 7 on the x-axis, they actually decrease.

ICE Plot for the “RM” Feature in Boston House Price Dataset

Creating an ICE plot is straightforward. There are a number of packages available (e.g., in Python and R). Of course, it is also possible to create your own.
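As one hedged example, scikit-learn’s PartialDependenceDisplay can draw ICE curves directly: switching kind from “average” to “individual” (or “both”) turns the PDP above into an ICE plot. The dataset, model, and subsample size below are illustrative choices, not the ones behind the figures in this article.

```python
# A minimal ICE plot sketch with scikit-learn; kind="individual" draws one
# curve per instance, and kind="both" overlays the averaged PDP line on top.
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor().fit(X, y)

PartialDependenceDisplay.from_estimator(
    model, X, features=["AveRooms"],
    kind="both",       # ICE curves plus the averaged PDP curve
    subsample=200,     # thin out the curves to keep the plot readable
    random_state=0,
)
```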

Using ICE Plots for Local Explanations

ICE plots are traditionally used to understand interactions and differences in data subsets as part of a Partial Dependence (PD) analysis. However, as mentioned earlier, since an ICE plot depicts individual observations, it can also be used to focus on a particular instance you are interested in.

Calculating the Values for an ICE Curve

In order to visualize an instance, we need to figure out how to calculate the values of the curve (or locate the instance in a tool that creates ICE plots; e.g., see this tutorial on accessing the ICE DataFrame in PyCEbox). Here is a simple example to illustrate the steps for calculating the values of an ICE curve:

1. Find the instance and the feature you are interested in.

2. Find the unique values of the feature.

3. For each of these values, create a copy of the instance with that value substituted for the feature of interest. In other words, fix the other feature values and permute over the value of the feature of interest.

4. Make a prediction for each of the combinations.

5. Take the predicted values and plot them against the feature values to form the curve (a code sketch of these steps follows).
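Below is a hand-rolled sketch of those five steps. It assumes a fitted model exposing predict(), a pandas DataFrame X of features, and hypothetical names idx and feature for the instance and feature of interest; none of these come from a specific library API.

```python
# A hand-rolled version of the five steps above.
import numpy as np
import pandas as pd

def ice_curve(model, X, idx, feature):
    grid = np.sort(X[feature].unique())                    # step 2: unique feature values
    rows = pd.concat([X.loc[[idx]]] * len(grid),
                     ignore_index=True)                     # step 3: copies of the instance...
    rows[feature] = grid                                    # ...with the feature of interest varied
    preds = model.predict(rows)                             # step 4: predict each combination
    return grid, preds

# step 5: plot the curve for one (hypothetical) instance and feature
# import matplotlib.pyplot as plt
# grid, preds = ice_curve(model, X, idx=3, feature="B")
# plt.plot(grid, preds); plt.xlabel("B"); plt.ylabel("prediction"); plt.show()
```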

ICE Plot for Instance #3, Feature B

Being Mindful of Correlations

A potential problem that needs to be dealt with when using ICE plots (and PDPs) is high correlations between features. This can be problematic in a number of ways:

  1. We could end up with unlikely or impossible combinations of feature values that then get fed into the model. For example, in a dataset that includes the features “Pregnancy Status” and “Gender”, we could end up with a pregnant male as an input combination!
  2. It’s difficult to attribute the effect to a single feature (i.e., as a result of “Collinearity” or “Multicollinearity”). In other words, it is hard to know how much of the predictive influence is due to one feature or to another feature.

A number of researchers have pointed out these issues in detail and have suggested various remedial approaches. In practice, there are some simple approaches to deal with highly correlated variables (e.g. aggregation, step-wise elimination). See the following article for detailed examples on how to deal with correlations.
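As a small illustration of a first line of defence, the pairwise correlation matrix can be scanned for highly correlated feature pairs before trusting an ICE/PD analysis. The 0.8 threshold below is an arbitrary illustrative cutoff, not a recommendation.

```python
# Flag feature pairs whose absolute pairwise correlation exceeds a threshold.
import pandas as pd

def correlated_pairs(X: pd.DataFrame, threshold: float = 0.8):
    corr = X.corr().abs()
    pairs = []
    for i, a in enumerate(corr.columns):
        for b in corr.columns[i + 1:]:
            if corr.loc[a, b] >= threshold:
                pairs.append((a, b, corr.loc[a, b]))
    return pairs
```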

Visualizing the ICE Curve

For the purpose of explaining and affecting a prediction, there are a number of different ways to visualize an individual instance. Two main approaches are:

  1. By itself. Visualize the individual instance on its own without any additional information about the rest of the population (as shown in the example above). The advantage of this approach is that the resulting plot is simple to understand. Viewers can simply focus on the instance they are trying to explain and affect.
  2. In context. Visualize the individual instance in the context of other instances in the dataset. The advantage of this approach is that viewers can examine the individual instance whilst at the same time referencing other instances; to see whether the instance is an outlier, how it conforms to or deviates from other instances, etc. The disadvantage is that the plot could become cluttered and overwhelming.

Research from the social sciences suggests that people often look for explanations that are contrastive and contextual. So the approach of highlighting the instance of interest in context seems potentially beneficial and worth exploring further.

However, most ICE plot tools available show all the instances indiscriminately. If we want to focus on a particular individual instance, we need to make some changes. Two changes in particular are important in this regard:

  1. Contrast the instance of interest with other instances in the dataset. The visualization needs to show the instance of interest in the context of other instances, but in a way that distinguishes it from them.
  2. Indicate different feature values to help what-if simulations of the outcome. The visualization should highlight the current feature value as well as the varied values along the curve. This helps the viewer see the instance of interest in context and mentally simulate a possible desirable direction of change (a code sketch of both changes follows this list).
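A minimal matplotlib sketch of both changes might look like the following. The colours, markers, and function name are illustrative choices, and the plots in this article were not necessarily produced this way; as before, a fitted model with predict() and a pandas DataFrame X are assumed.

```python
# Draw all ICE curves faintly, the PD curve as a thick black line, and the
# instance of interest highlighted with markers and its current feature value.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_highlighted_ice(model, X, feature, idx):
    grid = np.sort(X[feature].unique())
    curves = {}
    for i in X.index:
        rows = pd.concat([X.loc[[i]]] * len(grid), ignore_index=True)
        rows[feature] = grid
        curves[i] = model.predict(rows)
        plt.plot(grid, curves[i], color="lightgray", linewidth=0.5)   # context: all ICE curves
    pd_curve = np.mean(list(curves.values()), axis=0)
    plt.plot(grid, pd_curve, color="black", linewidth=3,
             label="PD curve")                                        # average of all curves
    plt.plot(grid, curves[idx], color="tab:blue", marker="o",
             label="instance of interest")                            # highlighted instance
    plt.plot(X.loc[idx, feature], model.predict(X.loc[[idx]])[0],
             marker="D", color="gold", markersize=10,
             label="current value")                                   # current feature value
    plt.xlabel(feature); plt.ylabel("prediction"); plt.legend(); plt.show()
```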

The modified ICE plot below shows an instance of interest in the context of a regular ICE/PD plot from the above example.

Highlight of Instance of Interest

A few things to note about this ICE plot:

  • The thicker black line in the middle is the PD curve. That is, it is the average of all the instances.
  • All other lines are ICE curves. That is, each shows the predictions for one instance as the value of the feature of interest (i.e., feature B) is varied.
  • The blue line with markers is the instance of interest. Each round marker represents a varied feature value (as indicated in the x-axis) and a prediction (as indicated in the y-axis).
  • The yellow diamond shaped marker on the blue line shows the current feature value and prediction of the instance we are interested in.
  • The highlighted instance follows the general shape of the PD curve but at an elevated level of predicted probability.

By highlighting the instance of interest in the context of an ICE plot, we can see the effects of varying the values of a particular feature on that instance relative to other instances in the dataset. However, it is important to note that this is a simple example for the purpose of illustrating a concept. It remains to be seen whether the design decisions made here would cause unmanageable complexity in the interpretation of feature influences on individual decisions.

In part 2, we will use a couple of more realistic examples to illustrate the prior discussion and to further explore the utility of ICE curves in providing local explanations for a black box ML model.

References

[1] Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2014). “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” arXiv:1309.6392

[2] Friedman, J.H. (2001). “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics, 29, pp. 1189–1232.
