The exponential growth of Machine Learning (ML) applications and the embedding of models in many production systems drive the need for model explainability and transparency. In this post I present a straightforward method to provide a case-level explanation for model scores – namely, which input features contributed to increasing or decreasing the predictive score for a specific case, or record.

Introduction
In Data Science, we deal with a broad toolbox of statistical and machine learning models, from the simplest regression to the most complex Deep Learning neural network architectures. Furthermore, models with hundreds of input features are not atypical in today’s data-rich world, contributing to the overall complexity of the machinery that typically produces a single output signal – namely an indication of whether something is true or false, or a prediction of a continuous measurement.
In many situations, that final prediction can drive a critical action, or inform a human decision. For example, I worked extensively in the field of ML applications in the Tax Compliance space, using models to automatically score millions of individual tax returns to identify suspicious ones for further inspection. In such cases, it is not enough to have a model-level explanation of feature importance (available in many open source packages); instead, we need some case-level explanation for the model prediction. Specifically, if the probability of a given output class is high (or low), we need to provide a shortlist of features that we believe are causing the model to make that particular determination. This could help the user gain confidence in the prediction and help drive further decisions.
The more places ML models are being used, the greater the need for explainability. For example, in most fraud detection applications, being able to support the decision to reject an individual transaction is quite important. The same obviously applies to most types of credit scoring, where important lending decisions are made on the basis of an overall model score. Often, score explanation is critical for end users to gain confidence in a model and drive adoption. This is particularly true when ML models are introduced into an existing process that has historically relied upon expert knowledge and intuition.
Not every ML model needs explainability. When models drive heavily automated decisioning, such as process control, direct marketing, or content recommendations, case-level explanation of model output might be more beneficial during tuning and debugging of an application than in day-to-day operations.
In general, when scores affect individual cases, an explanation ranges from useful to critical. When model scores only affect a mass process, individual explanations are mainly pertinent for people with heuristic or historical expertise in the process.
These are only a few examples of the general problem of interpretability of ML models, as stated in this paper by Doshi-Velez and Kim, which provides a general framework and definition of the problem.
Explainable Models?
Across the spectrum of model types available in Data Science, some are considered more suitable for "explanation" than others – but can we really say that? For example, in classification, the most basic framework I can think of is probably a Logistic Regression model. Here, aside from a non-linear monotonic transformation at the end, the prediction is essentially computed from a linear function of the inputs.
Therefore, one could quantify the contribution of each feature to the final score as the product of the coefficient for that feature and the value of the feature itself. Simple enough? Indeed, in most cases this will work, especially if some assumptions hold – namely that the features are independent (no collinearity) and standardized, since the bias term may otherwise complicate the interpretation of the model. When a model has many features, the complete absence of collinearity is unlikely, and therefore even the interpretability of the simplest model available becomes problematic.
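To make that concrete, here is a minimal sketch of that coefficient-times-value computation, assuming a scikit-learn LogisticRegression fitted on standardized features; the toy data and the helper name are illustrative, not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data stands in for a real modeling dataset.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]

scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

def logit_contributions(x_row):
    """Per-feature contribution to the log-odds for one instance:
    coefficient * standardized feature value."""
    x_std = scaler.transform(x_row.reshape(1, -1))[0]
    return dict(zip(feature_names, model.coef_[0] * x_std))

# Rank features by the magnitude of their contribution for one record.
contrib = logit_contributions(X[0])
print(sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True))
```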
Decision trees, at least in their simplest forms, have also been considered "interpretable" because they can infer well-defined decision rules from the data. Again, this is true in theory, especially if we keep the tree’s depth under control, thus limiting the number of preconditions in each rule. In practice, however, to achieve reasonable accuracy, decision trees typically yield a large set of "deep" rules. In addition, the fact that rules are in a human-readable form does not always imply that they are also interpretable.
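For a sense of how quickly those rules grow, here is a small sketch on toy data (the depth and dataset are illustrative) that prints a tree’s rule set with scikit-learn’s export_text; even a modest depth yields a long list of multi-condition rules.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Even a modest depth produces many multi-condition, "deep" rules.
tree = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X, y)
print(export_text(tree, feature_names=[f"f{i}" for i in range(X.shape[1])]))
```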
While, in general, simpler types of models and models with fewer variables can be somewhat more interpretable, this accuracy/interpretability trade-off may not be easy to negotiate in a given application, or at least not without some effort. For example, I have often found myself having to "reduce" complex models purely for the sake of interpretability. While model reduction is important to improve generalization and limit overfitting, paring down a model to make it more interpretable is not always easy, or even possible. Even taming a "simple" logistic model to cut down variables, reduce collinearity, and make the coefficients more interpretable while preserving its accuracy can be a lot of work.
Case-Level Model Explanation Techniques
Before I dare present this highly simplified method to provide a case-level explanation of model output, let us review some more sophisticated techniques that have been presented in the literature.
As general background, I highly recommend this white paper by Google on AI Explainability, which provides a good overview of the topic and discusses the solutions implemented in Google Cloud (full disclosure: I work at Google, but this is not to advertise our services; I simply think it's a pretty good write-up). Specifically, the term used in this white paper to describe the ability to "explain" an ML model score at the instance level is Feature Attribution (FA). FA is a signed, continuous value representing the degree of contribution of any individual input feature to the model’s output for the particular instance being scored.
The technique used by Google for FA is based on game theory concepts first introduced by Lloyd Shapley, known as Shapley Values (SV). The excellent online book by Christoph Molnar on Interpretable Machine Learning has a section dedicated to Shapley Values, which provides a thorough overview of the method. The SV method has several advantages:
- Model independence: it can be applied to any "black-box" model, as it only requires the ability to score examples drawn from a dataset.
- Clear interpretation: the SV of each input feature, for a given instance, represents the portion of the difference between the instance score and the average score across the population that is explained by that feature assuming its particular value.
- Completeness: the sum of the SVs across all the features is guaranteed to match the difference between the instance score and its baseline.
The drawback of this methodology is that the computational cost of calculating Shapley Values grows exponentially with the number of features, making it impractical to apply to real-world problems in its original form. However, some methods can approximate the SV calculation through sampling.
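The shap Python package implements such sampling-based approximations. A minimal sketch is shown below; the toy model and sample sizes are placeholders, and the exact return shape of the attributions varies by shap version.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# KernelExplainer approximates Shapley Values by sampling feature coalitions;
# a small background sample keeps the computation tractable.
background = shap.sample(X, 50)
explainer = shap.KernelExplainer(model.predict_proba, background)
shap_values = explainer.shap_values(X[0])  # per-feature attributions (one set per class)
```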
Another popular method for model interpretation is LIME, which first appeared in 2016. At a high level, LIME works by locally approximating any black-box model with an interpretable model (e.g., a regression). This is done by training the approximation model on data generated from the black-box model by perturbing its inputs around the values of the instance being explained. LIME is also a model-independent approach, but it takes a completely different slant on the problem from the SV method. See this post on Medium for a comparison between LIME and SHAP (a method based on Shapley’s principles).
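As a rough illustration of the LIME workflow, here is a sketch using the lime package on toy data; the model and parameters are placeholders, not a recommendation.

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
feature_names = [f"f{i}" for i in range(X.shape[1])]

# LIME perturbs the instance, scores the perturbations with the black-box model,
# and fits a local weighted linear model to produce per-feature weights.
explainer = LimeTabularExplainer(X, feature_names=feature_names, mode="classification")
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(explanation.as_list())
```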
Note that the methods mentioned above are far more sophisticated than the one suggested in this post, and ultimately more theoretically sound. However, our method is quite straightforward to implement and has worked well in practice. It may also be more comprehensible to the humans charged with taking actions based on the scores. That said, I will make sure to highlight the situations where it might not work well.
A Simple Model-Independent Approach
Given a model M, let us assume that we can score a dataset X consisting of a sample of records that contain all the input features required by M. This dataset could be the training data itself, on which the model M was estimated, a hold-out set, or a completely different sample. The only requirement is that X provides a reasonable representation of the input space over which M will be operating in production.
Next, we apply M to the dataset X, producing a vector of scores Y. Note that we do not need the actual true value of the target, nor do we need to know any details about M; all we need are the scores produced by M given the input dataset X.
For each input feature J, we do the following:
- For categorical features: compute the average of Y for each distinct value of J. We can also compute the standard deviation, as it provides additional useful information.
- For continuous/numerical features: apply any reasonable binning method (e.g., equi-frequency binning) to first discretize the variable J. Then, compute the same score statistics as described for categorical variables.
We then compute a ratio, Zj, between the mean of Y for each value of J and the overall population average of Y. This ratio represents the average change in the score, across the whole dataset, when input J assumes a particular value Ji. We can also calculate the ratio between the standard deviation of the score when J = Ji and Y’s overall standard deviation across the population; let’s call this value Vj.
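A minimal pandas sketch of this computation follows; the function name and the binning choices are illustrative, and it assumes X is a DataFrame of model inputs with scores being the corresponding model outputs.

```python
import pandas as pd

def build_attribution_table(X, scores, n_bins=10):
    """For each input, compute Zj (mean-score ratio) and Vj (std-score ratio)
    per feature value, relative to the overall score distribution."""
    y = pd.Series(scores, index=X.index)
    overall_mean, overall_std = y.mean(), y.std()
    tables = {}
    for col in X.columns:
        values = X[col]
        # Equi-frequency binning for numeric inputs; categoricals are used as-is.
        if pd.api.types.is_numeric_dtype(values):
            values = pd.qcut(values, q=n_bins, duplicates="drop")
        grouped = y.groupby(values)
        tables[col] = pd.DataFrame({
            "Zj": grouped.mean() / overall_mean,  # average effect on the score
            "Vj": grouped.std() / overall_std,    # consistency of that effect
        })
    return tables

# Usage sketch:
# scores = model.predict_proba(X)[:, 1]
# lookup = build_attribution_table(X, scores)
```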
The output of the process above can be seen as a simple lookup table. As an example, here we fitted a simple logistic regression model (we could have used any other class of models for that matter) to the well-known UCI Census Income dataset. Then, we calculated the Zj and Vj values for each of the model's inputs. A sample of the results is illustrated below, where, due to space limitations, I am showing the calculated values for only 4 of the 11 model inputs. Note that for continuous inputs such as Age and Education-Num the discretization was applied only for the purpose of this evaluation method; those inputs were not discretized in order to create the model. As a matter of fact, this method is completely independent of how inputs are specifically transformed "inside" the model, as it only depends on the scores generated by the model when presented with a given input vector. The color coding helps identify those input/value combinations that contribute the most toward increasing (green) or decreasing (red) the model score.

Given an instance I, we can retrieve the corresponding set of Zj and Vj values based on the value each input J assumes. The first set of numbers, Zj, represents the average effect on the score of input J assuming value Ji; values above 1.0 indicate that J = Ji increases the score (positive contribution), while values below 1.0 indicate that it contributes to lowering the score instead (negative contribution). The relative scale of the Zj values gives us an indication of the difference in contribution across the J inputs for instance I.
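Continuing the sketch above, a hypothetical helper that looks up the Zj and Vj factors for a single record and ranks its inputs might look like this (the interval-matching logic is illustrative):

```python
import pandas as pd

def explain_instance(record, lookup):
    """Rank one record's inputs by Zj; 'record' is a pandas Series and 'lookup'
    the dict of per-feature tables from build_attribution_table above."""
    rows = []
    for col, table in lookup.items():
        key = record[col]
        # Numeric inputs were binned: find the interval containing the value.
        if len(table.index) and isinstance(table.index[0], pd.Interval):
            matches = [iv for iv in table.index if key in iv]
            if not matches:
                continue  # value falls outside the bins seen when building the table
            key = matches[0]
        rows.append((col, record[col], table.loc[key, "Zj"], table.loc[key, "Vj"]))
    # Highest Zj first: the strongest positive contributors lead the list.
    return sorted(rows, key=lambda r: r[2], reverse=True)
```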
Thus, given an instance I, we can retrieve the Zj scores and use them to sort the input features based on their value. Again, factors greater than 1.0 represent a positive contribution, while those lower than 1.0 contribute negatively. A Zj value near 1.0 represents a neutral contribution. The example below shows the ranked effects of the model inputs for a particular record of the chosen dataset, for which the model produces a very high score.

In this first example, the input relationship is ranked highest, as it contributes the most toward achieving an above-average score, with occupation and marital-status closely following in rank. While the inputs fnlwgt and native-country are at the bottom of the list, note that their Zj scores are close to 1.0, meaning that these inputs are making a neutral contribution to this record’s score, not a negative one.
The next example is a case for which the model produces a relatively low probability score, 0.18. Notice that the ranking of the inputs has changed completely. The highest contributing features, relative to this example, are education-num and hours-per-week; however, even these are barely above 1.0, and therefore close to a "neutral" contribution. On the other hand, the diagram shows that the main drivers of the low score are occupation and workclass, which have Zj scores around 0.6, pushing down the overall model score.

The Vj values provide additional information regarding the effect of each input for the instance I. Specifically, they inform us about the variability of the contribution to the score, and therefore the reliability of the information represented by Zj. This also addresses the main weakness and limitation of this approach: let us assume, in fact, that the distribution of the score Y when J = Ji is actually bimodal.

In this case, the average value of the score Y will not properly capture the typical effect on the model output of feature J assuming value Ji. Indeed, a situation like the one depicted here would imply that the model is capturing some highly non-linear relationship between the input feature J and one or more other input features. This is definitely a possible, albeit not an extremely common, situation. The proposed method would completely fail to represent the fact that the input feature’s contribution is either strongly negative or strongly positive, depending on the values of other input features. This is what a much more computationally intensive approach, like SV or the local estimation implemented in LIME, can address.
However, by measuring the variability of the score Y when J assumes value Ji, and comparing it with the score’s general variability, we can at the very least be aware of such situations. If score variability is relatively high, as in the example of a bimodal distribution, the Vj value will increase, indicating high variance in the effect compared to a feature with a comparable Zj value but a lower Vj value. Thus, for example, we can use the Vj value to color-code the bar chart based on the Zj values to capture the direction, intensity, and consistency of the feature attribution.
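One way to render such a chart with matplotlib, assuming the ranked (feature, value, Zj, Vj) tuples from the helper sketched earlier; the colormap choice and scaling are arbitrary.

```python
import matplotlib.pyplot as plt

def plot_explanation(ranked):
    """Horizontal bars of Zj per input, shaded by Vj (higher Vj = less consistent)."""
    ranked = ranked[::-1]  # strongest positive contributor at the top of the chart
    labels = [f"{col} = {val}" for col, val, _, _ in ranked]
    z = [r[2] for r in ranked]
    v = [r[3] for r in ranked]
    # Map Vj onto a colormap so high-variance (less reliable) effects stand out.
    colors = plt.cm.coolwarm([min(x, 2.0) / 2.0 for x in v])
    plt.barh(labels, z, color=colors)
    plt.axvline(1.0, linestyle="--", color="grey")  # 1.0 marks a neutral contribution
    plt.xlabel("Zj (ratio to the average score)")
    plt.tight_layout()
    plt.show()
```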
The image below provides an example of such a visual representation of the effect of the model inputs on the score, as well as the consistency, or variability, of the direction of the contribution.

Conclusion
Among the many trade-offs that we learn to deal with in Data Science is Accuracy versus Complexity. As discussed in the brief overview of established methods for Feature Attribution, sophisticated methods exist that can provide a more accurate "explanation" of a model score, at the individual instance level, including Shapley Values and LIME. While these methods are certainly going to provide more precise results with more complex and non-linear models, the model-independent method suggested here is very simple to implement and can provide a quick solution to add "explanation factors" to a model score report.
I would like to thank Paul McQuesten for his support in the preparation of the post and in particular for the help in producing the examples presented.