Shall we build transparent models right away?

Mattia Ferrini
Towards Data Science
Aug 20, 2019

Explainable AI (xAI) is the new cool kid on the block, and the xAI approach (build a black box and then explain it) is now the most cherished modus operandi of Machine Learning practitioners. Is this really the best route? Why don’t we build an interpretable model right away?

Rashomon (羅生門 Rashōmon) is a 1950 Jidaigeki film directed by Akira Kurosawa. The film is known for a plot device that involves various characters providing subjective, alternative, self-serving, and contradictory versions of the same incident. (Wikipedia)

Explainable vs Interpretable AI

Explainability and interpretability are two different concepts, although many sources erroneously use the two terms interchangeably. In this blog post, I will base my reasoning on the following definitions [7], which, at least from my viewpoint, are the most widely adopted:

  • Explainable ML: using a black box and explaining it afterwards
  • Interpretable ML: using a model that is transparent, i.e. not a black box

In other words, based on these definitions, interpretability is a model property, while explainable ML refers to the tools and methodologies that aim at explaining black-box models.
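
To make the distinction concrete, here is a minimal sketch of the two workflows (scikit-learn, with an arbitrary toy dataset; the model choices are purely illustrative): an interpretable model whose coefficients we read directly, versus a black box that we explain afterwards with a global surrogate.

```python
# Minimal sketch of the two workflows; dataset and models are illustrative choices.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Interpretable ML: the model itself is transparent, we inspect it directly.
transparent = LogisticRegression(max_iter=5000).fit(X_train, y_train)
for name, weight in zip(X.columns, transparent.coef_[0]):
    print(f"{name}: {weight:+.3f}")

# Explainable ML: a black box, explained afterwards by a surrogate that
# mimics the black box's predictions (not the ground-truth labels).
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))
print(export_text(surrogate, feature_names=list(X.columns)))
```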

xkcd: curve fitting [0]

Shall we build a black box in the first place?

The interview with Cynthia Rudin [1] is nice and refreshing, and so is her article [2]. Among many other points, Rudin raises two that are particularly interesting:

1. Explainable ML methods provide unfaithful explanations

The current mainstream interpretation of Explainable AI/ML does not make much sense: explainable ML methods provide explanations that are not faithful to what the original model computes. This is true by definition, even for local surrogates; on top of that, the local surrogate methodologies currently available are unstable, i.e. not robust [3].
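
This concern is easy to probe empirically. The sketch below is not the protocol of [3], just an illustration on synthetic data: fit a local linear surrogate around a single instance by perturbing it, then measure how well the surrogate reproduces the black box in that neighbourhood, and how much its weights drift across two perturbation samples.

```python
# Sketch: local fidelity and stability of a linear surrogate around one instance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)
x0 = X[0]  # the instance we want to "explain"

def local_surrogate(seed):
    # Perturb x0, query the black box, and fit a linear model to its outputs.
    noise = np.random.default_rng(seed).normal(scale=0.3, size=(500, X.shape[1]))
    neighbours = x0 + noise
    target = black_box.predict_proba(neighbours)[:, 1]  # black-box output, not ground truth
    model = Ridge(alpha=1.0).fit(neighbours, target)
    return model, model.score(neighbours, target)       # R^2 = local fidelity

surrogate_a, fidelity_a = local_surrogate(1)
surrogate_b, _ = local_surrogate(2)
print(f"local fidelity (R^2): {fidelity_a:.3f}")
print("weight drift between two runs:",
      np.round(surrogate_a.coef_ - surrogate_b.coef_, 3))
```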

2. There is no trade-off between accuracy and interpretability

Forbes writes: “More complicated, but also potentially more powerful algorithms such as neural networks, ensemble methods including random forests, and other similar algorithms sacrifice transparency and explainability [interpretability, according to the definitions above] for power, performance, and accuracy” [4]. DARPA argues along the same lines [6]. Really?

Learning Performance vs Explainability (Interpretability according to the definitions above) [6]

No. There is no trade-off between accuracy and interpretability. An argument is offered by Rashomon sets: consider that the data permits a large set of reasonably accurate predictive models to exist. Because this set of accurate models is large, it often contains at least one model that is interpretable. This model is both interpretable and accurate. [2]
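
A rough way to see the argument in practice (a sketch with arbitrary dataset, models and tolerance): fit one black box and a few transparent candidates, and check whether any transparent candidate lands within a small ε of the black box’s cross-validated accuracy.

```python
# Sketch: looking for a transparent model inside an empirical "Rashomon set".
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
epsilon = 0.02  # tolerance defining the set of "reasonably accurate" models

black_box_acc = cross_val_score(
    RandomForestClassifier(n_estimators=300, random_state=0), X, y, cv=5).mean()

transparent_candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "depth-3 decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "depth-4 decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}
for name, model in transparent_candidates.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    verdict = "inside" if acc >= black_box_acc - epsilon else "outside"
    print(f"{name}: {acc:.3f} ({verdict} the set)")
print(f"black box (random forest): {black_box_acc:.3f}")
```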

Why don’t we build an interpretable model right away?

Back to our definition of interpretability

Right at the start of this blog post, we introduced the concept of interpretable ML. According to our definitions, interpretable algorithms are those that are not black boxes. What does this even mean?

If we look at Rashomon sets [2, 10]:

Rashomon sets: existence of a simple but accurate model [10]
Rashomon sets: classes of functions F2 that can be approximated within δ, using a specified norm, by functions from a class F1 [10]
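
For reference, the central object here is the Rashomon set itself; paraphrasing the definition in [10] (notation mine), it collects all models in a class F whose empirical loss is within a tolerance ε of the best achievable in that class:

```latex
% Empirical Rashomon set, paraphrased from [10] (notation mine)
\hat{R}_{\mathrm{set}}(\epsilon, \mathcal{F}) =
  \left\{ f \in \mathcal{F} \,:\, \hat{L}(f) \le \hat{L}(\hat{f}) + \epsilon \right\},
\qquad \hat{f} \in \operatorname*{arg\,min}_{g \in \mathcal{F}} \hat{L}(g), \quad \epsilon > 0
```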

In other words, interpretability is a property of classes of functions that mathematicians have tagged as transparent (the opposite of black-box) [9].

The fact that, for example, the class of linear models is more interpretable than deep neural networks may seem totally uncontroversial. However, from a practical standpoint, linear models are not strictly more interpretable than deep neural networks, in particular for high-dimensional models or in the presence of heavily engineered features. The weights of a linear model might seem intuitive, but they can be fragile with respect to feature selection and pre-processing [9].
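
The fragility is easy to reproduce. A small sketch on synthetic data (chosen only for illustration): when two features are near-duplicates, the fitted weights redistribute the true effect between them almost arbitrarily, and the story they tell changes with the sample or with a feature-selection step, even though the predictions barely move.

```python
# Sketch: linear-model weights are fragile when features are strongly correlated.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)      # near-duplicate of x1
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)  # only x1 actually drives y
features = np.column_stack([x1, x2])

# The true effect of x1 leaks arbitrarily between the two near-duplicates.
print("weights on x1, x2:", np.round(LinearRegression().fit(features, y).coef_, 2))

# Refit on two random halves: the weights, i.e. the "explanation", are unstable.
idx = rng.permutation(n)
for half in (idx[: n // 2], idx[n // 2:]):
    half_fit = LinearRegression().fit(features[half], y[half])
    print("weights on a random half:", np.round(half_fit.coef_, 2))

# Drop x2 (a plausible feature-selection step) and the weight on x1 changes again.
print("weight with x1 only:", np.round(LinearRegression().fit(x1.reshape(-1, 1), y).coef_, 2))
```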

In other words, nothing in the practitioners’ definition of interpretability suggests that a model belonging to an interpretable model class will be understood by the key decision maker (ex: a judge) or the key stakeholders (ex: a prisoner who applied for parole). Nothing in our definition of interpretability hints that a model will ultimately meet its requirements in terms of interpretability.

“But somewhere underneath there will be all sorts of computational irreducibility that we’ll never really be able to bring into the realm of human understanding” — Stephen Wolfram [8]

The many facets of interpretability

From a stakeholder’s perspective, having used an interpretable class of models, where interpretable means transparent (as opposed to black-box), does not bring much to the table. What should the definition of interpretability be, then? Unfortunately, interpretability has many facets.

Different tasks might come with different explanation needs. [12] provides a non-exhaustive list of hypotheses about what might make tasks similar in their explanation needs:

  • Global vs Local: Global interpretability implies knowing what patterns are present in general (such as key features governing galaxy formation), while local interpretability implies knowing the reasons for a specific decision (such as why a particular loan application was rejected). The former may be important when scientific understanding or bias detection is the goal; the latter when one needs a justification for a specific decision (a minimal sketch of this distinction follows the list).
  • Area, Severity of Incompleteness: What part of the problem formulation is incomplete, and how incomplete is it? On one end, one may have general curiosity about how autonomous cars make decisions. At the other, one may wish to check a specific list of scenarios (e.g., sets of sensor inputs that cause the car to drive off the road by 10 cm). In between, one might want to check a general property — safe urban driving — without an exhaustive list of scenarios and safety criteria. The severity of the incompleteness may also affect explanation needs.
  • Time Constraints: How long can the user afford to spend to understand the explanation?
  • Nature of User Expertise: How experienced is the user in the task?
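
As promised above, a minimal sketch of the global vs local distinction (scikit-learn, toy dataset; the “local attribution” here is simply the per-feature contribution to a linear decision function, not a general-purpose method):

```python
# Sketch: global vs local views of the same (transparent) model.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
coef = pipe.named_steps["logisticregression"].coef_[0]

# Global: which features matter in general, across the whole dataset.
top_global = np.argsort(np.abs(coef))[::-1][:3]
print("globally most influential:", list(X.columns[top_global]))

# Local: why this particular instance was scored the way it was
# (per-feature contribution to the linear decision function).
x0 = pipe.named_steps["standardscaler"].transform(X.iloc[[0]])[0]
top_local = np.argsort(np.abs(coef * x0))[::-1][:3]
print("locally most influential:", list(X.columns[top_local]))
```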

In a nutshell, I don’t see any chance that we will agree, anytime soon, on a single, silver-bullet definition of interpretability.

On the other hand, we have Ockham’s razor and the Generalization Theory approach, which aims at expressing the generalization properties of an algorithm as a function of a definition of model complexity [13]. Generalization Theory has a rigorous, agreed-upon problem definition; and, yes, complexity will very likely correlate negatively with interpretability (whatever the definition). Why don’t we then just keep Ockham’s razor as our guiding principle?
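
As a guiding principle it is straightforward to operationalise; a minimal sketch (the complexity proxy and the tolerance are my own illustrative choices): among candidate models whose cross-validated performance is within a tolerance of the best, pick the least complex one.

```python
# Sketch: Ockham's razor as a model-selection rule
# (tree depth as a crude complexity proxy; tolerance is an illustrative choice).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
tolerance = 0.01

scores = []
for depth in range(1, 11):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores.append((depth, cross_val_score(model, X, y, cv=5).mean()))

best_acc = max(acc for _, acc in scores)
# Simplest candidate whose accuracy is within the tolerance of the best.
depth, acc = min((d, a) for d, a in scores if a >= best_acc - tolerance)
print(f"chosen depth: {depth} (accuracy {acc:.3f} vs best {best_acc:.3f})")
```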

Bottom line

  • The current definition of interpretability (an interpretable model is not a black-box) does not bring any value to the stakeholders of a model
  • Ockham’s razor should remain the guiding principle
  • Explainable ML methodologies (ex: LIME [11]) provide unfaithful explanations. They nevertheless remain a valuable tool for model understanding and debugging in the hands of data scientists

References

[0] xkcd, Curve-Fitting, https://xkcd.com/2048/

[1] https://twimlai.com/twiml-talk-290-the-problem-with-black-boxes-with-cynthia-rudin/

[2] Rudin, Cynthia. “Please stop explaining black box models for high stakes decisions.” arXiv preprint arXiv:1811.10154 (2018).

[3] Alvarez-Melis, David, and Tommi S. Jaakkola. “On the robustness of interpretability methods.” arXiv preprint arXiv:1806.08049 (2018).

[4] Understanding Explainable AI, Forbes, 2019 (here)

[5] William of Ockham (bio)

[6] XAI Program Update, DARPA, (link)

[7] Keith O’Rourke, Explainable ML versus Interpretable ML, (link)

[8] Stephen Wolfram, Logic, Explainability and the Future of Understanding, (link)

[9] Lipton, Zachary C. “The mythos of model interpretability.” arXiv preprint arXiv:1606.03490 (2016). (here)

[10] Semenova, Lesia, and Cynthia Rudin. “A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning.” arXiv preprint arXiv:1908.01755 (2019).

[11] Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “Why should I trust you?: Explaining the predictions of any classifier.” Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 2016. (here)

[12] Doshi-Velez, Finale, and Been Kim. “Towards a rigorous science of interpretable machine learning.” arXiv preprint arXiv:1702.08608 (2017). (here)

[13] Mattia Ferrini, Generalization Bounds: rely on your Deep Learning models (here)
