
Explainable AI, the key to opening "black boxes"

Omnipresence of AI and need for explainability

As you may have noticed, machine learning algorithms have become ubiquitous in our daily lives. They manage the content we see, suggest the ideal route to the office, decide if we are good candidates for a position at a company, etc. This pervasiveness continues to grow as they will soon govern a wider spectrum of essential activities including education, work, transport and medicine.

Of course, these algorithms are built and supervised (in most cases) by humans. But the models currently used, essentially deep neural networks, are so complex [1] that it is practically impossible even for their creators to understand their inner workings, hence the term "black box". Being able to explain their decisions confers multiple benefits, as we will see through the following example.

Neural Network: inspired by the human brain, artificial neurons initially receive a value which they transmit to the neurons they are connected to in the next layer. Each neuron thus receives a collection of values that it aggregates according to the strength of each neural connection. It then applies a non-linear function and transmits the value obtained to the next layer; and so on. The ultimate values correspond to the desired predictions – for example the probability that the image shows a dog.
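
To make this mechanism concrete, here is a minimal sketch of one dense layer's forward pass in NumPy; the input values, weights and biases are arbitrary numbers chosen purely for illustration.

```python
import numpy as np

def layer_forward(inputs, weights, biases):
    """One dense layer: aggregate inputs by connection strength, then apply a non-linearity."""
    pre_activation = weights @ inputs + biases   # weighted sum received by each neuron
    return np.maximum(0.0, pre_activation)       # ReLU non-linearity, passed to the next layer

# Toy example: 3 input values feeding 2 neurons (all numbers are arbitrary)
x = np.array([0.5, -1.2, 3.0])
W = np.array([[0.2, -0.4, 0.1],
              [0.7,  0.3, -0.5]])
b = np.array([0.1, -0.2])

print(layer_forward(x, W, b))   # the two values transmitted to the next layer
```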

Imagine taking out a bank loan. Your application file is potentially studied, at least as a first screening, by a machine learning model. The latter uses the data you provided beforehand (age, gender, salary, rent, etc.) as well as external data (parents' profession, bills paid on time, etc.) in order to decide whether it is relevant to grant you this loan or, on the contrary, whether it should be refused.

In this situation, being able to understand the decisions of the algorithm first allows its creator (e.g. a data scientist) to verify that it works correctly and to debug it if necessary. Indeed, it is crucial that the algorithm makes the right decisions for the right reasons, in all cases. In particular, it must be free of bias, because the algorithm may obtain excellent results while relying on spurious or discriminatory correlations. For example, if in the training data people of foreign origin receive fewer loans than the average, the algorithm may have learned to decrease the probability of granting a loan when the person is of foreign origin, which is neither logical nor desirable [2].

Beyond its design, understanding the decisions of the model facilitates its acceptance and use by businesses, because they can thus validate that its actions are in line with the guiding principles of the company. Finally, understanding the model also allows informative feedback to the customer, which has real added value. For example, the applicant could be told that with a salary higher by 5k/year and bills paid on time for the next six months, the loan would be granted.

I take this opportunity to recommend the book "Weapons of Math Destruction" by Cathy O'Neil, which shows through multiple everyday situations how AI can reinforce inequalities in society and discriminate against certain groups of people.

The benefits of explainable AI obviously do not stop at this specific use case. Whether we are looking at self-driving cars or the detection of cancer by an AI-assisted doctor, we need complete confidence in these models to deploy them at scale while ensuring performance, safety, reliability and equity. Only explainability allows this.

In short, explainable AI makes it possible to improve the behaviour of the algorithm in certain situations, to avoid bias and discrimination, to increase its adoption within companies by building confidence, and to provide constructive, personalised feedback to users.

The importance of this subject is such that European legislation takes a keen interest in it. By means of the General Data Protection Regulation (GDPR), the European Union made data-processing actors accountable by ruling out the possibility that certain critical algorithmic decisions be taken without human supervision, thereby implicitly imposing explainability.

But what is explainability?

From the start, we have been talking about the explainability of machine learning models. But in practice, what does this concept mean? How do you make an algorithm explainable?

First of all, you should know that there are two categories of machine learning models:

  • Models qualified as intrinsically interpretable by virtue of their simple inner workings, for which an additional explanation is not necessary. For example, linear regressions or decision trees, whose behaviour a human can readily follow (a short sketch follows this list). Unfortunately, for many applications they do not perform well enough, given their simplicity, and therefore often go unused.
  • Complex models, led by neural networks, are generally much more accurate, but their predictions are very difficult, if not impossible, to explain. For the latter, we apply so-called "post-hoc explainability methods", which come after the training and prediction phases of the model and describe its functioning. This concept is very similar to what we, humans, do: we are not particularly easy-to-interpret individuals and operate in a complex manner, yet we are able to explain our decisions after the fact, through various means.
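
To illustrate the contrast, here is a minimal sketch (assuming scikit-learn, purely synthetic data and made-up feature names): the shallow decision tree can be printed and read as a handful of rules, whereas the boosted ensemble, typically more accurate, cannot be read off in the same way and calls for post-hoc methods.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)   # synthetic target
features = ["salary", "rent", "age"]            # illustrative names only

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=features))   # the printed rules are the explanation

boosted = GradientBoostingClassifier().fit(X, y)   # usually more accurate, but its
print(len(boosted.estimators_), "trees")           # 100 trees cannot be read the same way
```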

As you can imagine, we are especially interested in the second category, because it allows us to obtain better results while preserving a certain transparency of algorithmic decision-making, thus limiting the famous trade-off between explainability and performance.

Figure briefly illustrating the trade-off between explainability and performance, generated by the choice of the machine learning model. Post-hoc explainability methods make it possible to obtain the performance of neural networks (NN) with the explainability of a linear regression. Source: [3]

Post-hoc methods can be global (describing the general functioning of the model, for all instances) or local (describing the functioning of the model for a single prediction). They are applicable to any type of machine learning algorithm (model-agnostic) or specific to a precise architecture (model-specific).

The explanation produced can take various forms as long as it faithfully describes the functioning of the model while being understandable by a human. Among others, it may be [3]:

  • a measure of importance for each variable in the model
  • a list of data points (the most influential, the best represented …)
  • a textual explanation
  • a visualisation
  • an interpretable model that locally approximates the complex model

Some post-hoc explainability methods

There is a wide variety of methods leading to the various forms of explanation stated above. You will find 3 examples below, which I hope will give you a more precise idea of how explainability methods can work.

Saliency maps are a family of methods that use mathematical operations on the internal parameters of the model in order to explain the functioning of neural networks. To name but a few: Sensitivity Analysis, DeepLIFT, Grad-CAM or Guided Backpropagation. All of them back-propagate gradients through the neural network in order to estimate the influence of each input variable on the model prediction. The explanation is therefore an importance score for each variable, or what is called a saliency map (see image). The differences between these methods lie in subtle variations of the back-propagation process.
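
As a rough sketch of the simplest of these methods (vanilla gradients, i.e. Sensitivity Analysis), the snippet below assumes PyTorch and torchvision; the network is left untrained and the "image" is random noise, purely to show the mechanics of back-propagating a class score to the input.

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()              # untrained stand-in for a real classifier
image = torch.rand(1, 3, 224, 224, requires_grad=True)    # stand-in for a preprocessed image

scores = model(image)                                     # forward pass
top_class = scores.argmax(dim=1).item()
scores[0, top_class].backward()                           # back-propagate the top-class score

# Pixel importance = gradient magnitude, taking the max over colour channels
saliency = image.grad.abs().max(dim=1).values.squeeze()
print(saliency.shape)                                     # a (224, 224) saliency map
```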

Source: https://bdtechtalks.com/rise-explainable-ai-example-saliency-map/

LIME [4] creates a new dataset around the instance it wishes to explain. For an image, it randomly blacks out several super-pixels, creating N new images in this way. Then, it fits an interpretable model (e.g. a linear regression or decision tree) on this set of images. This model is easily understandable and serves as an explanation of the prediction for the original image. In other words, LIME's objective is to perturb the explained instance and to study the effect of these perturbations on the model's predictions using a simple surrogate model. Variables causing a large change in the model's predictions are considered important.

Note: for linear regression, the coefficient corresponding to each super-pixel is indicative of its importance for the classification of this image.
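
Below is a stripped-down, LIME-style sketch on tabular data rather than an image (the actual lime package additionally handles super-pixel segmentation, proximity kernels and feature selection); the black-box model and the data are synthetic stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 2] ** 2 > 1).astype(int)
black_box = RandomForestClassifier(n_estimators=50).fit(X, y)   # the model to explain

x0 = X[0]                                                       # instance to explain
perturbed = x0 + rng.normal(scale=0.3, size=(1000, 4))          # perturbations around x0
preds = black_box.predict_proba(perturbed)[:, 1]                # black-box outputs

# Weight samples by proximity to x0, then fit a simple surrogate model on the perturbations
weights = np.exp(-np.linalg.norm(perturbed - x0, axis=1) ** 2)
surrogate = Ridge(alpha=1.0).fit(perturbed, preds, sample_weight=weights)
print(surrogate.coef_)   # local importance of each variable around x0
```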

Source: Local Interpretable Model-agnostic Explanations (LIME) paper, Ribeiro et al. [4]

SHAP [5] builds on the Shapley Value [6], which emanates from Game Theory and describes how to distribute the winnings of a game "fairly" among the players, knowing that they all collaborated. For example, suppose a team of 3 players wins 100€ during a competition, and that we want to distribute this gain between them according to their respective contributions. Indeed, as in any group effort, some people commit more than others, or bring unique skills that help the group reach another level, so their contribution is higher. The method therefore calculates the added value each player brings to the payout when joining any possible coalition of players. For Player 2 (J2), for instance, we compute the gains of (J1 and J2 vs J1), (J2 and J3 vs J3), (J2 vs no one) and (J1, J2 and J3 vs J1 and J3), then take a weighted average to find the 'fair' contribution of J2 to the total gain earned by the team.
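
Here is a brute-force computation of these 'fair' contributions for the three-player example; only the total payout of 100€ comes from the text above, the intermediate coalition payouts are invented purely for illustration.

```python
from itertools import combinations
from math import factorial

players = [1, 2, 3]
# Payout of each possible coalition (hypothetical values, except the 100€ total)
v = {frozenset(): 0, frozenset({1}): 20, frozenset({2}): 30, frozenset({3}): 10,
     frozenset({1, 2}): 70, frozenset({1, 3}): 40, frozenset({2, 3}): 55,
     frozenset({1, 2, 3}): 100}

def shapley(j):
    n = len(players)
    others = [p for p in players if p != j]
    total = 0.0
    for size in range(n):
        for S in combinations(others, size):
            S = frozenset(S)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v[S | {j}] - v[S])   # weighted marginal contribution of j
    return total

print({j: round(shapley(j), 2) for j in players})   # the three shares sum to 100
```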

To explain machine learning models, we extend the Shapley Value by treating the explained prediction as the gain of a game in which each variable is a player. For the prediction of a model f on an instance x, denoted f(x), the marginal contribution of a variable j to a coalition of variables S (a subset of the variables) is

$$val(S \cup \{j\}) - val(S) = E[f(X) \mid X_S = x_S, X_j = x_j] - E[f(X) \mid X_S = x_S]$$

The Shapley Value of variable j is then the weighted average of these marginal contributions over all coalitions S of the feature set F:

$$\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left( val(S \cup \{j\}) - val(S) \right)$$

The notion of fairness is defined by four axioms, and the Shapley Value is the only attribution satisfying all of them. In practice, the above sum is too expensive to compute exactly, so we resort to approximations such as SHAP, which leverages a weighted linear regression to estimate the Shapley Value of each variable before presenting them as explanations.
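
In practice, this estimation is only a few lines with the shap package (assumed installed); the sketch below uses KernelSHAP, the model-agnostic weighted-linear-regression estimator, on a small scikit-learn example.

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

background = X.iloc[:50]                          # reference data for the expectations E[f(X) | ...]
explainer = shap.KernelExplainer(model.predict, background)
phi = explainer.shap_values(X.iloc[[0]])          # Shapley values for the first instance
print(dict(zip(X.columns, phi[0].round(3))))      # one value per variable
```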

Example of output explanation for SHAP. Source: [5]

These methods must adapt to the different modalities of data they receive, namely text, images, tabular data or graphs.

Providing good explanations

Now that we have seen why explainability is necessary and what it consists of, it remains to discuss how these methods are evaluated, so as to ensure they propose a good explanation of the model.

It is generally quite difficult to define what a good explanation is from a theoretical point of view. Some researchers have looked into the question and have defined a list of desirable properties [7].

  • Precision & Fidelity: the explanation is relevant and perfectly matches what the "black box" model predicts.
  • Robustness: slight variations in the characteristics of an instance or in the operation of the model do not substantially change the explanation.
  • Certainty: The explanation reflects the certainty of the machine learning model. In other words, the explanation indicates the confidence of the model for the prediction of the explained instance.
  • Meaning: The explanation reflects the importance of each variable.
  • Representativeness: the explanation covers many cases or instances, and therefore the more general functioning of the model.
  • Comprehensibility: the explanation is easy for a human to understand. This property includes several aspects which are discussed in more detail below.

Indeed, from the perspective of human understanding, a good explanation is often:

  • Selective: People don’t expect explanations to cover all of the causes of an event but rather to be given two or three key factors. Example: "France football team beat Germany 1–0 after a balanced game because they were more clinical in both penalty boxes".

  • Contrastive: Often people don’t ask why a prediction was made, but rather why this one instead of another. Example: Why did he get a loan and not me?

  • Social: the explanation is part of an interaction between the explainer and the recipient, so it must be adapted to the audience. Formulated differently, an explanation intended for a data scientist must be different from an explanation intended for a customer of a bank who has applied for a loan.

Future prospects

Although several reliable and easy-to-use explainability methods already exist, research in this area is still emerging. Many improvements are therefore to be expected in the coming years, in particular given the growing popularity of the field. Among other things, it would be desirable to focus more on the properties of the explanations provided, to develop stronger theoretical foundations for existing methods, and to improve the evaluation processes of these methods so as to guarantee their reliability in various scenarios. The ultimate goal is to provide explanations that are reliable, scalable and easy to understand for any audience.

This article was originally written for "AI for Tomorrow" and is also available here (in French).

References

[1] 3Blue1Brown – But what is a neural network? | Chapter 1, Deep learning – https://www.youtube.com/watch?v=aircAruvnKk

[2] Cathy O'Neil, Weapons of Math Destruction. 2016.

[3] Duval, A. (2019). Explainable Artificial Intelligence (XAI). MA4K9 Scholarly Report, Mathematics Institute, The University of Warwick.

[4] Ribeiro, Marco Tulio, Singh, Sameer, and Guestrin, Carlos. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In KDD, 2016.

[5] Lundberg, S., & Lee, S. I. (2017). A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874.

[6] Shapley, Lloyd S. A Value for N-Person Games. Contributions to the Theory of Games 2 (28): pp 307–317. 1953.

[7] C. Molnar. Interpretable Machine Learning. 2018

