Interpreting machine learning models

Lars Hulstaert
Towards Data Science
8 min read · Feb 20, 2018


Regardless of the end goal of your data science solutions, an end-user will always prefer solutions that are interpretable and understandable. Moreover, as a data scientist you will always benefit from the interpretability of your model to validate and improve your work. In this blog post I attempt to explain the importance of interpretability in machine learning and discuss some simple actions and frameworks that you can experiment with yourself.

xkcd on Machine Learning

Why is interpretability in machine learning important?

In traditional statistics, we construct and verify hypotheses by investigating the data at large. We build models to construct rules that we can incorporate into our mental models of processes. A marketing firm for example can build a model that correlates marketing campaign data to finance data in order to determine what constitutes an effective marketing campaign.
This is a top-down approach to data science, and interpretability is key, as it is a cornerstone of the rules and processes that are defined. Because correlation often does not imply causation, a solid understanding of the model is needed when making decisions and explaining them.

In a bottom-up approach to data science, we delegate parts of the business process to machine learning models. In addition, completely new business ideas are enabled by machine learning. Bottom-up data science typically corresponds to the automation of manual and laborious tasks. A manufacturing firm can, for example, put sensors on its machines and perform predictive maintenance. As a result, maintenance engineers can work more efficiently and don’t need to perform expensive periodic checks. Model interpretability is necessary to verify that what the model is doing is in line with what you expect, and it allows you to build trust with users and ease the transition from manual to automated processes.

In a top-down process, you iteratively construct and validate a set of hypotheses. In a bottom-up approach, you attempt to automate a process by solving a problem from the bottom-up.

As a data scientist you are often concerned with fine-tuning models to obtain optimal performance. Data science is often framed as: ‘given data X with labels y, find the model with minimal error’. While the ability to train performant models is a critical skill for a data scientist, it is important to be able to look at the bigger picture. Interpretability of data and machine learning models is one of those aspects that is critical in the practical ‘usefulness’ of a data science pipeline and it ensures that the model is aligned with the problem you want to solve. Although it is easy to lose yourself in experimenting with state-of-the-art techniques when building models, being able to properly interpret your findings is an essential part of the data science process.

Interpreting models is necessary to verify the usefulness of the model predictions.

Why is it essential to do an in-depth analysis of your models?

There are several reasons to focus on model interpretability as a data scientist. Although there is overlap between these, they capture the different motivations for interpretability:

Identify and mitigate bias.

Bias is potentially present in any dataset, and it is up to the data scientist to identify and attempt to fix it. Datasets can be limited in size, they might not be representative of the full population, or the data capturing process might not have accounted for potential biases. Biases often only become apparent after thorough data analysis, or when the relation between model predictions and model inputs is analysed. If you want to learn more about the different types of bias that exist, I highly recommend the video below. Note that there is no single solution to resolving bias, but being aware of potential bias is a critical step towards interpretability.

Other examples of bias are the following:

e.g. word2vec vectors contain gender biases due to the biases inherent in the corpora they were trained on. If you train a model on these word embeddings, a recruiter searching for "technical profiles" may end up with female résumés at the bottom of the pile. A short sketch after these examples shows how to probe pretrained embeddings for this kind of association.

e.g. when you train an object detection model on a small, manually created dataset, the breadth of images is often too limited. A wide variety of images of the objects in different environments, under different lighting conditions and from different angles is required to avoid a model that only fits to noisy and unimportant elements in the data.
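
To make the first kind of bias tangible, here is a minimal sketch (assuming the gensim package and its downloadable pretrained GloVe vectors) that probes how strongly a few occupation words associate with gendered pronouns. The word list is purely illustrative and the exact numbers depend on the corpus the vectors were trained on.

```python
# Sketch: probing a pretrained embedding for gender associations.
# Assumes gensim is installed and an internet connection to fetch the vectors.
import gensim.downloader as api

# Any pretrained embedding works; GloVe vectors are used here for illustration.
vectors = api.load("glove-wiki-gigaword-100")

for occupation in ["engineer", "nurse", "programmer", "receptionist"]:
    sim_he = vectors.similarity(occupation, "he")
    sim_she = vectors.similarity(occupation, "she")
    print(f"{occupation:>14}: similarity to 'he' = {sim_he:.3f}, to 'she' = {sim_she:.3f}")
```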

Accounting for the context of the problem.

In most problems, you are working with a dataset that is only a rough representation of the problem you are trying to solve and a machine learning model can typically not capture the full complexity of the real-life task. An interpretable model helps you to understand and account for the factors that are (not) included in the model and account for the context of the problem when taking actions based on model predictions.

Improving generalisation and performance.

Highly interpretable models typically generalise better. Interpretability is not about understanding every single detail of the model for every data point; rather, a combination of solid data, model and problem understanding is necessary to arrive at a solution that performs better.

Ethical and legal reasons.

In industries such as finance and healthcare, it is essential to audit the decision process and ensure that it is, for example, not discriminatory or in violation of any laws. With the rise of data and privacy protection regulations such as the GDPR, interpretability becomes even more essential. In addition, in medical applications or self-driving cars, a single incorrect prediction can have a significant impact, and being able to ‘verify’ the model is critical. The system should therefore be able to explain how it reached a given recommendation.

Interpreting your models

A common quote on model interpretability is that with an increase in model complexity, model interpretability goes down at least as fast. Feature importance is a basic (and often free) approach to interpreting your model. Even for black-box models such as deep learning, techniques exist to improve interpretability. Finally, the LIME framework will be discussed, which serves as a toolbox for model analysis.

Feature importance

  • Generalised Linear Models

Generalised Linear Models (GLMs) are all based on the following principle:
if you take a linear combination of your features x with the model weights w, and feed the result through a squash function f, you can use it to predict a wide variety of response variables. The most common applications of GLMs are regression (linear regression), classification (logistic regression) and modelling count data (Poisson regression). The weights obtained after training are a direct proxy for feature importance, and they provide a very concrete interpretation of the model internals.
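
To make the principle concrete, here is a minimal sketch with made-up numbers showing how the same linear combination is squashed differently depending on the type of GLM:

```python
import numpy as np

# One made-up data point with three features and some trained weights.
x = np.array([0.5, 1.2, -0.3])
w = np.array([2.0, -1.0, 0.4])
b = 0.1
z = w @ x + b  # linear combination of features and weights

y_linear = z                        # linear regression: f is the identity
y_logistic = 1 / (1 + np.exp(-z))   # logistic regression: f is the sigmoid
y_poisson = np.exp(z)               # Poisson regression: f is the exponential
print(y_linear, y_logistic, y_poisson)
```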

e.g. when building a text classifier you can plot the most important features and verify whether the model is overfitting on noise. If the most important words do not correspond to your intuition (e.g. names or stopwords), it probably means that the model is fitting to noise in the dataset and it won’t perform well on new data.

An example of a neat visualisation for text interpretability purposes from TidyTextMining.
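
As a rough sketch of this kind of check, the snippet below trains a bag-of-words logistic regression on a tiny made-up corpus with scikit-learn and lists the words with the largest coefficients; with real data you would plot these rather than print them.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus purely for illustration; in practice this is your labelled dataset.
texts = ["great movie, loved it", "terrible plot and bad acting",
         "wonderful performance", "boring and bad", "loved the soundtrack",
         "awful, a waste of time"]
labels = [1, 0, 1, 0, 1, 0]

vectoriser = TfidfVectorizer()
X = vectoriser.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

# The coefficients of a linear model act as a proxy for feature importance.
words = np.array(vectoriser.get_feature_names_out())
order = np.argsort(clf.coef_[0])
print("most negative words:", words[order[:5]])
print("most positive words:", words[order[-5:]])
```
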
  • Random forests and SVMs

Even non-linear models such as tree-based methods (e.g. Random Forests) allow you to obtain information on feature importance. In Random Forests, feature importance comes for free when training a model, so it is a great way to verify initial hypotheses and identify ‘what’ the model is learning. The weights in kernel-based approaches such as SVMs are often not a very good proxy for feature importance. The advantage of kernel methods is that you are able to capture non-linear relations between variables by projecting the features into kernel space. On the other hand, just looking at the weights as feature importance does not do justice to the feature interactions.

By looking at the feature importance, you can identify what the model is learning. As a lot of importance in this model is put into time of the day, it might be worthwhile to incorporate additional time-based features. (Kaggle)
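
A minimal sketch of how these importances are obtained with scikit-learn, using a synthetic dataset and made-up feature names purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration: 5 features, only a few of them informative.
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances come for free after training.
for name, importance in sorted(zip(feature_names, forest.feature_importances_),
                               key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {importance:.3f}")
```
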
  • Deep learning

Deep learning models are notorious for their un-interpretability due to the sheer number of parameters and the complex approach to extracting and combining features. As this class of models obtains state-of-the-art performance on many tasks, a lot of research is focused on linking model predictions to the inputs.

The amount of research on interpretable machine learning is growing rapidly (MIT).

Especially when moving towards even more complex systems that process text and image data, it becomes hard to interpret what the model is actually learning. Current research focuses primarily on linking and correlating outputs or predictions back to the input data. While this is fairly easy for linear models, it is still an unsolved problem for deep networks. The two main approaches are gradient-based and attention-based methods.

- In gradient-based methods, the gradients of the target concept, calculated in a backward pass, are used to produce a map that highlights the regions of the input that are important for predicting the target concept. This is typically applied in the context of computer vision.

Grad-CAM, a gradient-based method is used in visual caption generation. Based on the output caption, the method determines which regions in the input image were important.
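
As a rough sketch of the gradient-based idea (plain gradient saliency rather than Grad-CAM itself), the snippet below assumes a recent torchvision with a pretrained ResNet and uses a random placeholder tensor where a real, normalised image would go:

```python
import torch
from torchvision.models import resnet18

# Pretrained network in evaluation mode (downloads the weights on first use).
model = resnet18(weights="IMAGENET1K_V1").eval()
# Placeholder input; in practice, pass a normalised 224x224 image tensor.
image = torch.rand(1, 3, 224, 224, requires_grad=True)

scores = model(image)
target_class = scores.argmax(dim=1).item()
scores[0, target_class].backward()  # gradient of the winning class score w.r.t. the input

# Simple saliency map: largest absolute gradient across the colour channels.
saliency = image.grad.abs().max(dim=1).values.squeeze()  # shape (224, 224)
print(saliency.shape)
```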

- Attention-based methods are typically used with sequential data (e.g. text data). In addition to the normal weights of the network, attention weights are trained that act as ‘input gates’. These attention weights determine how much each element of the input contributes to the final network output. Besides interpretability, attention in the context of e.g. text-based question answering also leads to better results, as the network is able to ‘focus’ its attention.

In question answering with attention, it is possible to indicate which words in the text are most important to determine the answer on a question.
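
To show where these attention weights come from, here is a toy numpy sketch of scaled dot-product attention over a made-up four-word ‘sentence’; the resulting weights are exactly the quantities that attention-based interpretability methods visualise:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy 'sentence' of four words, each represented by a made-up 3-dimensional key vector.
words = ["which", "city", "hosted", "olympics"]
keys = np.array([[0.1, 0.3, -0.2],
                 [0.9, 0.1, 0.4],
                 [0.2, -0.5, 0.3],
                 [1.1, 0.2, 0.8]])
query = np.array([1.0, 0.0, 0.5])  # e.g. a representation of the question

# Scaled dot-product attention: the weights say how much each word contributes.
weights = softmax(keys @ query / np.sqrt(keys.shape[1]))
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```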

LIME

LIME is a more general framework that aims to make the predictions of ‘any’ machine learning model more interpretable.

In order to remain model-independent, LIME works by modifying the input to the model locally. So instead of trying to understand the entire model at the same time, a specific input instance is modified and the impact on the predictions is monitored. In the context of text classification, this means that e.g. some of the words are replaced to determine which elements of the input impact the predictions.
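
A rough sketch of what this looks like with the lime package, reusing a tiny scikit-learn text classifier as the black box (the training data is made up purely for illustration):

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy black-box classifier; any model that exposes predict_proba would do.
texts = ["great movie, loved it", "terrible plot and bad acting",
         "wonderful performance", "boring and bad"]
labels = [1, 0, 1, 0]
black_box = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

# LIME perturbs the input sentence (e.g. dropping words) and fits a local linear model.
explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance("loved the acting but the plot was bad",
                                         black_box.predict_proba, num_features=5)
print(explanation.as_list())  # words and their local contribution to the prediction
```

The output is a list of (word, weight) pairs, where each weight indicates how much that word pushed this particular prediction towards the positive or negative class.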

If you have any questions on interpretability in machine learning, I’ll be happy to read them in the comments. Follow me on Medium or Twitter if you want to receive updates on my blog posts!


