Model Interpretability

Explaining Machine Learning Models: Partial Dependence

Making black box models a thing of the past

Zito Relova
Towards Data Science
6 min read · Feb 17, 2021



With all the complexity that goes into developing machine learning models, it comes as no surprise that some of them are hard to explain in plain English. The inputs go in, the answers come out, and no one knows exactly how the model arrived at its conclusion. This creates a disconnect and a lack of transparency between members of the same team. As machine learning has become more prevalent in recent years, this lack of explainability around complex models has only grown. In this article, I'll discuss a few ways to make your models more explainable to the average person, whether they're your non-technical manager or just a curious friend.

Why is explainability important?

The responsibility that falls on machine learning models has only increased over time. They are responsible for everything from filtering spam in your email to deciding whether you qualify for that new job or loan you've been looking for. When these models can't be explained in plain English, trust erodes and people become reluctant to rely on your model for any important decision. It would be a shame if the model you worked so hard to create ended up being discarded because no one could understand what it was doing. If you can explain a model and show the insights that come from it, people (especially those with no background in data science) will be far more likely to trust and use the models that you create.

Interpreting Coefficients

On one end of the spectrum, we have simple models like linear regression. Models like this are quite simple to explain, with each coefficient representing how much a feature affects our target.

Simple linear regression with y = 2x

The image above shows the plot for a model represented by the equation y = 2x. This means that for an increase of 1 in the feature x, the target variable increases by 2. You can have multiple features like this, each with its own coefficient representing its effect on the target.
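For instance, here is a minimal sketch of reading that coefficient back from a fitted scikit-learn model (the toy data below just mimics y = 2x and is only for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following y = 2x, with a little noise added
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X[:, 0] + rng.normal(0, 0.1, size=100)

model = LinearRegression().fit(X, y)

# The coefficient is the change in y for a +1 change in x
print(model.coef_)       # roughly [2.0]
print(model.intercept_)  # roughly 0.0
```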

On the other end, we have "black box" models like neural networks, where all we can see are the inputs and outputs; the steps taken to get from one to the other are hidden behind a sea of incomprehensible numbers.

Partial Dependence

Partial dependence shows how a particular feature affects a prediction. By holding all other features constant, we can find out how the feature in question influences the outcome. This is similar to interpreting coefficients as explained in the previous section, but partial dependence lets us generalize this interpretation to models far more sophisticated than simple linear regression.

As an example, we’ll be using a decision tree on this Cardiovascular Disease dataset on Kaggle. The library we’ll be using to plot partial dependence is pdpbox. Let’s train the model and see how this all works.

Creating a partial dependence plot
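As a rough sketch of that step, assuming the Kaggle data sits in a semicolon-separated cardio_train.csv with the age column in days and a cardio target column, and assuming pdpbox 0.2.x's pdp_isolate/pdp_plot functions:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from pdpbox import pdp

# Load the cardiovascular disease data (file name and separator assumed)
df = pd.read_csv('cardio_train.csv', sep=';')
features = [c for c in df.columns if c not in ('id', 'cardio')]
X, y = df[features], df['cardio']

# Fit a simple decision tree classifier (depth chosen arbitrarily here)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

# Compute and plot partial dependence for the 'age' feature
pdp_age = pdp.pdp_isolate(model=tree, dataset=X,
                          model_features=features, feature='age')
pdp.pdp_plot(pdp_age, 'age')
plt.show()
```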
Partial dependence plot for the feature age

The plot above shows the partial dependence plot for the feature age. The target variable we are trying to predict is the presence of cardiovascular disease. We can see that once age goes above roughly 19000 days (around 52 years), it starts to affect the prediction positively: a higher age translates to a higher predicted probability of cardiovascular disease. Because this insight matches intuition, we are more likely to trust the model's predictions.

The decision tree we are using is still relatively simple, and its partial dependence plot may not paint the whole picture. Let's try again, this time with a random forest model.

Creating a partial dependence plot with a random forest classifier
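Continuing the previous sketch (reusing X, y, features, and the pdpbox and matplotlib imports from above), only the estimator needs to change:

```python
from sklearn.ensemble import RandomForestClassifier

# Swap the decision tree for a random forest; the pdpbox call stays the same
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

pdp_age_rf = pdp.pdp_isolate(model=forest, dataset=X,
                             model_features=features, feature='age')
pdp.pdp_plot(pdp_age_rf, 'age')
plt.show()
```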
Partial dependence plot using a random forest classifier

Using a more complex model like a random forest, we see that the age feature affects our predictions in a more linear fashion, as opposed to the step-like effect we saw with the simpler decision tree.

How does it work?

Partial dependence plots rely on a model that has been fit on the data we are working with. Let’s take a single row of our dataset as an example.

A single row of our dataset

Our age variable here has a value of 14501. The model will predict the probability of cardiovascular disease from this row of data. We actually do this multiple times, changing the value of age each time we make a prediction. What is the probability of cardiovascular disease when age is 12000? 16000? 20000? We keep track of these predictions to see how changing the variable affects the output. Finally, we repeat this for several rows and take the average prediction at each value of age. Plotting these averages gives us the partial dependence plot seen above.
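That procedure is easy to reproduce by hand. Here is a rough sketch, reusing the forest and X from the earlier snippets and an arbitrary grid of age values:

```python
import numpy as np

# Manual partial dependence for 'age': fix the column at each grid value,
# predict for every row, and average the predicted probabilities.
age_grid = np.arange(12000, 22001, 1000)
avg_preds = []
for value in age_grid:
    X_mod = X.copy()
    X_mod['age'] = value                       # set age to this value for all rows
    proba = forest.predict_proba(X_mod)[:, 1]  # P(disease) for each row
    avg_preds.append(proba.mean())             # average over the rows

for value, p in zip(age_grid, avg_preds):
    print(f'age={value}: average predicted probability = {p:.3f}')
```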

A Step Further

Now that we've seen how partial dependence works with a single variable, let's look at how it works with feature interactions! Say we wanted to see how height and weight interact to affect our predictions. We can use a partial dependence plot to visualize this interaction as well. We'll use the same random forest model from the previous section. By changing our code a little, we can produce an entirely different-looking plot that helps us see feature interactions.

Creating a partial dependence plot for feature interaction
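A sketch of that change, again assuming pdpbox 0.2.x's pdp_interact/pdp_interact_plot functions and the height and weight columns of the same dataset, with the forest and X from the earlier snippets:

```python
# Partial dependence for the interaction between height and weight
interact = pdp.pdp_interact(model=forest, dataset=X,
                            model_features=features,
                            features=['height', 'weight'])
pdp.pdp_interact_plot(interact, feature_names=['height', 'weight'],
                      plot_type='contour')
plt.show()
```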
Partial dependence plot for height and weight

This plot not only looks pretty, but it also gives us a lot of information about how height and weight interact to affect our predictions. The variable height has less of an effect, since the color of the plot changes little as we move across the x-axis, while weight seems to have a much stronger effect on the probability of cardiovascular disease, as the predictions increase as we move up the y-axis. Once again, this makes intuitive sense: a person with a higher weight would be more likely to have cardiovascular disease. With this insight from our model, we are that much more inclined to trust its predictions. We can do this with any two features to answer different hypotheses we might have about our data.

Conclusion

We saw the importance of being able to explain machine learning models to a non-technical audience. When a model is distilled down to easily understandable insights, people are more likely to trust its predictions and use them in the long run. This can be very helpful when trying to gain traction for a machine learning project that people may be skeptical of.

We saw simple models like linear regression, where predictions can be interpreted using model coefficients. We were able to extract the same kind of insight from more complex models using partial dependence plots. We even saw how two features interact with one another to affect a prediction using interaction plots. With this knowledge in mind, let's retire the notion that machine learning models are getting too complex for human beings to understand!

Thank you for reading!
