Demystifying Bayesian Models: Unveiling Explainability through SHAP Values

Exploring PyMC’s Insights with SHAP Framework via an Engaging Toy Example

Shuyang Xiang
Towards Data Science

The Gap between Bayesian Models and Explainability

SHAP values (SHapley Additive exPlanations) are a game-theory-based method used to increase the transparency and interpretability of machine learning models. However, this method, along with other machine learning explainability frameworks, has rarely been applied to Bayesian models, which provide a posterior distribution capturing uncertainty in parameter estimates instead of point estimates used by classical machine learning models.

While Bayesian models offer a flexible framework for incorporating prior knowledge, adjusting for data limitations, and making predictions, they are unfortunately difficult to interpret using SHAP. SHAP regards the model as a game and each feature as a player in that game, but the Bayesian model is not a game. It is rather an ensemble of games whose parameters come from the posterior distributions. How can we interpret a model when it is more than a game?

This article attempts to explain a Bayesian model using the SHAP framework through a toy example. The model is built on PyMC, a probabilistic programming library for Python that allows users to construct Bayesian models with a simple Python API and fit them using Markov chain Monte Carlo.

The main idea is to apply SHAP to an ensemble of deterministic models drawn from the Bayesian model’s posterior. Each draw of the parameters yields one deterministic model and hence one sample of the SHAP value for each feature. The explanation is then given by the collection of all obtained SHAP value samples. We will illustrate this approach with a simple example.

All the implementations can be found in this notebook.

Bayesian modeling with PyMC

Dataset

Consider the following dataset created by the author, which contains 250 points: the variable y depends on x1 and x2, both of which vary between 0 and 5. The image below illustrates the dataset:

Image by author: Dataset
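The article does not show how the dataset was generated. The sketch below is one plausible generating process consistent with the description (250 points, x1 and x2 uniform on [0, 5], y depending on both features); the coefficients and noise level are assumptions, not the author’s actual values.

```python
import numpy as np

# Hypothetical generating process for a dataset like the author's.
# The exact coefficients and noise scale are assumptions.
SEED = 42
rng = np.random.default_rng(SEED)

n = 250
x1 = rng.uniform(0, 5, size=n)
x2 = rng.uniform(0, 5, size=n)

# y depends on both features, including an interaction term, plus noise.
y = 1.0 + 2.0 * x1 - 1.5 * x2 + 0.8 * x1 * x2 + rng.normal(0.0, 1.5, size=n)
```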

Let’s quickly explore the data using a pair plot. From this, we can observe the following:

  1. The variables x1 and x2 are not correlated.
  2. Both variables contribute to the output y to some extent; neither variable alone is enough to determine y.
Image by author: pair plot of the data
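Observation 1 can also be checked numerically rather than visually. The snippet below computes the Pearson correlation between the two features, re-creating the hypothetical feature arrays from the earlier sketch so that it stands alone; with the real data, the author’s arrays would be used instead.

```python
import numpy as np

# Re-create the hypothetical features (independent uniform draws).
rng = np.random.default_rng(42)
x1 = rng.uniform(0, 5, size=250)
x2 = rng.uniform(0, 5, size=250)

# Independent features should show near-zero Pearson correlation.
r = np.corrcoef(x1, x2)[0, 1]
print(f"corr(x1, x2) = {r:.3f}")  # close to zero
```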

Modeling with PyMC

Let’s build a Bayesian model with PyMC. Without going into details that you can find in any statistics book, we’ll simply recall that training a Bayesian model involves updating the model’s parameters based on observed data and prior knowledge via Bayes’ rule.

We define the model’s structure as follows:

https://cdn-images-1.medium.com/max/1600/1*lgIcHZ58UOLWt46g8elHtw.gif
Image by author: model structure

Having defined the priors and likelihood above, we’ll use PyMC’s default sampling algorithm, NUTS (the No-U-Turn Sampler), which automatically tunes parameters such as the step size and the number of leapfrog steps to explore the target distribution efficiently. It repeatedly builds a trajectory tree in parameter space to decide whether to accept or reject a candidate sample. The iteration stops either when the maximum number of iterations is reached or when convergence is achieved.

You can see in the code below that we set up the priors, define the likelihood, and then run the sampling algorithm using PyMC.

with pm.Model() as model:
    # Set priors.
    intercept = pm.Uniform(name="intercept", lower=-10, upper=10)
    x1_slope = pm.Uniform(name="x1_slope", lower=-5, upper=5)
    x2_slope = pm.Uniform(name="x2_slope", lower=-5, upper=5)
    interaction_slope = pm.Uniform(name="interaction_slope", lower=-5, upper=5)
    sigma = pm.Uniform(name="sigma", lower=1, upper=5)

    # Set likelihood.
    likelihood = pm.Normal(
        name="y",
        mu=intercept + x1_slope * x1 + x2_slope * x2 + interaction_slope * x1 * x2,
        sigma=sigma,
        observed=y,
    )

    # Run the sampler.
    trace = pm.sample(5000, chains=5, tune=1000, target_accept=0.87, random_seed=SEED)

The trace plot below displays the posteriors of the parameters in the model.

Image by author: posterior of the model

Explain the model with SHAP

We now want to apply SHAP to the model described above. Note that for a given input (x1, x2), the model’s output y is a distribution conditional on the parameters rather than a single value. Thus, by drawing one sample from the obtained posteriors, we fix the parameters and obtain a deterministic model, together with corresponding SHAP values for all features. Alternatively, if we draw an ensemble of parameter samples, we get an ensemble of deterministic models and, therefore, samples of SHAP values for all features.

The posteriors can be obtained using the following code, where we draw 200 samples per chain:

with model:
    idata = pm.sample_prior_predictive(samples=200, random_seed=SEED)
    idata.extend(pm.sample(200, tune=2000, random_seed=SEED))

Here is the table of the data variables from the posteriors:

Image by author: samples from posteriors
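The loop in the next code block uses flattened arrays such as pos_intercept, which the article does not define. One way to obtain them, assuming the (chain, draw)-indexed posterior layout that pm.sample produces, is a small helper like this:

```python
import numpy as np

def flatten_posterior(posterior, names):
    """Collapse the (chain, draw) dimensions of each named posterior
    variable into one 1-D array, so index i selects a joint sample."""
    return {name: np.asarray(posterior[name]).flatten() for name in names}
```

With the article’s trace, this could be called as `pos = flatten_posterior(idata.posterior, ["intercept", "x1_slope", "x2_slope", "interaction_slope", "sigma"])`, then `pos_intercept = pos["intercept"]`, and so on.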

Next, we compute one pair of SHAP values for each drawn sample of model parameters. The code below loops over the parameter samples, defines one deterministic model per sample, and computes the SHAP values for the test point x_test = (2, 3).
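The loop relies on a deterministic SimpleModel wrapper that the article does not define. A minimal sketch consistent with how it is called (the class name, constructor arguments, and mean-prediction behavior are assumptions inferred from the loop) could be:

```python
import numpy as np

class SimpleModel:
    """Deterministic linear model with an interaction term, built from
    one posterior sample of the Bayesian model's parameters."""

    def __init__(self, intercept, x1_slope, x2_slope, interaction_slope, sigma):
        self.intercept = intercept
        self.x1_slope = x1_slope
        self.x2_slope = x2_slope
        self.interaction_slope = interaction_slope
        self.sigma = sigma  # kept for completeness; unused in the mean prediction

    def predict(self, X):
        # X has shape (n_samples, 2): columns are x1 and x2.
        x1, x2 = X[:, 0], X[:, 1]
        return (self.intercept + self.x1_slope * x1 + self.x2_slope * x2
                + self.interaction_slope * x1 * x2)
```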

background = np.hstack((x1.reshape((250, 1)), x2.reshape((250, 1))))
shap_values_list = []
x_test = np.array([2, 3]).reshape((-1, 2))
for i in range(len(pos_intercept)):
    model = SimpleModel(
        intercept=pos_intercept[i],
        x1_slope=pos_x1_slope[i],
        x2_slope=pos_x2_slope[i],
        interaction_slope=pos_interaction_slope[i],
        sigma=pos_sigma[i],
    )
    explainer = shap.Explainer(model.predict, background)
    shap_values = explainer(x_test)
    shap_values_list.append(shap_values.values)

The resulting ensemble of the two-dimensional SHAP values of the input is shown below:

Image by author: SHAP values samples

From the plot above, we can infer the following:

  1. The SHAP values of both dimensions are roughly normally distributed.
  2. The first dimension has a negative contribution to the model (median around -1.75), while the second has a positive contribution (median around 3.45). In absolute value, the second dimension’s contribution is larger.
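The medians above can be computed directly from the collected samples. The snippet below mocks shap_values_list with synthetic draws centered on the medians reported in the article (so it stands alone); with the real run, the list produced by the loop would be used instead.

```python
import numpy as np

# Mock stand-in for the collected SHAP samples; each entry has shape
# (1, 2): one test point, two features. Centers are the article's medians.
rng = np.random.default_rng(0)
shap_values_list = [rng.normal([-1.75, 3.45], 0.3, size=(1, 2)) for _ in range(1000)]

samples = np.concatenate(shap_values_list, axis=0)  # shape: (n_samples, 2)
medians = np.median(samples, axis=0)
print("median SHAP values:", np.round(medians, 2))  # near -1.75 and 3.45
```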

Conclusion

This article explores the use of SHAP values, a game-theory-based method for increasing the transparency and interpretability of machine learning models, in the Bayesian setting. A toy example is used to demonstrate how SHAP can be applied to a Bayesian model built with PyMC.

Please note that SHAP is model-agnostic. Therefore, with changes to its implementation, it may be possible to apply SHAP directly to the Bayesian model itself in the future.
