
Using SHAP for explainability – Understand these Limitations First!

Explainability done right.

Photo by Drew Beamer on Unsplash

I thoroughly enjoyed my MBA; I loved every subject and could connect the dots on how each one would help me become a better manager with a strong, holistic view. However, there was one subject for which I never bothered to enter the classroom – business ethics. I was strongly of the opinion that ethics cannot be taught. Every person has a varying degree of it, which is a function of his or her value system. For me, maybe stealing a pen is okay but stealing a car is not. For someone else, it could be different.

In the end, the common definition hardly needs any expansion – "Doing the right thing."

However, this "doing the right thing" works only for humans, because we share a common frame of moral values. For machines, the concept of a value system fails. So, until we reach the much-debated stage of the "singularity", defining ethics for machines is a human responsibility.

Whether it is the racial bias of a model, the Twitter saliency issue, or AI being labelled the antichrist ([link](https://onezero.medium.com/is-a-i-the-antichrist-a2fd6d853610)) – there are numerous examples of issues with model bias and fairness. Currently, this is one of the biggest hindrances to the growth of AI/ML.

There has been tremendous growth in Explainable AI in the last five years. Making complex models explainable, removing bias, and inducing fairness in models have become key goals for data scientists.

Of all the methodologies, SHAP and LIME are two post-hoc techniques that have gained significant limelight. SHAP computes the marginal contribution of each feature in such a way that the contributions sum exactly to the difference between the model’s prediction and the average prediction. LIME, on the other hand, focuses on local fidelity: it approximates any black-box machine learning model with a local, interpretable linear model to explain each individual prediction.
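To make the additivity property concrete, here is a minimal sketch (assuming the `shap` package and a tree-based scikit-learn model; the data and model are purely illustrative) showing that the per-prediction SHAP values sum to the gap between the model’s prediction and the explainer’s expected value:

```python
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Illustrative data and model; any tabular model supported by shap would do.
X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Additivity: for each row, the SHAP values sum to the difference between the
# model's prediction and the expected (average) prediction.
reconstructed = explainer.expected_value + shap_values.sum(axis=1)
print(np.allclose(model.predict(X), reconstructed))  # True
```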

Here is a good article on SHAP if you want to understand how it works:

Can SHAP trigger a paradigm shift in Risk Analytics?

Of the two, LIME is faster, but SHAP provides both global and local consistency and interpretability and is more commonly used in the industry.

Before using SHAP, one should be aware of several limitations of the methodology in order to understand and explain the model properly.

Limitation 1: Correlation, not Causality: It is important to acknowledge that SHAP only "explains" the correlations the model has learned, as defined by the model structure. This does not imply that the variables also have a causal relationship with the outcome. Where causality is missing, the reason could be spurious correlation or omitted variables (variables that would have explained the output better are missing from the dataset, and the included variables are proxying for their impact). Each variable in the model should therefore be examined individually for its importance, its sign, and its causal behaviour.
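As a hypothetical illustration of the omitted-variable point, the sketch below leaves the true driver z out of the training data; the correlated proxy feature then absorbs the SHAP importance even though it is not the cause of the target (the data and feature names are made up):

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                      # true causal driver, NOT given to the model
proxy = z + rng.normal(scale=0.3, size=n)   # observed feature correlated with z
unrelated = rng.normal(size=n)              # observed feature unrelated to y
y = 3 * z + rng.normal(scale=0.1, size=n)

X = np.column_stack([proxy, unrelated])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

shap_values = shap.TreeExplainer(model).shap_values(X)
mean_abs = np.abs(shap_values).mean(axis=0)
print(dict(zip(["proxy", "unrelated"], mean_abs.round(3))))
# The proxy receives essentially all the attribution: SHAP reports what the
# model uses, not the underlying causal mechanism (which lives in the omitted z).
```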

Limitation 2: Dependency on the Model: SHAP, by design, answers "how important a feature is to the model", not "how important the feature is in reality". Loosely speaking, SHAP shows the sensitivity of the output to each variable relative to a global average (benchmark) value – given the model.

This raises two key limitations:

  1. Since SHAP infers the importance of a feature for the given model, an incorrectly developed or trained model will produce SHAP inferences with the same inherent issues.
  2. The importance and the sign of each variable are defined relative to the global average value (call it the benchmark value). If the benchmark value itself is off – for example, because it is computed from an unrepresentative background sample – the inference can be wrong on both the sign and the importance of the feature, as the sketch below illustrates.
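To illustrate the second point, here is a hedged sketch (synthetic data; the "skewed" background stands in for an incorrect benchmark) showing that explaining the same prediction against two different background datasets changes both the base value and the attributions:

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=1000)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

x = X[:1]  # a single instance to explain

backgrounds = {
    "representative": X[:100],       # background resembling the data
    "skewed": X[X[:, 0] > 1][:100],  # background standing in for a bad benchmark
}

for name, bg in backgrounds.items():
    explainer = shap.TreeExplainer(model, data=bg, feature_perturbation="interventional")
    sv = explainer.shap_values(x)
    print(name,
          "base value:", np.round(explainer.expected_value, 3),
          "SHAP values:", np.round(sv[0], 3))
# The same prediction gets a different base value and different attributions
# (possibly with flipped signs) purely because the benchmark changed.
```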

Limitation 3: Consistency of Feature Importance and Sign: The inference drawn from SHAP values is strongly tied to the "objective" of the model. For example, for a model developed for picking a good equity stock (a share), the feature importances (or signs) could differ depending on whether the objective is portfolio optimization or a buy/no-buy decision on a share, even though both models aim to increase returns. SHAP output should therefore always be analysed with the model objective in mind.

Limitation 4: Multicollinearity Issue: If there are variables with a high degree of multicollinearity, SHAP may assign a high value to one of them and a zero or very low value to the other. This can distort the perceived importance of features. The issue here is not how SHAP assigns the values; it relates to how the model has been trained. If the model is trained in a manner where weight is first assigned to one variable (say x1), the contribution of the correlated variable (say x2) will be minimal. This can seem counterintuitive if, from a business standpoint, the second variable (x2) is the more intuitive one.
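As a hypothetical sketch of this effect, the snippet below creates two nearly identical features; how the model divides credit between them is an artefact of training, and SHAP simply reports that split (the data and feature names are illustrative):

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly a duplicate of x1
x3 = rng.normal(size=n)
y = 5 * x1 + 2 * x3 + rng.normal(scale=0.1, size=n)

X = np.column_stack([x1, x2, x3])
model = GradientBoostingRegressor(random_state=0).fit(X, y)

shap_values = shap.TreeExplainer(model).shap_values(X)
mean_abs = np.abs(shap_values).mean(axis=0)
print(dict(zip(["x1", "x2", "x3"], mean_abs.round(3))))
# The booster leans almost entirely on whichever of x1/x2 it splits on first, so
# the near-duplicate feature's SHAP importance collapses toward zero even though,
# to a business user, x1 and x2 carry the same signal.
```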

Arthur C. Clarke’s Third Law states, "Any sufficiently advanced technology is indistinguishable from magic." It falls to methodologies like SHAP to unfold the reality behind the magic and to define the moral science for machines.

SHAP is an excellent tool for improving the explainability of a model. However, like any other methodology, it has its own set of strengths and weaknesses. It is imperative that it is used with these limitations in mind and that SHAP values are evaluated in the appropriate context.

If you have come across more limitations of SHAP, please do share them in the comments.


Disclaimer: The views expressed in this article are opinions of the authors in their personal capacity and not of their respective employers.

