Bias Detection in Machine Learning Models using Amazon SageMaker Clarify

Understand bias in the AI/ML context, where it can occur in the AI lifecycle, the challenges faced, and how one can use SageMaker Clarify to detect bias in datasets and ML models

Gaurav Shekhar
Towards Data Science



1. Introduction

Is your AI solution fair and trustworthy?

Had someone asked this question a few years ago, they probably would not have been taken seriously.

Traditionally, the focus of Artificial Intelligence/Machine Learning solutions has been on developing new algorithms and optimizing existing models for accuracy.

We assumed that an AI model is fair in its decisions and can be trusted.

However, several recent instances have challenged this notion of fairness:

· Amazon's AI-based recruiting tool was found to rate candidates unfairly, showing bias against women. (Link)

· An AI-based solution for predicting the risk of criminal re-offence was found to assign higher risk scores to black defendants. (Link)

· A health care risk-prediction algorithm designed to predict which patients would likely need extra medical care was found to be biased, favoring white patients over black patients. (Link)

In all the above instances, the decisions made by the AI solution were biased. A model should not base its decisions on one's ethnicity, gender identity, religion, or socio-economic background.

The important question is what we can do to ensure AI/ML solutions are free from bias. The first step in this direction is to become aware of the presence of bias in our data and algorithms, and then take steps to mitigate its impact.

In this study we focus on understanding the unintended bias present in AI and ML solutions, where it manifests in the AI lifecycle, the challenges in identifying bias, and the tools and techniques offered by leading cloud vendors for the problem. We then apply Amazon SageMaker Clarify to detect bias on real-world datasets and on models trained on them.

2. Bias in AI/ML Lifecycle

What is AI Bias?

Bias has no single definition and can take different meanings depending on the problem context. I came across one such definition which states “AI bias is an anomaly in the output of machine learning algorithms. These could be due to the prejudiced assumptions made during the algorithm development process or prejudices in the training data.”

What are the sources of AI Bias?

AI bias can manifest in many ways. Below we depict the important causes of bias in AI/ML models.

Source: Image by Author, Inspired from the paper ‘Fairness Measures for Machine Learning in Finance’

Where does Bias Occur?

To identify bias effectively, we need to look for its presence across the AI lifecycle.

Source: Image by Author, Inspired from https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-fairness-and-explainability.html

Bias in Pre-Training Phase

1. Bias checks during training data creation need to validate that the data is free of any selection bias and representative of different segments, and that no biases are introduced in the data labels and features being created.

Bias in Model Training & Validation

2. Recently, a lot of work has been done on the development of fairness metrics. Are these fairness metrics included in the algorithm's objective function?

3. Do our model validation and testing pipelines include checks for the relevant fairness metrics?

Bias during Model Deployment

4. A model that is free from bias at the time of development and validation can give biased results if it is used to make predictions on data it was not trained on. Have we included checks to ascertain this?

Bias during Model Monitoring

5. How are we mitigating the impact of bias that may creep in over time after the model is deployed? Are there feedback loops that track drift over time and feed this information back to the algorithm for self-healing?
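As a simple illustration of this monitoring idea, below is a minimal sketch in plain pandas (not tied to any specific monitoring product) that recomputes a basic bias measure, the gap in positive-prediction rates between two groups, over daily windows of logged production traffic and flags drift beyond a threshold. The column names and the 0.10 threshold are assumptions.

    import pandas as pd

    def positive_rate_gap(df, group_col, favored_value, pred_col="prediction"):
        """Difference in positive-prediction rates between the favored group and the rest."""
        favored = df.loc[df[group_col] == favored_value, pred_col].mean()
        rest = df.loc[df[group_col] != favored_value, pred_col].mean()
        return favored - rest

    def monitor_bias_drift(prod_df, baseline_gap, group_col, favored_value,
                           threshold=0.10):
        """Recompute the gap over daily windows and flag drift beyond the threshold.

        prod_df is assumed to have a datetime 'timestamp' column and a 0/1
        'prediction' column logged from the deployed model.
        """
        alerts = []
        for day, window in prod_df.groupby(pd.Grouper(key="timestamp", freq="D")):
            if window.empty:
                continue
            gap = positive_rate_gap(window, group_col, favored_value)
            if abs(gap - baseline_gap) > threshold:
                alerts.append((day, gap))  # candidates for review / retraining
        return alerts

The alerts list is the feedback signal: in practice it would trigger a review of the incoming data distribution or a retraining job.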

3. Key Challenges in Bias Detection

Source: Image by Author, Inspired from the paper 'Bias in Data-driven AI Systems'

Broadly we can group the challenges under four buckets:

1. Data Collection: While collecting data for model-building tasks, there is often not enough data about underrepresented segments, which leads to selection bias. Also, with the growing popularity of crowd-sourced data, one needs to consider the individual biases of the data and label creators that get added to the data.

2. Sensitive Features: How do we identify the sensitive features that can cause unintended bias? At present there is no standard approach for identifying these sensitive features; we depend on exploratory analysis and domain knowledge.

3. Multi-Modal Data: Many of the current approaches for identifying and mitigating bias are built for structured data. But AI solutions are increasingly trained on multi-modal data that includes text, images, and video. Bias detection capabilities need to be strengthened to handle multi-modal data.

4. Complex Feature Engineering: Feature engineering improves model performance, but complex features are difficult to trace back to their origins. This challenge grows when working with contemporary NLP and computer vision solutions, where pre-trained embeddings built from open-source data are prevalent.

4. Bias Detection offerings from Cloud Vendors

Below we have identified prominent bias detection toolkits from leading cloud vendors such as IBM, Microsoft, and AWS. These toolkits are well integrated with the other AI/ML offerings from each vendor.

· IBM: The AI Fairness 360 toolkit (AIF360) is an open source software toolkit that can help detect and remove bias in machine learning models. AIF360 enables AI developers and data scientists to easily check for biases at multiple points along their machine learning pipeline, using the appropriate bias metric for their circumstances. It also provides a range of state-of-the-art bias mitigation techniques that enable the developer or data scientist to reduce any discovered bias.

· Microsoft: Fairlearn is an open source toolkit that empowers data scientists and developers to assess and improve the fairness of their AI systems. It has two components: an interactive visualization dashboard and unfairness mitigation algorithms. These components are designed to help with navigating trade-offs between fairness and model performance. (A short usage sketch follows this list.)

· AWS: Amazon SageMaker Clarify provides machine learning developers with greater visibility into their training data and models so they can identify and limit bias and explain predictions. The Clarify service helps detect potential bias during data preparation, after model training, and in your deployed model by examining attributes you specify. It also provides a detailed report that quantifies different types of possible bias.
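To give a flavour of how these toolkits are used, here is a minimal sketch with Fairlearn that computes per-group accuracy and the demographic parity difference. The toy data frame, column names, and labels are placeholders standing in for a real model's outputs.

    import pandas as pd
    from sklearn.metrics import accuracy_score
    from fairlearn.metrics import MetricFrame, demographic_parity_difference

    # Toy stand-in for real labels/predictions and a sensitive feature
    df = pd.DataFrame({
        "gender": ["f", "f", "m", "m", "f", "m"],
        "y_true": [1, 0, 1, 1, 0, 1],
        "y_pred": [0, 0, 1, 1, 0, 1],
    })

    # Accuracy broken down by group
    mf = MetricFrame(metrics={"accuracy": accuracy_score},
                     y_true=df["y_true"], y_pred=df["y_pred"],
                     sensitive_features=df["gender"])
    print(mf.by_group)

    # 0 means all groups receive positive predictions at equal rates
    dpd = demographic_parity_difference(df["y_true"], df["y_pred"],
                                        sensitive_features=df["gender"])
    print(f"Demographic parity difference: {dpd:.3f}")

AIF360 offers analogous metric classes, and Clarify's equivalents are shown in Section 5.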

5. Bias Detection using AWS SageMaker Clarify

To demonstrate the effectiveness of the bias detection solution, we applied Amazon SageMaker Clarify to two datasets:

i) Problem Statement:

  1. Identify bias in datasets.
  2. Identify bias in model predictions.

ii) Data Source:

· Dataset 1 is a Kaggle dataset that measures customer ratings for call drops across telecom networks (4G, 3G) for different telecom operators across different states in India, measured over a one-month period.

Source: Sample data taken from Kaggle

· Dataset 2 is a synthetic dataset that simulates a real dataset from the investment world. For those interested in synthetic data, please refer to my previous blog on the topic.

Source: Synthetic Data, Image by Author

iii) Solution Approach for Bias Detection using SageMaker Clarify:

We leverage SageMaker Clarify to identify bias in the above two datasets. Below is a high-level solution approach for bias detection in datasets; a code sketch of these steps follows the list.

Source: Image by Author

i. In the first step we identify those attributes that may cause bias. These attributes may be related to gender, age, nationality, etc.

ii. Once we have identified the sensitive attributes, we identify the range of values in these attributes which represent a disfavored group.

iii. The sensitive attributes and sensitive values are fed to the bias algorithm.

iv. Within the bias algorithm, the dataset is divided into a favored and a disfavored group based on the sensitive features and values specified.

v. Statistical tests are done and pre-training and post-training metrics are computed for bias detection. Based on the interpretation of these statistical metrics, we identify the presence or absence of bias.
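Below is a minimal sketch of these steps using the sagemaker.clarify module of the SageMaker Python SDK. The S3 paths, column names, IAM role, and facet values are hypothetical stand-ins for Dataset 1; only the Clarify classes and methods come from the SDK.

    from sagemaker import Session, clarify

    session = Session()
    role = "arn:aws:iam::<account-id>:role/<sagemaker-role>"  # placeholder

    processor = clarify.SageMakerClarifyProcessor(
        role=role, instance_count=1, instance_type="ml.m5.xlarge",
        sagemaker_session=session)

    # Steps i-ii: point Clarify at the dataset and name the label column
    data_config = clarify.DataConfig(
        s3_data_input_path="s3://my-bucket/telecom/train.csv",    # assumption
        s3_output_path="s3://my-bucket/telecom/bias-report",
        label="satisfied",                                        # assumption
        headers=["operator", "network_type", "state", "satisfied"],
        dataset_type="text/csv")

    # Steps iii-iv: the sensitive attribute (facet) and its disfavored value(s)
    bias_config = clarify.BiasConfig(
        label_values_or_threshold=[1],       # favorable label value
        facet_name="network_type",
        facet_values_or_threshold=["3G"],    # disfavored group
        group_name="state")                  # subgroups for the CDDL metric

    # Step v: compute pre-training bias metrics and write the report to S3
    processor.run_pre_training_bias(
        data_config=data_config,
        data_bias_config=bias_config,
        methods=["CI", "DPL", "KL", "JS", "CDDL"])

The processing job writes a bias report to the S3 output path, which is also viewable in SageMaker Studio.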

iv) Exploratory Analysis Results:

Data Exploration was done on the two datasets before the bias detection process.

Dataset 1

We observe that Dataset 1 has more observations from some of the dominant groups. A majority of the data was measured for the 4G network, and almost 80% of observations were ones where customers were satisfied with the call-drop experience.

Source: Image by Author, Exploratory Analysis of Dataset 1
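The representation check behind this observation is straightforward; a sketch, assuming hypothetical file and column names:

    import pandas as pd

    df = pd.read_csv("telecom_call_drops.csv")  # hypothetical file name

    # Share of observations per group; large skews hint at selection bias
    print(df["network_type"].value_counts(normalize=True))
    print(df["rating"].value_counts(normalize=True))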

Dataset 2

Dataset 2 is a synthetic dataset that has been cleaned and normalized. We removed some of the sensitive attributes that could identify the end customer and also balanced the training data using SMOTE, a synthetic oversampling approach (a sketch follows below).

Source: Image by Author, Exploratory Analysis of Dataset 2
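For reference, balancing with SMOTE from the imbalanced-learn package looks like the following sketch; the toy data stands in for the real features and target.

    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE

    # Toy imbalanced data standing in for the real features and target
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    # SMOTE synthesizes new minority-class samples by interpolating
    # between existing minority samples and their nearest neighbours
    X_balanced, y_balanced = SMOTE(random_state=42).fit_resample(X, y)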

v) Bias Detection Results

a) Bias Detection Results on Dataset 1

For Dataset 1, we checked for the presence of bias using sensitive attributes such as network types with fewer observations, states that are under-represented, and operators with fewer subscribers.

For most of the sensitive features we found that statistical metrics such as KL divergence and JS divergence, which measure how much the outcome distributions of different groups diverge from each other, were significant, indicating the presence of bias in the data. We also found statistically significant values for the CDDL metric, which measures disparity across subgroups of the data.

Source: Image by Author, Bias Detection Results from Pre-Training metrics for Dataset 1
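For reference, the two divergence metrics are defined (following the AWS documentation) over the label distributions P_a and P_d of the favored and disfavored groups:

    KL(P_a \| P_d) = \sum_{y} P_a(y) \log \frac{P_a(y)}{P_d(y)}

    JS(P_a, P_d) = \frac{1}{2}\left[KL(P_a \| P) + KL(P_d \| P)\right],
    \quad P = \frac{1}{2}(P_a + P_d)

Both are zero when the two distributions coincide and grow as the groups' outcome distributions diverge.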

The table below helps interpret some of the pre-training metrics shared above:

Source: AWS Image, https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-data-bias.html

Based on the interpretation of the above pre-training metrics, we conclude that significant bias is present in Dataset 1. We therefore did not go ahead with model building on this data.

b) Bias Detection Results on Dataset 2

For Dataset 2, we conducted bias tests for sensitive attributes such as age and certain flags that identify the source of financial products.

Below are some of the pre-training metric results we got for the above tests.

Source: Image by Author, Pre-Training results of Clarify Bias Detection

We do not see significant bias across most of the statistical metrics for the sensitive features. This may be attributed to the fact that the data was synthetically produced and we had applied balancing and normalization techniques before bias detection. The only bias seen is in the Class Imbalance metric, which measures the imbalance in the number of members between different groups.
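The Class Imbalance metric has a simple closed form. With n_a and n_d denoting the number of members in the favored and disfavored groups:

    CI = \frac{n_a - n_d}{n_a + n_d}

It ranges over [-1, 1]; values near the extremes indicate a severe imbalance in group sizes, while zero indicates perfectly balanced groups.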

Since no significant bias was found in Dataset 2, we went ahead and trained a classification model on the data, which predicts the propensity of customers to opt for a given product. Bias detection tests were then performed on the generated model predictions; a sketch of how such a post-training run is configured is shown below, followed by the results of the post-training metrics.
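This is a minimal sketch of the post-training run, reusing a processor and bias configuration like those shown earlier (here set up for Dataset 2); the model name and the 0.5 probability threshold are assumptions.

    # Describe the trained SageMaker model that Clarify should query
    model_config = clarify.ModelConfig(
        model_name="propensity-xgb",     # hypothetical model name
        instance_type="ml.m5.xlarge",
        instance_count=1,
        accept_type="text/csv")

    # Binarize predicted probabilities at 0.5 (assumption)
    predictions_config = clarify.ModelPredictedLabelConfig(
        probability_threshold=0.5)

    # data_config / bias_config are built for Dataset 2, analogous to the
    # pre-training example shown earlier
    processor.run_post_training_bias(
        data_config=data_config,
        data_bias_config=bias_config,
        model_config=model_config,
        model_predicted_label_config=predictions_config,
        methods=["DPPL", "DI", "DCA", "RD"])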

Source: Image by Author, Results from Post Training Bias Metrics

The table below helps interpret some of the post-training metrics shared above:

Source: Image AWS, https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-measure-post-training-bias.html

Based on the post-training metric results, we conclude that there is no significant bias in the model predictions.

6. Future Challenges

Some of the open challenges that still need to be addressed for wider adoption of bias detection solutions are listed below:

· At present, notions of bias and fairness are highly application-dependent, and there is no single uniform approach that can be applied to every problem.

· There is no agreed methodology for selecting the attribute(s) against which bias is to be measured.

· There is a lack of standardization in selecting the specific pre-training bias metrics for measuring bias; this choice is still guided by social, legal, and other non-technical considerations.

· Further research is needed on bias identification for unstructured data such as images, text, and video.

7. References

1) Amazon SageMaker Clarify: https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-fairness-and-explainability.html

2) AI Fairness 360: https://developer.ibm.com/technologies/artificial-intelligence/projects/ai-fairness-360/

3) Fairness Measures for Machine Learning in Finance: https://pages.awscloud.com/rs/112-TZM-766/images/Fairness.Measures.for.Machine.Learning.in.Finance.pdf

4) Fairlearn: https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improving-fairness-in-ai/

5) Bias in AI: https://research.aimultiple.com/ai-bias/

6) Bias in Data-driven AI Systems, An Introductory Survey: https://arxiv.org/pdf/2001.09762.pdf

Disclaimer: The opinions shared in this article are my own and do not necessarily reflect those of Fidelity International or any affiliated parties.
