Why Eliminating Bias in AI Systems Is So Hard

Published in Towards Data Science, Oct 28, 2021

Every time we decide what dataset to use, which features to select, or how to fine-tune our model, our biases come into play. Some of them are neutral, perhaps even benign—expertise is a form of bias, too. Others are potentially discriminatory and harmful; how do we ensure those stay out of the picture?

The short answer: it’s very, very difficult. In this week’s Variable, we start with three excellent contributions on that very question—let’s dive right in. (Looking for other, more technical topics? Read on: we’ve got you covered, too.)

Learn about the risks of introducing bias across the ML/AI pipeline. As Sheenal Srivastava explains in this panoramic overview, when we talk about bias we are really talking about multiple biases, each influencing our projects, workflows, and outcomes in different ways. Sheenal provides a thorough roadmap for recognizing and addressing them, and, over time, for preventing them from coming into play in the first place.
Explore the most effective methods to ensure your models are explainable. At the heart of many bias conversations is the problem of explainability, or the lack thereof: many ML models and AI systems rely on an opaque process that separates input from output. Divya Gopinath shares a taxonomy of explainability techniques that can help practitioners choose the right approach for their context and specific needs.
Listen to a prominent AI ethics researcher discuss the practical side of fairness and bias. After many years operationalizing AI ethics at major tech companies, Margaret Mitchell is a leading voice on the dangers of undetected and unaddressed bias. You don't want to miss her lively conversation with TDS Podcast host Jeremie Harris, which covered inclusion, the fractal nature of fairness problems, and the levels and types of bias we can tolerate in our models.

Beyond these essential discussions of bias and its consequences, we also published dozens of guides and tutorials this past week. It was hard to choose (it always is), but here are three we think you might particularly enjoy.

Tackle supply-chain issues by leveraging your data and Python skills. Talk of product shortages and shipping delays is all around us these days. As Will Keefe shows, however, your knowledge of Python can be a major difference-maker when it comes to modeling needs and constraints, preparing and scheduling production cycles, and visualizing and communicating plans.
Empower colleagues and stakeholders to make the most of the data they have—on their own. Many data scientists working in industry find themselves spending too much time producing rudimentary data insights for their less data-savvy peers. Wenling Yao insists that there’s a better way: training internal users to become effective data consumers, and even data creators.
Gain a more nuanced understanding of overfitting (and why it matters). Klas Leino’s post is a clear and helpful guide to overfitting, specifically in the context of deep learning. Klas introduces us to the TruLens library and shows how it can help us avoid the pitfalls of unsound features—and increase our trust in the model’s performance on future unseen data.

We hope you enjoyed the time you spent with us this week! If you’d like to support our authors’ work (and ours), consider becoming a Medium member today.

Until the next Variable,
TDS Editors

Written by TDS Editors