How to Prevent Racist Robots

What is algorithmic fairness, and why is it important?

Mia Iseman
Towards Data Science

--

Has this ever happened to you?

You’re telling your coworker a story, and you mention your race (not white). All of a sudden your (white) coworker interrupts you to announce that he “doesn’t see color.”

You sure about that, Chad?

Talking to your boss about something related to you identifying as female? “Why do you have to bring gender into it? If you want to be treated as an equal, then stop reminding me you’re a woman.”

Want to get married, but the photographer denies you service? “It’s just a coincidence that you’re gay! Doesn’t sound like discrimination to me!”

Many subgroups of people in America are oppressed in many ways. Systemic oppression is complex, but just because we close our eyes and plug our ears doesn’t mean injustice will cease to exist. It just means we’re standing in the middle of it, ignoring it.

Ignorance is not bliss. It just makes you an ignoramus.

Still, many people don’t understand or seek to understand the experiences of people of color, women, LGBTQ people, differently abled folks, or whatever other targeted subgroup is literally shouting loud and clear to them. They choose to ignore the “otherness.” It should be noted that this is often well-intentioned, but its result is the direct opposite. The same is the case for certain decisions that machines make on a daily basis.

In their scholarly article “Algorithmic Fairness,” the authors make the case that we should include race and other factors representative of an oppressed subgroup if we (and our machines) want to make important decisions with the information. If we do ignore these important factors, we can do more harm than good. Why? Let’s break down what the article says.

An algorithm is a set of steps, rules, or calculations, and computers commonly follow algorithms. It’s increasingly common for us to use algorithms to make decisions. Just a few everyday yet important examples: a doctor deciding if someone is at risk for a disease, a judge setting bail, and a school parsing through applications.

It has been shown time and time again that our “robots” are taking on our human biases and making discriminatory decisions. From the article:

Because the data used to train these algorithms are themselves tinged with stereotypes and past discrimination, it is natural to worry that biases are being ‘baked in.’

A common reaction, then, is to leave the sensitive factors out: to train robots that, like that coworker, announce that they “don’t see color.”

Why is this a bad idea? Before we can say, we must know the definition of one key word: equitable. Equitable means fair. Equal means “the same.” So, keep in mind that equitable does not mean equal.

The Difference Between Equality and Equity

When we treat people equally, we give everyone the same support. When we treat people equitably, we give everyone the support they need so that the result is fair.

Of course, our world is even more complicated than the classic equality-versus-equity cartoon (another idea: true liberation means no one cares about watching the game?), but this knowledge will serve us well enough to understand the article. The article describes an experiment that looks like this:

Imagine we have two different college admissions counselors. One counselor, Erica, is efficient. Erica’s goal is always to admit the students who will perform best during their time at the college. So, she looks at each student’s application, and she uses a machine learning model to predict what that student’s college GPA will be. All Erica has to do is select the students with the most potential.

The other counselor, Farrah, is equitable and efficient. Farrah also wants the students with the most potential, but she treats the applicants equitably. In the experiment, the authors limit the student population to white non-Hispanic students and black students, so the only variable that requires “different” treatment in the predictive model is race. Farrah has to select the students with the highest predicted college GPAs while making sure that a certain percentage of the admitted students are black.

So Erica is efficient and just wants the applicants who will do best in college. Farrah is fair: she also wants the applicants who will do best in college, but she wants to admit at least a certain share of black students as well.

The model could be trained to predict college GPAs because the experiment tracked applicants over time. So the researchers had a bunch of high school data for each applicant, as well as the college GPAs those applicants actually went on to earn, showing how they really performed in college.

Erica ran the model two ways: excluding race, and including race. Regardless of the model, remember she always chose efficiently — the students with the highest predicted GPAs.
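To make Erica’s two runs concrete, here is a minimal sketch in Python of what “excluding race” versus “including race” can look like. Everything here is hypothetical (the `applicants` DataFrame, the column names, and the simple linear model); it is an illustration of the idea, not the authors’ actual code or data.

```python
# Minimal sketch, assuming a hypothetical applicants DataFrame with
# high school features, an `is_black` indicator, and the observed
# `college_gpa` used as the training target.
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["hs_gpa", "sat_score"]  # hypothetical high school features

def train_gpa_predictor(train_df: pd.DataFrame, include_race: bool):
    """Fit a college-GPA predictor that is either race-blind or race-aware."""
    cols = FEATURES + (["is_black"] if include_race else [])
    model = LinearRegression().fit(train_df[cols], train_df["college_gpa"])
    return model, cols

def erica_select(applicants: pd.DataFrame, model, cols, n_slots: int):
    """Efficiency only: admit the n_slots applicants with the highest
    predicted college GPA, whichever predictor produced the scores."""
    scored = applicants.copy()
    scored["pred_gpa"] = model.predict(scored[cols])
    return scored.nlargest(n_slots, "pred_gpa")
```

Erica’s selection rule never changes; the only thing that differs between her two runs is which predictor she feeds it.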

Farrah ran the model including race, and she set different thresholds to try to undo some of the systemic bias that black applicants may encounter. For instance, we might worry that a black applicant didn’t take SAT prep courses while a white student did. Then, if the two students got the same SAT score, the model would predict that the black student will have a higher college GPA, because earning that score without the prep suggests more underlying potential. This is just one example of the many complicated adjustments to the model’s thresholds that Farrah made.
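Farrah’s rule can be sketched the same way, under one simple reading of her constraint: a floor on the share of admitted students who are black. The paper’s actual threshold adjustments are richer than this, so treat it only as an illustration that continues the hypothetical setup above.

```python
import pandas as pd

def farrah_select(applicants: pd.DataFrame, model, cols,
                  n_slots: int, min_black_share: float):
    """Equity plus efficiency: admit the highest predicted GPAs, subject
    to a floor on the share of admitted students who are black."""
    scored = applicants.copy()
    scored["pred_gpa"] = model.predict(scored[cols])

    # Reserve enough seats to meet the floor, filled by the highest-scoring
    # black applicants, then fill the remaining seats by predicted GPA alone.
    n_reserved = int(round(min_black_share * n_slots))
    black_top = scored[scored["is_black"] == 1].nlargest(n_reserved, "pred_gpa")
    rest = scored.drop(black_top.index).nlargest(n_slots - len(black_top), "pred_gpa")
    return pd.concat([black_top, rest])
```

Run with the race-blind and then the race-aware predictor, these two rules set up exactly the comparison the article cares about.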

The results are pretty awesome:

  • The model that best predicts how an applicant will do in college always takes race into consideration, even in Erica’s case, when she is concerned with efficiency only. This might seem counterintuitive or confusing. When humans process applications and guess how an applicant will do in college, the guess is tinged with all our biases, so we often want to drop race from the equation to fool ourselves into thinking we are being fair. However, it is actually more efficient (accurate) for a mathematical model to make our predictions for us, because it bases its predictions on the numbers, not on biased hunches. So, it is more efficient to include race in the models.
  • When Farrah makes her equitable adjustments to thresholds, no matter what percentage of black students she is trying to admit, using the race-aware predictor leads to relatively more black applicants being admitted to the college. So, it is essential to include race in the models in order to be more fair.
  • In summary: Always include race. The robot should see color.

“Absent legal constraints, one should include variables such as gender and race for fairness reasons…The inclusion of such variables can increase both equity and efficiency.”

They ran this experiment using machine learning models of all sorts, but they consistently found that “the strategy of blinding the algorithm to race inadvertently detracts from fairness.”

While the article has a cut-and-dried conclusion, accounting for algorithmic bias is not an easy fix. How do we define what is equitable? As another researcher, Sarah Scheffler, points out:

There are many different measures of fairness, and there are trade-offs between them. So to what extent are the … systems compatible with the notion of fairness we want to achieve?

Even if some scientists agree on what is equitable, we must also often make sure it is legal to implement our algorithms with different equitable thresholds. As we’ve seen with Affirmative Action’s history, for just one example, there is often a difference of opinion about what should be legally implemented.

I’m hopeful for a couple different things:

  • I hope that we take advantage of teachable moments with others, where we explain why it is so valuable to take someone’s entire circumstances into account rather than adopting an “ignorance is bliss” mentality.
  • I also hope that those who are driven by analytical reasoning will take the time to understand articles like the one we’ve just rehashed. We make the best predictions possible by including protected factors like race in our science, not by ignoring them. This benefits everyone, regardless of our definitions of “fair.”

Here’s hoping these two circumstances converge to a society that consistently makes decisions for the greater good with the help of machines.
