Fairness and Bias

Forget the Trolley Problem; Pragmatic and Fair AI in the Real World

Thoughtfully (co-)designing AI systems can make a difference in the real world

David Graus
Towards Data Science
8 min read · Apr 13, 2021


The AI doomsday scenarios, ignited by books such as The Filter Bubble (2011) and Weapons of Math Destruction (2016), are slowly being superseded by more pragmatic and nuanced views of AI: views in which we acknowledge that we are in control of AI systems and able to design them in ways that reflect values of our choice.

Photo by Med Badr Chemmaoui on Unsplash

This shift can be seen in the rising involvement of computer scientists, e.g., through books such as The Ethical Algorithm (2019) or Understand, Manage, and Prevent Algorithmic Bias (2019): books that describe and acknowledge the challenges and complexities of algorithmic fairness, but at the same time offer concrete methods and tools for fairer and more ethical algorithms. The shift can also be seen in how the methods described in these books have already found their way into the offerings of all major cloud providers: at the FAccT 2021 tutorial “Responsible AI in Industry: Lessons Learned in Practice,” Microsoft, Google, and Amazon demoed their fair AI solutions to the multidisciplinary audience of the FAccT community.

The message is clear: we can (and should!) operationalize algorithmic fairness.

And to do this, we do not first need to “solve” the trolley problem before we allow self-driving cars on our roads, nor do we need to fundamentally eradicate injustice or rid ourselves of bias entirely before we can build useful and fair AI tools. We’re at the point where we can skip philosophical and theoretical “paper” problems, and focus on designing and building pragmatic, real-world solutions.

As it turns out, we have lots of control over AI, and instead of solving hypothetical problems, thoughtfully (co-)designing AI systems can make a difference in the real world. Let me illustrate with fair AI in hiring and recruitment, and value-driven news recommendation.

⚖️ Fair AI in hiring and recruitment

Before we dive into the most frequently cited (but, in my experience, not as frequently practiced) “poster child” of the need for responsible AI and algorithmic fairness, the hiring and recruitment domain, a small reminder is in order, helpfully illustrated by the image below from Peng et al. (2019):

Bias is everywhere!

A high-level schematic for a hybrid system for hiring. Reprinted from “What You See Is What You Get? The Impact of Representation Criteria on Human Bias in Hiring,” by Peng et al. (2019).

In humans…

In 1999, Steinpreis et al. did the prototypical study into human hiring bias. Their method is straightforward: collect CVs, keep their content fixed, but vary the candidates’ gender by changing the names into typically male- or female-sounding names. Send out these CV variations and keep track of which ones receive invitations for interviews.

Lo and behold: “[b]oth men and women were more likely to hire a male job applicant than a female job applicant with an identical record.” Similar in methodology, but different in the “protected attribute” under study, is Bertrand and Mullainathan’s experiment, which finds that “‘White’ names receive 50 percent more callbacks for interviews [than ‘African-American’ names].”
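To make the audit-study arithmetic concrete, here is a minimal sketch of how such a callback gap is typically quantified. The counts below are invented for illustration, and the two-proportion z-test is just one common choice; it is not necessarily the analysis either study used.

```python
# A minimal sketch of how a correspondence audit is quantified (illustrative
# only: the counts below are invented, NOT the numbers from Steinpreis et al.
# or Bertrand & Mullainathan).
from statsmodels.stats.proportion import proportions_ztest

callbacks = [157, 103]       # hypothetical callbacks: "male"-named vs "female"-named CVs
applications = [1300, 1300]  # identical CVs sent out per group

rate_m, rate_f = (c / n for c, n in zip(callbacks, applications))
print(f"callback rates: male={rate_m:.3f}, female={rate_f:.3f}")
print(f"relative difference: {rate_m / rate_f - 1:+.0%}")

# Two-proportion z-test: is the gap larger than chance alone would explain?
stat, p_value = proportions_ztest(count=callbacks, nobs=applications)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
```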

… and in algorithms

Algorithms have been found to contain and/or reproduce bias, too. You’ve probably heard of Amazon’s CV filtering system, another frequently cited example in the algorithmic fairness world. This system “taught itself that male candidates were preferable. It penalized résumés that included the word “women’s”, as in “women’s chess club captain”. And it downgraded graduates of two all-women’s colleges.”

Scientific studies have found similar results. Through a so-called “permissionless” auditing method, Chen et al. found that three popular resume search engines (engines that return job candidates given a job title query) exhibit “a (slight) penalty against feminine candidates, even when controlling for all other visible candidate features.”

🦾 AI to the rescue: fairness-aware re-ranking

Luckily, it turns out making algorithms fair in the context of hiring and recruitment can be pretty straightforward. Take, for example, LinkedIn’s “fairness-aware re-ranking.” The idea is to “[make] sure the proportion of female candidates shown is the same as the corresponding proportion of profiles matching that query.”

Both the idea and the technical implementation are straightforward (we’re talking about a post-hoc re-ranking step, nothing algorithmically complex), but the real-world impact is significant. The authors report a “tremendous improvement in the fairness metrics without affecting the business metrics.” And the impact goes beyond experimental results, as their re-ranking model is now “deployed to all LinkedIn Recruiter users worldwide.”
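To give a sense of how small such an intervention can be, here is a minimal, hypothetical sketch of a proportional re-ranking step in Python. It is not LinkedIn’s actual implementation (their KDD ’19 paper describes several deterministic re-ranking variants); it simply rebuilds the ranking greedily so that every top-k prefix roughly respects a target group distribution.

```python
import math
from collections import defaultdict

def fairness_aware_rerank(ranked, group_of, target, k=None):
    """Greedy, post-hoc re-ranking sketch (hypothetical; not LinkedIn's exact algorithm).

    ranked   : candidate ids, best first (the original relevance ranking)
    group_of : dict mapping candidate id -> group label (e.g., inferred gender)
    target   : dict mapping group label -> desired proportion (sums to 1)
    """
    k = k or len(ranked)
    remaining = {g: [c for c in ranked if group_of[c] == g] for g in target}
    shown = defaultdict(int)
    reranked = []
    for pos in range(1, k + 1):
        # How far below its target share is each group in the top-`pos` prefix?
        deficits = {g: math.floor(target[g] * pos) - shown[g]
                    for g in target if remaining[g]}
        if not deficits:  # all candidates placed
            break
        # Pick from the group furthest below its target; break ties by original rank.
        group = max(deficits, key=lambda g: (deficits[g], -ranked.index(remaining[g][0])))
        reranked.append(remaining[group].pop(0))
        shown[group] += 1
    return reranked

# Hypothetical example: 40% of the profiles matching the query are group "F".
ranking = ["c1", "c2", "c3", "c4", "c5"]
gender = {"c1": "M", "c2": "M", "c3": "M", "c4": "F", "c5": "F"}
print(fairness_aware_rerank(ranking, gender, target={"M": 0.6, "F": 0.4}))
# -> ['c1', 'c2', 'c4', 'c3', 'c5']: an "F" candidate is pulled up to satisfy the 40% floor
```

The `target` argument is exactly where the design choice lives: set it to the distribution of profiles matching the query (LinkedIn’s choice), or to any of the alternatives discussed below.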

How is this fairness-aware re-ranking “fair”? Obviously, it’s a small intervention, which is limited to gender bias (in the A/B test reported in their paper), and it is but one operationalization of fairness. Another could be to have the algorithmic system reflect the gender distribution in the world population. Another would be to reflect the distribution of applicants. Or of qualified candidates.

That’s the point: there is no single solution. We can argue about the extent to which this is (un)fair, but it’s better than not doing fairness-aware ranking at all. LinkedIn made an explicit choice to operationalize this notion of fairness; a subjective solution, but a defensible one.

🤖 Taking it one step further: AI to fix human bias

Peng et al. (from the picture above) take algorithms for bias mitigation in hiring one step further, and aim to adjust the algorithmic system’s output to compensate for human bias further down the pipeline. Basically: show more women in response to vacancies that are traditionally more often filled by men.
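As a rough, hypothetical illustration (my own sketch, not the authors’ code), “balancing gender representation in a candidate slate” can be as simple as interleaving the best-ranked candidates of each gender:

```python
def balanced_slate(ranked, gender, size=8):
    """Hypothetical sketch: build a 50/50 slate by alternating between the
    best-ranked candidates of each gender. Peng et al.'s actual setup is more
    involved; this only illustrates the basic idea."""
    queues = {g: [c for c in ranked if gender[c] == g] for g in ("F", "M")}
    slate = []
    for i in range(size // 2):
        for g in ("F", "M"):
            if i < len(queues[g]):
                slate.append(queues[g][i])
    return slate[:size]
```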

The authors find their ‘overcompensation’ strategy effective, as “balancing gender representation in candidate slates can correct biases for some professions,” but not in all cases: “doing so has no impact on professions where human persistent preferences are at play” (e.g., nannies and gynaecologists). Beyond these persistent human preferences, the authors find additional factors that influence the effectiveness of their bias mitigation strategy, such as the gender of the decision-maker, the complexity of the decision-making task, and the over- or under-representation of genders.

The latter finding underlines the complexity of bias in hiring and the interplay between human decision making and algorithmic systems, and at the same time the importance of taking steps. It must be noted, though, that the authors conducted their study using crowd-sourcing, where I think it’s safe to assume laypeople exhibit different behavior from trained HR professionals.

📰 Value-driven AI for news recommendation

While not quite as high-stakes a domain as HR and recruitment, algorithmic news dissemination has received widespread attention, thanks among others to (the anecdotal notion of) Pariser’s filter bubble (which, it turns out, is hard to establish empirically). Great territory for us, then, when back in 2019 a team of data scientists and I designed and built a custom content-based news recommender system for the Dutch financial daily Het Financieele Dagblad (the Dutch equivalent of the Financial Times).


So, what is fair in news recommendation? We set out to define what we believe our algorithms “should” do, by having a conversation among different stakeholders in our organization. More specifically, we participated in a study by Bastian & Helberger, which “conducted semi-structured interviews with employees from different departments (journalists, data scientists, product managers)” to explore “the costs and benefits of value articulation and a mission-sensitive approach in algorithmic news distribution from the newsroom perspective.”

Out of these interviews and subsequent discussions, we distilled a subset of ‘editorial values’ that were shared among stakeholders and (not unimportantly) technically feasible. We identified that our algorithmic news recommendations should continue to:

  1. Surprise readers
  2. Offer timely and fresh news
  3. Enable diverse reading behavior
  4. Increase the (reading) coverage of articles

With these four editorial values in hand, in our UMAP 2020 publication “Beyond optimizing for clicks: incorporating editorial values in news recommendation” we describe an intervention that adjusts our recommender system to explicitly deliver more timely and fresh recommendations (i.e., the second value in our list).

A/B test results showing improved dynamism, serendipity, and diversity from our intervention, which increases timely and fresh news recommendation (image by the author)

Similar to LinkedIn’s fairness-aware re-ranking, we incorporated a re-ranking strategy that boosts recent news. The results of an A/B test between our baseline news recommender and the recency-boosted recommender showed (i) that we can increase timely and fresh news delivery without hurting accuracy (again, the same story as the LinkedIn paper), but also (ii) that improving fresh news recommendation increased three out of the four values (freshness, surprise, and diversity).
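For the curious, here is a minimal sketch of what such a recency boost can look like. The exponential decay, half-life, and weight below are hypothetical choices of mine, not the exact formula from our UMAP paper; the point is, once more, that this is a small post-hoc re-scoring step.

```python
import math
from datetime import datetime, timezone

def recency_boosted(recommendations, half_life_hours=24.0, weight=0.5):
    """Re-score recommendations with an exponential recency decay (illustrative sketch).

    recommendations : list of (article_id, relevance_score, published_at) tuples,
                      with published_at as timezone-aware datetimes
    half_life_hours : an article's freshness halves every N hours (hypothetical value)
    weight          : how much recency counts relative to the relevance score
    """
    now = datetime.now(timezone.utc)
    rescored = []
    for article_id, relevance, published_at in recommendations:
        age_hours = (now - published_at).total_seconds() / 3600.0
        freshness = math.exp(-math.log(2) * age_hours / half_life_hours)
        rescored.append((article_id, (1 - weight) * relevance + weight * freshness))
    # Most fresh-and-relevant articles first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```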

As in the LinkedIn example, the contribution of our work is not an end-all solution for fair or ethical news recommendation. Boosting recent articles over older articles is nothing spectacular. What is important is that we sat down with stakeholders to collectively decide what our recommender system should do.

🗣 Discussion

A universal solution to the trolley problem does not exist: ethics and fairness are context-, domain-, culture-, and time-dependent. LinkedIn’s fairness-aware ranking doesn’t fix gender bias, and our value-driven news recommendations cannot replace editorial decisions.

Both, however, do better and are fairer in some respects than their non-adjusted counterparts. Be humble: forget about solving fundamental injustice or eradicating discrimination, and start reaching out and co-designing algorithms. Deciding what’s fair cannot and should not be done by us data scientists: it is neither our job nor our expertise. What should be our job and expertise, however, is to reach out to relevant stakeholders, and to collectively decide and design what we want to achieve.

Note

This blog is a writeup of a talk I gave at the DDMA Meetup AI, called “Pragmatic ethical and fair AI for data scientists,” which in itself is composed of a few slides from two prior talks: one I gave at the anti-discrimination hackathon on algorithmic bias and bias mitigation in HR, and the slide deck of our UMAP 2020 paper “Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation.”

🔍 References

  1. Peng, A., Nushi, B., Kıcıman, E., Inkpen, K., Suri, S., & Kamar, E. (2019). What You See Is What You Get? The Impact of Representation Criteria on Human Bias in Hiring. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 7(1), 125–134.
  2. Steinpreis, R. E., Anders, K. A., & Ritzke, D. (1999). The Impact of Gender on the Review of the Curricula Vitae of Job Applicants and Tenure Candidates: A National Empirical Study. Sex Roles, 41, 509–528.
  3. Bertrand, M., & Mullainathan, S. (2003). Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. NBER Working Paper.
  4. Reuters, via The Guardian (2018): https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine
  5. Chen, L., Ma, R., Hannák, A., & Wilson, C. (2018). Investigating the Impact of Gender on Rank in Resume Search Engines. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18). ACM.
  6. Geyik, S. C., Ambler, S., & Kenthapadi, K. (2019). Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’19), 2221–2231.
  7. Institute for the Future of Work: Artificial Intelligence in Hiring: Assessing Impacts on Equality. https://www.ifow.org/publications/artificial-intelligence-in-hiring-assessing-impacts-on-equality
  8. Bastian, M., & Helberger, N. (2019, September). Safeguarding the journalistic DNA: Attitudes towards value-sensitive algorithm design in news recommenders [Paper presentation]. Future of Journalism Conference, Cardiff.
  9. Lu, F., Dumitrache, A., & Graus, D. (2020). Beyond Optimizing for Clicks: Incorporating Editorial Values in News Recommendation. In Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization (UMAP ’20), 145–153.

