Bias Creeps into Technology

We can’t avoid bringing our own perspective to the products we build

Catherine Breslin
Towards Data Science


Writing computer code — Image by Free-Photos from Pixabay

The majority of folks who build technology don’t intend to be biased. Yet we all have our own unique perspective on the world, and we can’t help but bring that into our work. We make decisions based on our views and our experiences. Those decisions may each seem small in isolation, but they accumulate. And, as a result, technology often reflects the views of those who build it.

Here are a few of the places where I’ve seen bias creep into technology.

The datasets we construct

With the recent success of machine learning (ML) and AI algorithms, data is becoming increasingly important. ML algorithms learn their behaviour from a dataset. What’s contained in those datasets becomes important as it directly impacts the performance of a product.

Take the field of Natural Language Understanding (NLU), where large pre-trained models have recently become popular. These pre-trained models are expensive to build, but once built they can be reused across different tasks by different people. BERT is one of the most widely used pre-trained models, and it was built from Wikipedia text. Wikipedia has its own problems as a data source: only 18% of its biographies are of women, and the vast majority of its content is written by editors in Europe and North America. The resulting biases in Wikipedia are learnt by the BERT model and propagated.
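To get a feel for how such biases surface, a pre-trained model can be probed directly. Below is a minimal sketch using the Hugging Face transformers library; the prompts are my own invention and this is an illustration rather than a rigorous bias audit.

```python
# A minimal sketch: probing bert-base-uncased for gendered completions.
# Requires the Hugging Face `transformers` library; the prompts are
# illustrative examples, not part of any formal bias benchmark.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

for sentence in [
    "The doctor said that [MASK] would see the patient soon.",
    "The nurse said that [MASK] would see the patient soon.",
]:
    predictions = unmasker(sentence, top_k=3)
    completions = [(p["token_str"], round(p["score"], 3)) for p in predictions]
    print(sentence, "->", completions)
```

If the model consistently favours one pronoun for one occupation and a different pronoun for the other, that pattern has been learnt from the training text and will be carried into any product built on top of it.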

In another field, Computer Vision, datasets are equally problematic in their composition. One class of datasets contains images of faces, from which facial recognition systems are trained. These datasets are often overwhelmingly white: two popular ones are made up of 79.6% and 86.2% lighter-skinned faces respectively. Datasets like this lead to ML models which perform poorly for people with darker skin.
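One simple safeguard is to audit a dataset’s composition before training on it. The sketch below assumes a hypothetical metadata file with a skin_tone column (both invented for illustration) and simply reports the breakdown.

```python
# A minimal sketch of auditing a dataset's composition before training.
# `face_dataset_metadata.csv` and its `skin_tone` column are hypothetical;
# real datasets label demographics differently, if at all.
import pandas as pd

metadata = pd.read_csv("face_dataset_metadata.csv")

composition = metadata["skin_tone"].value_counts(normalize=True) * 100
print(composition.round(1))  # e.g. lighter 79.6, darker 20.4 (percent)
```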

The problems we decide to tackle, or not

At CogX, I hosted a session where Dr Heidi Christensen talked about her work researching voice technology for those with disordered speech & clinical voice impairments. Voice technology potentially has a large impact on the lives of those with voice impairments because the same conditions that affect their voice also affect other movements, making it hard to carry out many simple tasks. Gaining independence is associated with better outcomes. Yet, the mainstream of voice technology focuses on healthy speakers and not on those with non-standard speech patterns.

Other times, the framing of a task is problematic and risks perpetuating stereotypes. Take the task of gender classification. I’m confident that nothing good is going to come out of a system identifying me as a woman. I might, for example, be shown adverts for lower-paying jobs or be pointed towards more expensive products.

Decisions about what tasks to work on are usually guided by financial concerns — who will fund a product and who will pay to use it — but also by the personal experiences of those doing the building and the issues that resonate with them.

The priorities we assign to tasks

I wear different hats, but one of my many jobs has been to prioritise technical tasks that teams work on. In an ideal world, we’d have enough time and money to work on everything. But in reality, we have a limited amount of time. We have to pick and choose what to work on. Each prioritisation decision might seem small and inconsequential, but together they add up to have a big impact on the direction of a product.

Here’s a hypothetical example, inspired by real events. When building a speech recognition system, we might put effort into building a balanced training set, yet still find on evaluation that the system performs poorly for a particular demographic. For example, our speech recognition system might perform worse for UK speakers because some of our pronunciations and word choices are very different from those the system is expecting. Now there’s a choice between spending the team’s effort investigating and making the system more reliable for UK users, or experimenting with a promising new model architecture that could make the entire system perform better for everyone. This choice isn’t always an easy one to make. The second option will probably end up improving performance for UK users too, but it wouldn’t address the imbalance.

The opinions we listen to

The people we listen to influence both our thinking and our views of who should hold those influential positions.

The demographics of tech companies — the visionaries and the decision makers — are notoriously skewed. At the top, the ranks are dominated by white and Asian men. Outside of tech, the composition of the top levels isn’t a whole lot better. The FTSE 100 has more CEOs named Steve than CEOs from ethnic minority backgrounds, and only six women CEOs. By not promoting a wider range of people into these ranks, we are not listening to their views and perspectives.

A survey in 2016 showed that the British media is 94% white, with women paid significantly less than men. While traditional media is skewed in its composition, social media has given a platform to many underrepresented voices. Yet even social media has a gender problem. A recent study of the field of academic medicine found that “female academics also have disproportionately fewer Twitter followers, likes and retweets than their male counterparts on the platform, regardless of their Twitter activity levels or professional rank”.

The metrics we evaluate ourselves by

We often evaluate ML systems by a single averaged metric, such as accuracy. It’s easy to compute and easy to compare across different systems. Perhaps 100 people use our system, and each sees an accuracy of 95%. For most systems, that’s perfectly usable. Suppose instead that 90 of those users see an accuracy of 98%, and 10 see an accuracy of 68%. Now there’s a huge discrepancy between these two groups, but the average accuracy is still 95%. The users getting 68% accuracy might find the system unusable, but that doesn’t show up in the average. Without measuring performance separately for different demographic groups, we can’t uncover biases in the models we build.
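A few lines of arithmetic make the point. This sketch simply reproduces the numbers above: the overall average looks healthy, while a per-group breakdown exposes the gap.

```python
# A minimal sketch: the same average accuracy can hide very different
# experiences for different groups of users.
per_user_accuracy = [0.98] * 90 + [0.68] * 10  # 90 users at 98%, 10 at 68%

overall = sum(per_user_accuracy) / len(per_user_accuracy)
print(f"Overall accuracy: {overall:.0%}")  # 95%

majority = per_user_accuracy[:90]
minority = per_user_accuracy[90:]
print(f"Majority group:   {sum(majority) / len(majority):.0%}")  # 98%
print(f"Minority group:   {sum(minority) / len(minority):.0%}")  # 68%
```

Reporting the per-group numbers alongside the average is a small change, but it’s the difference between seeing the problem and missing it entirely.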

In other products, we measure and optimise engagement — the number of clicks and likes on a post, or time spent on a website. But engagement may not be the best measure of users’ wellbeing. Engagement can be driven not only by liking someone’s page or post, but also by frustration with the content. It’s been shown that stronger sentiment in a headline makes readers more likely to click on it, but it also polarises views and leads to echo chambers which, over time, reinforce extreme views rather than challenge them.

The view we take of our customers

When I was pregnant, my Nintendo Wii berated me for putting on weight. There was no way to tell it that my weight gain was only temporary, and ultimately I consigned it to the bin. We design systems imagining how our customers will use them. But, customers are different from us in ways which we cannot always anticipate. It seemed that the Nintendo Wii designers hadn’t anticipated pregnancy as something their users might experience. The Nintendo Wii is something I could just stop using, but pregnancy discrimination is a very real issue.

At another time, I was on the receiving end of a presentation about a home security system hooked up to the cloud. Not only could you check the footage online while you were out of the house, but you could also check who was in the house at any time and set alerts for when particular events happened. The team designing this imagined their users to be like them — proud fathers and loving husbands who simply wanted to do a good job of keeping their homes secure. But technology also enables abuse in new ways. The designers didn’t imagine that some of their customers might take advantage of such a home security system with different intentions, and so safety was an afterthought.

The view we take of our customers can limit what we build for them, often in areas where we have blindspots, and reinforce the biases we hold.

The business models we choose

In the technology industry, many engineering jobs are well paid and secure. This contrasts with low-paid, insecure jobs such as data annotator, driver and content moderator, without which many tech companies could not operate. These roles are crucial to the business model of many tech companies, yet the demographics of these workers are very different from those of the engineering staff.

A recent study of algorithmic pricing of ride hailing companies found that factors like ethnicity, education and age all affected ride prices, despite not being an explicit part of the model. This is because they are correlated with factors that the models do take into account, like location.
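The mechanism is easy to reproduce: even when a protected attribute is excluded from a model, a correlated feature can reintroduce it. The sketch below is a toy simulation with invented numbers, not a reconstruction of any real pricing model.

```python
# A toy sketch: a pricing rule that never sees group membership can still
# produce different average prices, because neighbourhood is correlated
# with group. All numbers here are invented for illustration.
import random

random.seed(0)

def simulate_rider():
    group = random.choice(["A", "B"])
    # Invented correlation: group B riders more often start in neighbourhood 2.
    weights = [0.8, 0.2] if group == "A" else [0.3, 0.7]
    neighbourhood = random.choices([1, 2], weights=weights)[0]
    # The pricing rule only looks at neighbourhood, never at group.
    price = 10.0 + (2.5 if neighbourhood == 2 else 0.0)
    return group, price

riders = [simulate_rider() for _ in range(10_000)]
for g in ["A", "B"]:
    prices = [price for group, price in riders if group == g]
    print(f"Group {g}: average price {sum(prices) / len(prices):.2f}")
```

Nothing in the pricing rule mentions the group, yet the two groups end up paying different amounts on average.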

Another common business model relies on offering services for free and generating revenue from advertising. This is a double-edged sword. On the one hand, making products available for free widens their reach and allows people to use them who might not otherwise be able to afford them. On the other hand, the targeted advertising that comes with this business model is another way that bias is reinforced, for example by allowing adverts to be targeted by race and gender.

Technology has made a huge impact in the world. But the world is biased and those of us who build technology have blindspots that we don’t even know about. Even with the best intentions, it’s difficult to keep bias out of the products we build.

You can hire me! If I can help your organisation use AI & machine learning, please get in touch.
