Artificial Intelligence and Machine Learning are awesome. They allow our mobile assistants to understand our voices and book us an Uber. AI and Machine Learning systems recommend us books on Amazon similar to the ones we’ve liked in the past. They might even find us an amazing match in a dating application and help us meet the love of our life.
All of these are cool but relatively harmless applications of AI: if your voice assistant doesn’t understand you, you can just open the Uber application and order a car yourself. If Amazon recommends a book you might not like, a little research will let you discard it. If an app sends you on a blind date with someone who is not a good match for you, you might even end up having a good time meeting somebody whose personality bewilders you.
Things get rough, however, when AI is used for more serious tasks like filtering job candidates, giving out loans, accepting or rejecting insurance requests, or even making medical diagnoses. All of these decisions, whether partially assisted or completely handled by AI systems, can have a tremendous impact on somebody’s life.
For these kinds of tasks, the data that is fed into the Machine Learning systems that sit at the core of these AI applications has to be conscientiously studied, trying to avoid the use of information proxies: pieces of data used as a substitute for other information that would be more legitimate and precise for a certain task but that is not available.
Take the example of car insurance requests that are automated by a machine learning system: an excellent driver who lives in a poor and badly regarded area could get a car insurance request rejected if ZIP code is used as a variable in the model instead of pure driving and payment metrics.
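As a rough illustration of this kind of data study, here is a minimal sketch in Python (using pandas) of a check a modeller could run on a hypothetical insurance dataset. The file and column names (insurance_applications.csv, zip_code, approved, accidents_last_5y) are invented for the example and are not from any real insurer’s data.

```python
# A minimal sketch of a proxy check on a hypothetical insurance dataset.
# All file and column names below are invented for illustration.
import pandas as pd

applications = pd.read_csv("insurance_applications.csv")  # hypothetical file

# How strongly does ZIP code alone relate to the model's decision?
approval_by_zip = (
    applications
    .groupby("zip_code")["approved"]
    .agg(["mean", "count"])
    .sort_values("mean")
)
print(approval_by_zip.head(10))  # ZIP codes with the lowest approval rates

# Compare against a driving-quality metric: applicants with clean records
# living in low-approval ZIP codes are the cases worth reviewing manually.
clean_drivers = applications[applications["accidents_last_5y"] == 0]
print(
    clean_drivers.groupby("zip_code")["approved"].mean().sort_values().head(10)
)
```

If approval rates vary a lot by ZIP code even among drivers with clean records, that is a strong hint that ZIP code is acting as a proxy for something it shouldn’t.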
Aside from these proxies, AI systems also depend on the data they were trained with in another way: training on non-representative samples of a population, or on data that has been labelled with some sort of bias, reproduces that same bias in the resulting system.
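To make the idea concrete, the following toy example (entirely synthetic data, scikit-learn) shows how a classifier trained on a sample dominated by one group can end up performing noticeably worse on another group; the numbers, group names and effect size are made up for illustration only.

```python
# A toy demonstration of how a non-representative training sample can produce
# a biased model: group "A" dominates the training set, so the classifier
# tends to perform worse on group "B". Data is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Two features whose relationship to the label differs slightly per group.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.25, size=n) > shift)
    return X, y.astype(int)

# Training data: 95% group A, 5% group B (a non-representative sample).
Xa_train, ya_train = make_group(1900, shift=0.0)
Xb_train, yb_train = make_group(100, shift=1.5)
X_train = np.vstack([Xa_train, Xb_train])
y_train = np.concatenate([ya_train, yb_train])

model = LogisticRegression().fit(X_train, y_train)

# Evaluate on equal-sized test sets for both groups.
Xa_test, ya_test = make_group(1000, shift=0.0)
Xb_test, yb_test = make_group(1000, shift=1.5)
print("accuracy on group A:", accuracy_score(ya_test, model.predict(Xa_test)))
print("accuracy on group B:", accuracy_score(yb_test, model.predict(Xb_test)))
```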
Let’s see some examples of bias derived from AI.
Tay: The offensive Twitter Bot

Tay (short for “Thinking about you”) was a Twitter Artificial Intelligence chatbot designed to mimic the language patterns of a 19-year-old American girl. It was developed by Microsoft in 2016 under the user name TayandYou, and was put on the platform with the intention of engaging in conversations with other users, and even uploading images and memes from the internet.
After 16 hours and 96,000 tweets it had to be shut down, as it began to post inflammatory and offensive tweets, despite having been hard-coded with a list of certain topics to avoid. Because the bot learned from the conversations it had, when the users interacting with it started tweeting politically incorrect phrases, it picked up these patterns and started posting conflicting messages about certain topics.
Machine learning systems learn from what they see, and in this case the parrot-like behaviour adopted by Tay caused a big public embarrassment for Microsoft, which ended with this letter, as their 19-year-old girl turned into a neo-Nazi millennial chatbot.
In the following link you can find some examples of Tay’s Tweets.
Now, imagine if, instead of being intended for use on a social network, a chatbot like this one had been used as a virtual psychologist or something similar. Or imagine that the bot started targeting specific people on social media and attacking them. The people speaking to it could have been seriously hurt.
Google’s Racist Image application

Another big tech company, Google this time, has also had some issues regarding bias and racism. In 2015, some users of the image recognition feature in Google Photos received results where the application identified black people as gorillas. Google apologised and said that image recognition technologies were still at an early stage, but that it would solve the problem. You can read all about it in the following link.
Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech
If a company as powerful and technologically advanced as Google can have this sort of issue, imagine the hundreds of thousands of other businesses that create AI-powered software and applications without such expertise. It’s a good reminder of how difficult it can be to train AI software to be consistent and robust.
This is not, however, the only issue Google has had with images and Artificial Intelligence. Hand-held thermometer guns have become widely used throughout the COVID pandemic, and Google’s Cloud Vision software (a service for detecting and classifying objects in images) has had to quickly learn to identify and correctly classify these devices using data sets containing very few images, as the devices, despite not being new, have only recently become known to the general public.

The previous image shows how one of these thermometer guns gets classified as a gun when it is held by a person with dark skin, and as a monocular when it is held by a person with salmon-coloured skin. Tracy Frey, director of Product Strategy and Operations at Google, wrote after this viral case:
"this result [was] unacceptable. The connection with this outcome and racism is important to recognise, and we are deeply sorry for any harm this may have caused."
Google fixed this by raising the confidence score (the 61% that appears in the image above) needed for Cloud Vision to return a gun or firearm label. However, this is just a change in how the results of the Artificial Intelligence model are displayed, not in the model itself, highlighting again how difficult it is to get these systems to behave properly, especially when there is little data.
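As a rough sketch of what such a display-level fix looks like (this is not Google’s actual implementation, just an illustration of the idea), imagine a post-processing step that raises the confidence required to show certain sensitive labels, while leaving the model’s raw scores untouched:

```python
# A minimal sketch of a display-level fix: the model's raw label scores are
# unchanged; only the confidence required to *show* certain labels is raised.
# Label names and thresholds below are illustrative, not Google's.
DEFAULT_THRESHOLD = 0.50
SENSITIVE_THRESHOLDS = {"gun": 0.90, "firearm": 0.90}  # stricter bar

def filter_labels(predictions):
    """predictions: list of (label, score) pairs returned by an image model."""
    shown = []
    for label, score in predictions:
        threshold = SENSITIVE_THRESHOLDS.get(label, DEFAULT_THRESHOLD)
        if score >= threshold:
            shown.append((label, score))
    return shown

# The underlying model still scores "gun" at 0.61, but the label is no longer
# surfaced because it falls below the stricter display threshold.
print(filter_labels([("gun", 0.61), ("monocular", 0.55), ("hand", 0.97)]))
# -> [('monocular', 0.55), ('hand', 0.97)]
```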
What if a system like this one had been used for locating potentially harmful or suspicious individuals using surveillance cameras in the street? Innocent people could have been targeted as dangerous just because of their skin color.
Latest Biased AI news:
Recently, there’s been a lot of discussion around the topic of bias in Artificial Intelligence among some of the top AI researchers in the world, sparked by the publication of the paper "PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models". This model transforms low-resolution images into higher-resolution ones using AI, as shown in the following tweet.
This tweet came with a link to a Google Colab Notebook (a programming environment) where anyone could run the code and try the model on different images. People soon found that PULSE appears to be biased in favour of outputting images of white people: one user replied to the previous tweet with a pixelated image of Barack Obama that the model reconstructed into an image of a white man.
The authors of the paper responded to this, adding a bias section to the paper and including a Model Card: a document that clarifies the details of the model, its purpose, the metrics used to evaluate it, the data it was trained with, and a breakdown of results across different races along with some ethical considerations. I think creating these kinds of documents whenever a Machine Learning model is built is a great practice that should be adopted more widely.
You can find the discussion and further information on this topic in the following link.
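To give an idea of what such a Model Card might contain, here is a minimal, purely illustrative sketch in Python, loosely following the fields proposed in "Model Cards for Model Reporting" (Mitchell et al., 2019); none of the values below come from the actual PULSE documentation.

```python
# A minimal, illustrative Model Card represented as a plain Python dictionary.
# The field names follow the spirit of the Model Cards proposal; the values
# are placeholders, not the real PULSE documentation.
model_card = {
    "model_details": {
        "name": "face-upsampler-demo",
        "version": "0.1",
        "type": "generative super-resolution model",
    },
    "intended_use": "Research demo for photo upsampling; not for identification.",
    "training_data": "Publicly available face dataset, composition documented.",
    "evaluation_metrics": ["perceptual quality", "identity preservation"],
    "disaggregated_results": {
        # Results reported separately per demographic group, so gaps are visible.
        "group_A": {"identity_preservation": None},
        "group_B": {"identity_preservation": None},
    },
    "ethical_considerations": (
        "Known risk of skewing reconstructions toward groups that are "
        "over-represented in the training data."
    ),
}
```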
Other examples of Bias in Artificial Intelligence
Aside from these previous cases, all of which had some resonance in the media, there are many other, lesser-known cases of models with a similarly discriminatory flavour. A section could be written about each, but here we will just mention them briefly, allowing the reader to investigate further if desired.
- Women are less likely than men to be shown ads for highly paid jobs on Google. The models built for displaying these ads used information such as personal data, browsing history and internet activity. Link.
- An algorithmic jury: using Artificial Intelligence to predict recidivism rates. A predictive model used to estimate whether an individual will commit crimes again after being set free (and therefore used to extend or reduce the individual’s time in jail) shows racial bias, being a lot tougher on black individuals than on white ones. Link.
- Uber’s Greyball: escaping authorities worldwide. Data collected from the Uber app was used to evade local authorities trying to clamp down on the service in countries where it is not permitted by law. This is not an example of bias per se, but it highlights how AI can be used to discriminate against certain users (in this case police officers) and to serve selfish interests. Link.
- Lastly, it’s not all bad news for AI. The following link shows how AI-powered systems can reduce bias in university recruiting applications: Link.
What can we do about all this?
We’ve seen what can happen when AI systems start showing racial, gender or any other kind of bias, but what can we do about it?
To regulate these mathematical models, the first step has to come from the modellers themselves. When creating these models, designers should avoid using overly complex mathematical tools that obscure the simplicity and explainability of the models. They should study the data used to build the models very carefully, and try to avoid the use of dangerous proxies.
They should also always keep in mind the final goal of the models: making people’s lives easier, providing value to the community, and improving our overall quality of life, whether through business or academia, rather than focusing only on Machine Learning metrics like accuracy or mean squared error. And if the models are built for a specific business, another usual success metric probably has to take a back seat too: economic profit. Beyond that profit, the results of the models, in terms of the decisions they make, should be examined: following our insurance example, the creators of the model should look at who is getting rejected and try to understand why.
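As a small sketch of this kind of outcome audit (hypothetical data and column names, with the "80% rule" used only as a rough fairness heuristic), a modeller could compare approval rates across groups like this:

```python
# A sketch of an outcome audit on a hypothetical log of model decisions:
# compare rejection rates across groups and flag large gaps for manual review.
# File and column names (model_decisions.csv, group, approved) are invented.
import pandas as pd

decisions = pd.read_csv("model_decisions.csv")  # hypothetical decision log

rejection_rates = 1 - decisions.groupby("group")["approved"].mean()
print(rejection_rates)

# A simple disparate-impact style check: approval rate of the most rejected
# group divided by that of the least rejected group.
approval_rates = decisions.groupby("group")["approved"].mean()
ratio = approval_rates.min() / approval_rates.max()
if ratio < 0.8:  # the "80% rule", a rough heuristic, not a legal threshold
    print(f"Warning: approval ratio {ratio:.2f} is below 0.8; review manually.")
```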
As we progress into a more data-driven world, governments might have to step in to provide fair and transparent regulation for the use of Artificial Intelligence models in certain areas like finance, insurance, medicine and education. All of these are fundamental pieces of any individual’s life, and should be treated very carefully.
As AI practitioners, the people creating the systems have a responsibility to re-examine the ways they collect and use data. Recent proposals set standards for documenting models and datasets to weed out harmful biases before they take root, using the Model Cards mentioned before and a similar system for datasets: Datasheets.
Aside from this, we should try to build non-black-box, explainable models, audit them, and track their results carefully, taking the time to manually analyse some of the outcomes.
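For instance, one simple way to keep a model auditable is to favour an interpretable model whose parameters can be read directly. The toy example below (invented features and labels, following our insurance theme) trains a logistic regression and prints its coefficients for review:

```python
# A sketch of favouring an interpretable model and auditing it: a logistic
# regression whose coefficients can be inspected directly. Feature names and
# labels are invented for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression

X = pd.DataFrame({
    "years_driving":     [2, 10, 25, 7, 15, 1, 30, 5],
    "accidents_last_5y": [1,  0,  0, 2,  0, 3,  0, 1],
    "late_payments":     [0,  1,  0, 3,  0, 2,  0, 1],
})
y = [0, 1, 1, 0, 1, 0, 1, 1]  # 1 = approved, 0 = rejected (toy labels)

model = LogisticRegression().fit(X, y)

# Each coefficient shows how a feature pushes the decision (in log-odds);
# a surprising sign or magnitude is a prompt for manual review.
for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name:>20}: {coef:+.2f}")
```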
Lastly, we can educate the wider community and general public on how data is used, what can be done with it, how it can affect them, and also let them know transparently when they are being evaluated by an AI model.
Conclusion and additional Resources
That is it! As always, I hope you enjoyed the post, and that I managed to help you understand a little bit more about bias in AI, its causes, effects, and how we can fight against it.
Here you can find some additional resources in case you want to learn more about the topic:
- Addressing the gender bias in AI and automation.
- Is AI doomed to be racist and sexist?
- AI and Bias – IBM Research
- Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, by Cathy O’Neil.
If you want to learn more about Machine Learning and Artificial Intelligence follow me on Medium, and stay tuned for my next posts! Also, you can check out this repository for more resources on Machine Learning and AI!