Fixing Bias in AI Systems

AI models are only as good as the algorithms and data they are trained on. When an AI system fails, it is usually due to one of three factors: 1) the algorithm has been incorrectly trained, 2) there is bias in the system's training data, or 3) there is developer bias in the model-building process. This article focuses on the bias in training data and the bias that is coded directly into AI systems by model developers.

Developer Bias
"I think today, the AI community at large has a self-selecting bias simply because the people who are building such systems are still largely white, young and male. I think there is a recognition that we need to get beyond it, but the reality is that we haven’t necessarily done so yet." – Grady Booch, IBM’s Chief Watson Sciences
Developer bias is a product of the lack of diversity on data teams. Socially responsible organizations recognize the need to change this dynamic and are making a sustained effort to engage underrepresented communities and increase the diversity of their data teams. This effort will take time. In the meantime, another way to address the default value system that AI systems are inheriting is for the technology industry to adopt a universal ethical framework for AI.
An AI Ethical Model Framework
A big reason why an AI Ethical Model Framework has not yet been established is that there is no consensus on whose values and ethical systems should be used to build one.
So why not let the model build its own ethical framework? This counterintuitive notion is exactly what IBM's Murray Campbell suggests through the use of inverse reinforcement learning. Inverse reinforcement learning lets the system observe how people behave in various situations so that it can infer what people value, allowing the model to make decisions in line with our underlying ethical principles. The solution is counterintuitive, but it also gives models the ability to update their belief systems alongside us. In an ever-changing world, this should be a feature of all AI systems.
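To make the idea concrete, here is a minimal sketch of the feature-matching flavor of inverse reinforcement learning, in Python with NumPy. The five-state chain world, the "expert" demonstrations, and all constants are illustrative assumptions rather than Campbell's actual method; the point is only that the learner recovers a reward function, and hence values, from observed behavior alone.

```python
# Minimal inverse reinforcement learning via feature matching.
# The chain world and all constants below are hypothetical and illustrative.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9   # actions: 0 = left, 1 = right
features = np.eye(n_states)              # one-hot state features

def step(s, a):
    return min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)

def feature_expectations(policy, start=0, horizon=20):
    """Discounted feature counts from rolling out a deterministic policy."""
    mu, s = np.zeros(n_states), start
    for t in range(horizon):
        mu += (gamma ** t) * features[s]
        s = step(s, policy[s])
    return mu

def greedy_policy(reward, iters=50):
    """Value iteration against the current reward estimate."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = np.array([[reward[step(s, a)] + gamma * V[step(s, a)]
                       for a in range(n_actions)] for s in range(n_states)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# "Expert" demonstrations: always move right toward the last state.
mu_expert = feature_expectations(np.ones(n_states, dtype=int))

# Learn reward weights so the induced policy matches the expert's behavior.
w = np.zeros(n_states)
for _ in range(100):
    mu_learner = feature_expectations(greedy_policy(features @ w))
    w += 0.1 * (mu_expert - mu_learner)   # move toward expert feature counts

print("recovered reward weights:", np.round(w, 2))
print("recovered policy:", greedy_policy(features @ w))  # should favor "right"
```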
How Bias Creeps Into AI Model Building
Supervised learning algorithms rely on humans to label the data, a step in the model-building process that can introduce human bias into systems. A subject matter expert may also be informing the learning system along the way, and that expert's input carries their own biases as well.
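One way to surface this risk before training is to measure how much independent labelers disagree; Cohen's kappa is a standard agreement statistic for this. A brief sketch, with made-up labels purely for illustration:

```python
# Measuring inter-annotator agreement as a rough proxy for label bias.
# The labels below are fabricated for illustration only.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["spam", "ham", "spam", "spam", "ham", "ham",  "spam", "ham"]
annotator_b = ["spam", "ham", "ham",  "spam", "ham", "spam", "spam", "ham"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # values well below 1.0 flag systematic disagreement
```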
One technique that can remove these bias-injecting steps from the AI system-building process is unsupervised learning. Unsupervised learning trains a system on unlabeled data with little to no human intervention, letting it learn by interacting with its environment. This goes a long way toward reducing the opportunity for bias to seep into the model.
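As a minimal illustration, the sketch below clusters unlabeled synthetic data with k-means: no human ever assigns a label, so the labeling step that can inject bias never occurs. The data and parameters are illustrative.

```python
# Unsupervised learning on unlabeled data: k-means finds structure
# without any human-provided labels. Synthetic data for illustration.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels discarded
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(model.cluster_centers_)  # structure discovered from the data alone
print(model.labels_[:10])      # cluster assignments, not human labels
```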
We are still far from pure unsupervised learning, but newer hybrid methods lie somewhere in the middle of the spectrum between unsupervised and supervised learning. These models require less data, which helps with the problem of unrepresentative data sets, and they require less labeling and human intervention, reducing the chance of bias making its way in during the data-preprocessing phase. One such approach is meta-learning.
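Before turning to meta-learning, here is a brief sketch of a simpler middle-of-the-spectrum method, label spreading in scikit-learn, where a handful of human labels propagate through a large pool of unlabeled points. The dataset and the ten-label split are illustrative assumptions:

```python
# Semi-supervised learning: a few labels propagate through many unlabeled points.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
y_partial = np.full_like(y, -1)          # -1 marks "unlabeled" in scikit-learn
labeled_idx = np.random.RandomState(0).choice(len(y), size=10, replace=False)
y_partial[labeled_idx] = y[labeled_idx]  # only 10 of 200 points get human labels

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
print("accuracy on all points:", (model.transduction_ == y).mean())
```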
Meta-Learning for Fixing Biased Datasets
Meta-Learning is a subfield of machine learning in which deep learning models are trained efficiently with less data. Machines learn a complex task by taking the principles used to learn one task and applying them to others. This form of generalization allows learning models to acquire new skills more quickly.
The one-shot meta-learning technique is a good example of this modeling approach. One-shot learning can be applied to tasks like face recognition, where there are many classes but very few examples per class. This is true of many face-recognition datasets, in which White faces are over-represented. In one-shot learning, a deep neural network is engineered to generalize from its training datasets to unseen datasets. One-shot classification is like normal classification, except that entire datasets, rather than individual data samples, serve as the training examples. The model is trained on different learning tasks/datasets and then optimized for peak performance across an array of training tasks and unseen data.
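A compact way to see this episodic style of training in code is a prototypical-network-style loop, a close relative of one-shot classification: each episode is a tiny dataset of previously unseen "classes," and the model is optimized across many episodes so it can generalize from a single example per class. The synthetic class distributions, network size, and episode shape below are all illustrative assumptions, not a specific published model:

```python
# Episodic (N-way, K-shot) meta-learning sketch in the style of
# prototypical networks. Data and architecture are synthetic/illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_way, k_shot, n_query, dim = 5, 1, 5, 16  # 5-way 1-shot episodes
encoder = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 32))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def sample_episode():
    """Each episode is a fresh mini-dataset: n_way new 'classes' drawn as
    Gaussian clusters, split into support (train) and query (test) sets."""
    centers = torch.randn(n_way, dim) * 3
    support = centers.unsqueeze(1) + torch.randn(n_way, k_shot, dim)
    query = centers.unsqueeze(1) + torch.randn(n_way, n_query, dim)
    return support, query

for episode in range(1000):
    support, query = sample_episode()
    prototypes = encoder(support).mean(dim=1)   # one prototype per class
    q = encoder(query.reshape(-1, dim))         # embed the query points
    logits = -torch.cdist(q, prototypes)        # nearer prototype -> higher score
    labels = torch.arange(n_way).repeat_interleave(n_query)
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()

# After training across many episodes, the encoder can classify classes it
# has never seen from one example each, by distance to their prototypes.
```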

Looking Forward
Fixing unintentional bias requires improvements in both awareness and data-collection techniques. For better data integrity, organizations should collect data themselves to ensure that the data sets the machines are fed are genuinely diverse and inclusive, although collecting more data can be unfeasible and cost-prohibitive. Organizations must also continue their efforts to bring more diversity and representation into their data science teams.
Unsupervised learning and meta-learning are two techniques that can address the bias that creeps into AI systems at the data collection, feature labeling, and model development stages. Furthermore, a universal AI ethical framework is an important step toward ensuring that the values guiding AI system building are values that protect and promote society as a whole.
Living through the COVID-19 pandemic has taught us that bias is an issue we undoubtedly and urgently need to address. Our priorities have shifted. Life, time, and context reshape our ethics and values, and our AI systems should be developed so that they can update their belief systems in line with our changing societal values.
Grady Booch echoes this sentiment in his TED Talk, "Don't Fear Superintelligent AI":
"We are on an incredible journey of coevolution with our machines. The humans we are today are not the humans we will be then."
What a future awaits us all.