The Time I Coded a Biased Algorithm

The Data Science ethical issues that haunt me at night and keep me responsible in practice

Alison Yuhan Yao
Towards Data Science


Photo by Mariah Krafft on Unsplash

I often see people discuss AI and Data Science biases and their solutions from an objective perspective, even though biases are highly subjective. Every time the model implementor decides to include or exclude something, biases may arise. Therefore, I hope to tell a story from the perspective of a Data Science practitioner.

I recently confirmed that I am free to share a personal story from my past internship. I am sad to admit that I accidentally designed and coded a Computer Vision algorithm that later turned out to be biased.

Background

I was working at a Chinese IT company as an AI intern, and one of the clients was an international wedding photography studio. I was quite surprised by this unusual pairing because I did not expect photographers to need AI. But you see, it takes little effort to take thousands of photos at a wedding. One can keep pressing the shutter, and there is bound to be a perfect photo somewhere that captures the perfect moment.

But that’s where it gets tricky. A wedding photo album cannot fit this many photos. It is very time-consuming and labor-heavy to go through so many photos for so many weddings manually, so my job was to automate the photo selection process as much as I could.

According to the client, a wedding photo album typically features 3 types of photos:

  • the ones with only the bride or the groom (1 person),
  • the ones with both the bride and the groom (2 people), and
  • group photos of more than 2 people.

Unfortunately, algorithms could not select the one perfect photo for each category because computers have no aesthetic sense, but they could get rid of the absolutely useless photos. For example, we do not need photos where nobody is in the frame. So, the computer can help us narrow a couple thousand photos down to a couple hundred and lighten the workload.

So how do we do this?

Implementation

Data

I needed to start with a lot of image data. Wedding photos are high-resolution, so the few thousand photos I obtained from around 20 couples took up all of the storage space on my computer.

Algorithm Design

To process image data, Computer Vision algorithms and deep learning are the way to go. More specifically, since I was classifying images by the number of people in them, I needed an object detection algorithm to count occurrences. I tried counting human torsos and counting faces. Face detection did not work well when people were not facing the camera.
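The counting step can be sketched roughly as follows. This is only an illustration, not the code from the project: the detection format (a list of dicts with "label" and "score" keys), the "person" label, and the 0.5 threshold are all assumptions standing in for whatever a real object detector would return.

```python
from typing import Dict, List

# Hypothetical detector output: one dict per detected object.
Detection = Dict[str, object]


def count_people(detections: List[Detection], min_score: float = 0.5) -> int:
    """Count detections labeled 'person' at or above a confidence threshold."""
    return sum(
        1
        for d in detections
        if d["label"] == "person" and float(d["score"]) >= min_score
    )


# Made-up detections for a single photo.
detections = [
    {"label": "person", "score": 0.97},
    {"label": "person", "score": 0.88},
    {"label": "person", "score": 0.31},  # low confidence, ignored
    {"label": "cake", "score": 0.92},    # not a person, ignored
]
print(count_people(detections))
```

The threshold matters in practice: set it too low and background guests or reflections inflate the count, set it too high and a partially turned person is missed.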

If I counted 3 or more people, the image would definitely be a group photo. If I counted 2, the photo was very likely a couple photo, since occasions where the two people are not the couple are quite rare. When I counted 1, I also ran gender classification so that I could separate the bride’s photos from the groom’s. The prototype worked well on the test set I had, and we were ready for a larger-scale test for our client.
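Written out as code, the rules of this first prototype might look like the sketch below. The function name, the category strings, and the "female"/"male" labels are hypothetical; only the rules themselves come from the description above.

```python
from typing import Optional


def route_photo(person_count: int, predicted_gender: Optional[str] = None) -> str:
    """Assign a photo to an album category using the first prototype's rules."""
    if person_count == 0:
        return "discard"  # nobody in the frame
    if person_count == 1:
        # The prototype's assumption: a gender label maps directly to a role.
        if predicted_gender == "female":
            return "bride"
        if predicted_gender == "male":
            return "groom"
        return "unknown_single"
    if person_count == 2:
        return "couple"  # very likely the couple, though not guaranteed
    return "group"  # more than 2 people


print(route_photo(1, "female"))
print(route_photo(2))
print(route_photo(5))
```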

Now, do you sense anything wrong? Because if you do, you would have been a better Data Scientist than I was. As I said, our client was an international wedding photography studio, but we failed to consider same-sex couples.

The algorithm had a glitch when it encountered any couple of the same gender. Since it ran a gender classification to distinguish the bride from the groom, it malfunctioned when encountering 2 brides or 2 grooms.
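One way this failure can show up is sketched below, with made-up file names and a hypothetical gender-to-role mapping. When solo photos are sorted by predicted gender, the photos of two brides all land in a single "bride" album and the "groom" album never gets created.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping baked into the first design.
ROLE_BY_GENDER = {"female": "bride", "male": "groom"}


def group_solo_photos(solo_photos: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group single-person photos into albums keyed by the inferred role."""
    albums = defaultdict(list)
    for filename, predicted_gender in solo_photos:
        role = ROLE_BY_GENDER.get(predicted_gender, "unknown")
        albums[role].append(filename)
    return dict(albums)


# Two different people, both classified as "female" (illustrative data).
two_brides = [("anna_01.jpg", "female"), ("beth_01.jpg", "female")]
albums = group_solo_photos(two_brides)
print(albums)
```

The code runs without raising any error, which is part of why this kind of bias is easy to miss: nothing crashes, the output is simply wrong for couples the test set never covered.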

Algorithm Re-design

To resolve this issue, I switched to facial recognition. With facial recognition, I could have the model check exactly who belongs to the newlywed couple, which raised the accuracy considerably. Since facial recognition does not involve gender, which is not a simple binary concept, I expected it to work for more cases than same-sex couples alone.

But the inconvenient thing about facial recognition is that someone needed to manually identify each couple and log their faces into a database. If a couple’s info was not in the database, facial recognition would fail. In a way, facial recognition re-introduced human elements into what was supposed to be automated. Our client didn’t think this was good enough to cut significant costs and did not follow through.
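The matching idea, and its dependence on manual enrollment, can be sketched as follows. This is not the project's implementation: the 3-number "embeddings", the cosine-similarity comparison, the 0.8 threshold, and the names partner_a and partner_b are all assumptions for illustration. A real system would get much longer embedding vectors from a face recognition model.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(
    face: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the best-matching enrolled name, or None if nobody is similar enough."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(face, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Someone has to create these entries by hand for every couple.
enrolled = {
    "partner_a": [0.9, 0.1, 0.0],
    "partner_b": [0.1, 0.9, 0.1],
}

print(identify([0.88, 0.12, 0.02], enrolled))  # close to partner_a's reference
print(identify([0.0, 0.1, 0.99], enrolled))    # a guest, matches nobody
print(identify([0.9, 0.1, 0.0], {}))           # empty database, nothing can match
```

Nothing in this lookup refers to gender, so two brides or two grooms are handled like any other couple. The cost is the `enrolled` dictionary itself: with an empty database every lookup returns None, which is the manual step the client was unwilling to pay for.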

Reflection

It has been one and a half years since that internship, but I still think about this project very often. I wonder why I did not think about more cases outside of my test data, why my team did not realize the potential bias in my design either, and what I could have done differently to avoid the bias from the get-go.

“In order to train AI to benefit humanity, the creators of AI need to represent humanities.” - Fei-Fei Li

I am scared to say this, but I think that if I were put in that situation again, I might make the exact same mistake and code the biased algorithm all over again.

Sometimes, biases and mistakes come from unconscious human choices. And I was in a homogeneous team where everyone else’s brain was wired the same way as mine. Even though I did briefly consider facial recognition instead of gender classification when I first designed my model, I discarded the more complex facial recognition because of the human labor it required. But in the end, it turned out that human involvement was essential. Once I tried to rectify the problem, it became very difficult to actually automate everything. Humans are always closely intertwined with algorithm implementation. Sometimes the ramifications of automation are simply not worth it.

Moreover, I believe Data Science and AI will benefit hugely from a diverse team of practitioners. It is impossible for any single Data Scientist to think the way their users think and feel what their users feel, but together, different perspectives from different minds stand a better chance.

This experience has kept me extra wary of every single decision in my future Data Science implementations. I truly hope that you can take away something from it too.

Thank you for reading! I hope this has been helpful to you.

PS: I would have used a more fitting image for this blog given its context, but I could not give away the big twist too early on.
