Data-Driven Leadership and Careers

Why AI and decision-making are two sides of the same coin

Bonus: A demo that shows why you shouldn’t treat AI like a magical box of magic

Cassie Kozyrkov
Towards Data Science
8 min read · Feb 28, 2020


Note: all the links below take you to other articles by the same author.

With all the gratuitous anthropomorphization infecting the machine learning (ML) and artificial intelligence (AI) space, many businessfolk are tricked into thinking of AI as an objective, impartial colleague that knows all the right answers. Here’s a quick demo that shows you why that’s a terrible misconception.


A task that practically every AI student has to suffer through is building a system that classifies images as “cat” (photo contains a cat) or “not-cat” (no cat to be seen). The reason this is a classic AI task is that recognizing objects is a task that’s relatively easy for humans to perform, but it’s really hard for us to say how we do it (so it’s difficult to code explicit rules that describe “catness”). These kinds of tasks are perfect for AI.
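To make the setup concrete, here's a minimal sketch of the contract such a classifier has to honor: whatever comes in, exactly one of two labels comes out. This is my own toy illustration, not code from any real system; the scoring function is a made-up stand-in, and the fact that it's obviously a terrible rule for "catness" is exactly why we hand the scoring part over to machine learning instead of writing it by hand.

```python
# Minimal sketch of the cat/not-cat contract. The scoring function below is a
# made-up stand-in for a trained model; a real system would learn it from data.

def cat_score(image_pixels):
    """Hypothetical stand-in: return a 'catness' score between 0 and 1."""
    # Fake score from average brightness, purely to keep the demo runnable.
    # A hand-coded rule like this is hopeless at real "catness".
    return sum(image_pixels) / (255 * len(image_pixels))

def classify(image_pixels, threshold=0.5):
    """The contract: always return exactly one of the two allowable labels."""
    return "cat" if cat_score(image_pixels) >= threshold else "not-cat"

print(classify([200, 180, 220, 210]))  # -> "cat" (with this toy score)
print(classify([10, 20, 15, 5]))       # -> "not-cat"
```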

The demo

I hear groans from those of you who have been around AI for a while — you are so sick of the cat/not-cat task. Fair enough, but humor me just this once. For this exercise, you will be my AI system. Your job is to classify the six images below. You may choose from only two allowable labels/outputs:

  • Cat
  • Not-cat

Let’s do it! Assign an allowable label to each of the 6 images, just like an AI system would:

Take a moment to label each of the images as “cat” or “not-cat”.

A-ha! Images 1–5 were easy, but what’s that I hear you mumbling for image 6? “Big cat”? “Sort-of-cat”? “Maybe cat”? Those aren’t allowable options! You’re a system that is programmed to only output “cat” or “not-cat”. So which one is it?

Go to 0:20 in this video to watch this article’s demo with a live audience.

And thus, we begin to see how important the project’s decision-maker is. The right answer isn’t platonic and it certainly doesn’t come from the AI system… in fact, there is no “right” answer. The “right” answer depends on what the owner of the system wants the system to do.

AI cannot set the objective for you — that’s the human’s job.

If I’m trying to build a pet recommender system that only suggests creatures that are safe to cuddle in their typical adult form, then the answer becomes clear. My intended purpose for this system means the correct action for you is to label image 6 as “not-cat”. If you’re still labeling it as “cat” at this point, well… I recommend you take out more life insurance.

Machine learning’s “right” answers are usually in the eye of the beholder, so a system that is designed for one purpose may not work for a different purpose.

If you’re intending to classify cats for some other purpose, then maybe the “right” answer will be different. The purpose or objective comes from the human decision-maker! Step aside Plato; different answers are going to be appropriate for different projects. In AI, the objective is always subjective. It’s up to the owner of the project to call those subjective shots. (And up to everyone else to understand that AI systems have a lot of subjectivity baked into them.)
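To put that subjectivity in code form, here's a toy sketch of my own (the policy names and label maps are invented, not from any real project) showing how the same ambiguous input gets a different "right" answer depending on the owner's objective:

```python
# Toy illustration: the "right" label for the same animal depends on the
# project owner's objective. Policy names and mappings are invented examples.

LABEL_POLICIES = {
    # Objective: only recommend pets that are safe to cuddle as adults.
    "safe_pet_recommender": {"house cat": "cat", "tiger": "not-cat"},
    # Objective: count members of the cat family in wildlife footage.
    "wildlife_feline_census": {"house cat": "cat", "tiger": "cat"},
}

def label(animal, objective):
    return LABEL_POLICIES[objective][animal]

print(label("tiger", "safe_pet_recommender"))    # -> "not-cat"
print(label("tiger", "wildlife_feline_census"))  # -> "cat"
```

Neither policy is wrong; they simply serve different owners.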

In AI, the objective is always subjective! AI systems have a lot of subjectivity baked into them.

Decision-makers must serve as responsible parents who pick the behaviors they want their systems to replicate… and there’s almost never a single “right” way to define categories and set objectives that every decision-maker would agree on. Those things are up to individual people. Different people will find different sets of behaviors appropriate for replication.

If you inherit my system and your intentions are different from mine, or if you plan to use it for a different purpose than the one I designed it for (for example, if you have different views from mine on what should be called a cat), you may find that my system does not work for you. It might even hurt you, though it made me perfectly happy.

If that happened, the fault would be yours, not mine; you were enough of a sucker to assume that there’s only one way to define things. You thought that a system with a mathematical component couldn’t possibly have ambiguity and human foibles baked in, so you ended up with a great solution to the wrong problem because it wasn’t your problem (it was mine).

You should always test an AI system that someone else developed, especially if you don’t know how they defined their objectives.

Am I saying that you can’t use systems developed by other AI engineering teams and that you have to build your own from scratch every time? Not at all. However, you do need to form your own clear idea of what it means to correctly perform your task (e.g. what to do if there’s a tiger) and you need to carefully test the system you’re thinking of inheriting on a battery of your own examples (such as photos that you’ve hand-labeled).
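Here's a sketch of what that testing might look like. The accuracy bar and the `inherited_model` callable are assumptions of mine for illustration, not part of any particular library; the point is simply that the examples and labels come from *you*.

```python
# Sketch: test an inherited classifier against examples labeled for YOUR
# objective. `model` is any callable that returns "cat" or "not-cat"; the
# 95% bar is an arbitrary placeholder you would set for your own project.

def evaluate(model, my_labeled_examples, required_accuracy=0.95):
    correct = sum(
        1 for image, my_label in my_labeled_examples if model(image) == my_label
    )
    accuracy = correct / len(my_labeled_examples)
    return accuracy, accuracy >= required_accuracy

# Usage sketch:
# my_labeled_examples = [(house_cat_photo, "cat"), (tiger_photo, "not-cat"), ...]
# accuracy, good_enough = evaluate(inherited_model, my_labeled_examples)
```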

What is ground truth?

You might have heard the term “ground truth” rolling around the ML/AI space, but what does it mean? Newsflash: Ground truth isn’t true. It’s an ideal expected result (according to the people in charge). In other words, it’s a way to boil down the opinions of project owners by creating a set of examples with output labels that those owners found palatable. It might involve hand-labeling example datapoints or putting sensors “on the ground” (in a curated real-world location) to collect desirable answer data for training your system.

Newsflash: Ground truth isn’t true.

For example, a set of images might be painstakingly hand-labeled as cat or not-cat according to the opinions of whoever was in charge of the project and those cat/not-cat labels will be called “ground truth” for the project.

What on earth is this?! Cat or not-cat? Watching the trailer of the movie Cats tempted me to bleach my eyes.

When such a dataset is used to train ML/AI systems, those systems will inherit and amplify the implicit values of the people who decided what ideal behavior looked like to them.

When we create machine systems based on data, we teach them a sense of our values.

While we’re on the topic, please be aware that creating “ground truth” by asking trusted humans to perform your task is subject to all kinds of errors, including human error. It’s a good idea to try to minimize the potential for this error through approaches like consensus-based data collection workflows, reaction time monitoring, and clever user experience (UX) tricks that reduce the likelihood of data entry mistakes. (More on these in a future article.)
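For instance, a consensus workflow can be as simple as taking a majority vote across several trusted labelers and flagging low-agreement items for a second look. Here's a minimal sketch; the two-thirds agreement threshold is an arbitrary choice for illustration, not a recommendation.

```python
# Minimal consensus-labeling sketch: keep the majority-vote label and flag
# examples where labelers disagreed too much. The 2/3 threshold is arbitrary.
from collections import Counter

def consensus_label(votes, min_agreement=2 / 3):
    """votes: labels from different human labelers for one example."""
    label, top_count = Counter(votes).most_common(1)[0]
    agreement = top_count / len(votes)
    needs_review = agreement < min_agreement
    return label, agreement, needs_review

print(consensus_label(["cat", "cat", "not-cat"]))  # -> ('cat', 0.666..., False)
```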

It’s always a good idea to have your project’s decision-maker review a random sample to check that the quality is high enough.
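One way to set up that review, sketched with a made-up sample size (nothing here is prescribed; it's just my illustration of the spot-check):

```python
# Sketch: draw a random sample of labeled examples for the decision-maker to
# spot-check. The sample size of 50 is a made-up default.
import random

def review_sample(labeled_examples, k=50, seed=0):
    rng = random.Random(seed)
    return rng.sample(labeled_examples, min(k, len(labeled_examples)))

# Usage sketch: hand review_sample(ground_truth_rows) to the decision-maker
# and have them confirm the labels match the behavior they want replicated.
```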

What if you find a dataset on the internet and use it instead of collecting your own? Then your project inherits the implicit values and biases of whoever made your dataset. There are always subjective calls along the way and whoever makes them determines “right” and “wrong” for your project. Be careful whom you trust! There’s a lot to be said for decision-makers taking the time to perform the task themselves to better understand the problem they’re trying to solve, along with the data, the objectives, and the edge cases.

Warning (how to be a good citizen)

I’ve written a lot in praise of testing ML/AI systems carefully, but beware! Since the entire process fundamentally invites subjectivity of definition and objective, all testing will be done in light of the answers your team’s decision-maker liked. Unfortunately, there’s no test for the stupidity of those subjective bits in the first place. There are no checks and balances on the decision-maker except other decision-makers reviewing their rationale for choices made in the first phase of the project.

There’s no test that checks the stupidity of subjective definitions and objectives, so choose your project leader wisely.

Now that you know just how subjective the first part of all ML/AI projects is, you can be a better citizen in an increasingly AI-drenched society. Instead of taking AI solutions at face value, always ask yourself:

  • Who built this system?
  • What were their (subjective) objectives?
  • How did they define the right answers?
  • Would most people come up with similar definitions?
  • How was the ground truth dataset created?
  • Which people is this system intended to benefit?
  • How painful could mistakes be?
  • Are there appropriate safety nets built in? (Did the system creators have the humility to anticipate the possibility that their choices might be unwise and plan accordingly?)

In many situations, your responses to these questions won’t reveal anything scary. AI is already all around you and for the most part it’s well-designed and nutritious. Alas, occasionally you’ll find yourself in troubling waters. For example, you wouldn’t want to fall victim to a myopic fraud detection system with sloppy definitions of what financial fraud looks like, especially if such a system is allowed to falsely accuse people without giving them an easy way to prove their innocence. That kind of thing is a powder keg begging for trouble. As a responsible citizen, it’s up to you to notice flammable situations and call them out. Once you start seeing the subjectivity inherent in the AI game, you’ll be better armed to call out the ugly human elements that could get amplified if no one is watching.

For a guide to wisely working through the squishy subjective bits, see my Ultimate Decision-Maker’s Guide to Starting AI.

Thanks for reading! How about an AI course?

If you had fun here and you’re looking for an applied AI course designed to be fun for beginners and experts alike, here’s one I made for your amusement:

Enjoy the entire course playlist here: bit.ly/machinefriend

Liked the author? Connect with Cassie Kozyrkov

Let’s be friends! You can find me on Twitter, YouTube, Substack, and LinkedIn. Interested in having me speak at your event? Use this form to get in touch.
