Making Data Useful

Data science and AI are a mess… and your startup might be making it worse

Cassie Kozyrkov
Towards Data Science
6 min read · Mar 6, 2020

Data science has been called “the sexiest job of the 21st century” but sometimes I wonder whether we’re off by a century here. Is the world ready for us? I’ve looked into this question before, but the issue of tools for data science warrants more discussion. The tools available to data scientists put a cap on their effectiveness, so it would be great to see toolmakers paying more attention to data scientists’ needs. Instead, it feels like the tools are made for buzzwords rather than people.

This article was inspired by my friend Clemens Mewald—one of the best product managers I’ve had the honor of working with—who wrote a piece titled Your Deep-Learning-Tools-for-Enterprises Startup Will Fail …which I read with the same emotion I feel when my toys are about to be yanked away.

It feels as though the tools are made for buzzwords instead of people.

On the one hand, he’s right: if you go about making ML/AI developer tools like all the rest of ’em, your startup will probably go under. On the other hand, I don’t want startup folk to run away screaming.

I say this purely selfishly, as a data scientist pleading on behalf of her people. I recognize my privilege of working in an environment that suffers relatively little from the problems I’m about to bring up, so I want you to know that these words are inspired by my experience of how things used to be (even here at Google!) and by the stories you share with me daily. Let me lend you my voice.

All hands on deck!

It’s no surprise that even the good tools aren’t perfect — there’s more work to do than any data science tool / platform provider (let’s call these entities “toolmakers” for short) could possibly manage alone. The blinding pace of the algorithms research frenzy means that theory innovation far outstrips the capacity of toolmakers to put those mathematical blueprints in the hands of the data scientists eager to use them. What a great problem to have!

The pace of theory innovation far outstrips the capacity of toolmakers to put those mathematical blueprints in users’ hands.

That’s why even though I work for a toolmaker (Google Cloud is very serious about building and integrating great tools for data scientists), let me be the first to cheer on any toolmaker thinking about entering the space. This work needs all hands on deck!

Data science effectiveness as a UX problem

We data scientists spend so much of our effort helping you understand your users that… you forget that we are users too. Which brings me back to Clemens’s (solid) advice to toolmaking startups.

Gorging yourselves on buzzwords again, little buddies? Image: SOURCE.

If you chase the “deep learning” buzz and base your whole strategy on the market’s fascination with one data science tool out of many, Clemens is right. Cue the horror-movie-style foreboding music.

Chase the user. Aim to make tools that don’t make data scientists want to claw their eyes out.

Instead, chase the user. That’s data scientists. And to chase us, you have to understand who we are and how we work. You have to understand what we already have and what we’re missing. Pretty please, don’t just make another tool that implements convolutional neural networks. Make a tool that implements convolutional neural networks in a way that data scientists could use without wanting to claw their own eyes out. It’s a subtle distinction.

Data scientists have historically been powerless to choose the tools they work with, but I’d bet on an accelerating shift toward giving them a say as employers fight to attract the strongest talent.

Make it easier for us to do our jobs

Though some of us profess to love doing things The Hard Way, most of us hate doing chores involving minimal cunning and maximal drudgery. As far as most data scientists are concerned, it would be great if all the code we wrote were the same for big or small, laptop or cloud, prototype or production… Deep down inside we know that the only reason that this isn’t the case is that we live in the dark ages where our tools suck.

The part of our job we love is the part that involves data, creativity, cunning, and possibility. It’s the part where we learn something about the universe and share it with you. It is not the part that involves installation and setup and knob twiddling. It is not the part that involves begging a dataset to take a different shape so a function will deign to run on it. And it’s definitely not the part that involves getting software packages built in intellectual isolation to talk to one another.
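To make the dataset-begging chore concrete, here is a minimal sketch in plain Python. The data, the column names, and the `wide_to_long` helper are all made up for illustration and don't come from any particular library. The point is only that the analysis is a single line while the reshaping is everything else.

```python
from statistics import mean

# Hypothetical data: one row per user, one column per week ("wide" shape).
wide_rows = [
    {"user": "a", "week1": 3.0, "week2": 5.0},
    {"user": "b", "week1": 4.0, "week2": 8.0},
]


def wide_to_long(rows, id_key, value_keys):
    """Turn one row per id into one (id, variable, value) record per measurement."""
    return [
        {id_key: row[id_key], "variable": key, "value": row[key]}
        for row in rows
        for key in value_keys
    ]


# The chore: coax the data into the shape the next step expects.
long_rows = wide_to_long(wide_rows, "user", ["week1", "week2"])

# The part we actually care about: one line of analysis.
week2_mean = mean(r["value"] for r in long_rows if r["variable"] == "week2")
print(week2_mean)
```

Multiply that helper by every pair of tools that disagree about shape, and you can see where the two weeks go.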

You can help us love our jobs more, and we’ll love you for it!

Clean up this mess!

The reality is that we spend a lot of our day fighting our nasty tools. We know exactly what we want to do; it just takes two weeks to do it. Sounds like great UX, right?

Today’s data science tool ecosystem is so fragmented and messy that it might make even Marie Kondo faint. If you’re thinking about making more tools, focus on making tools that spark joy. Make it easy to fold them all into one place. (Right, Marie?)

None of today’s tools solves every problem.

That’s what Clemens tells you to do as well, by the way. Don’t build tools for their own sake; build them to fulfill your users’ needs and make your users happy. Focus on integration: it’s important to make these tools play well with the rest of the ecosystem, because no one wants to stop what they’re doing to give your tool special treatment unless it’s a cure-all. As Clemens explains, none of today’s tools solves every problem, so get over that. (If someone told you deep learning is the holy grail, that’s even more reason to read what Clemens has to say.)

Focus on integration! No one wants to stop what they’re doing to give your tool special treatment.

To spark joy and make products that fit beautifully into the ecosystem of other tools meeting the needs of your users, you have to understand those users and you have to be intentional about UX. That means that your product managers, developers, and user experience folk must take the time to understand data scientists. Advocating for data scientists at Google has been a big part of my role, including helping to push projects that take data science UX seriously from day 1 (for example, Google’s What-If tool), and I’ll make as much of a nuisance of myself outside it if that’s what it takes for us to have nice things.

Hurray for TF 2.0, which is going all-in on usability! There might be teething pains as TF switches over to Keras style and says bye bye to making you write all that boilerplate, but I’m excited. One might even say I’m eager.

The good news is that things are improving. One of my favorite pieces of news in the past year was the user focus of TensorFlow 2.0 and that community’s commitment to easing the user pain of TensorFlow 1.x. This is amazing, and it’s great that the famous initiatives are leading the way, but wouldn’t it be great if startups took the same stance?

What’s my point?

The data science community doesn’t feel understood and one place we see it reflected is in the tools employers ask us to use. Fixing those tools and making them friendly is not a job for just a few of us, though I salute those amazing heroes who push us all forward against the odds. Alas, very few UX professionals think of data science effectiveness as (at least partially) a UX problem. Worse still, the new tools ecosystem isn’t doing much to encourage a wave of change, often building for buzzwords instead of users. Let’s do what we can to fix that!

Thanks for reading! How about an AI course?

If you had fun here and you’re looking for an applied AI course designed to be fun for beginners and experts alike, here’s one I made for your amusement:

Enjoy the entire course playlist here: bit.ly/machinefriend

Liked the author? Connect with Cassie Kozyrkov

Let’s be friends! You can find me on Twitter, YouTube, Substack, and LinkedIn. Interested in having me speak at your event? Use this form to get in touch.

Chief Decision Scientist, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita