Welcome to the Data Team! Please Solve Everything. (Part III: The Solution Ignored)

Data products aren’t magic elixirs — because stakeholders are more like library patrons than the infirm.

Jarus Singh
Towards Data Science

--

This post is the last of three in my “Welcome to the data team! Please solve everything” series. In the first, I shared common misuses of data within organizations that result from overconfidence or naivete among non-data folks.

In the second, I proposed a solution: give somebody the responsibilities of a data superlibrarian (more commonly known as a data translator, data strategist, or data product manager) who determines which existing data products are best suited to accomplishing a business’s goals and who has the authority to commission new ones if needed.

The data superlibrarian also decides whether these products should rely on preexisting data or on data that has yet to be generated (e.g., through a survey) or acquired (e.g., from a third party). Despite the importance of this role, most data science and analytics articles, talks, and coursework focus on learning and applying statistical or predictive techniques. Mastering programming languages and algorithms matters, but having somebody in your larger data organization tasked with the responsible and effective use of data is critical for success. Below are the attitudes that are preventing the data superlibrarian from getting more traction.

Organizations don’t realize the role exists

A medieval fool
By Mummelgrummel — Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=25796381

Let’s consider a common situation. Business leaders and the data team are having a long meeting to solve a key problem for the business, and both sides leave dissatisfied. The business leaders are frustrated that the data team can’t answer simple questions, explain their proposed solutions, or commit to delivering concrete results. The data team is frustrated that the business leaders oversimplify problems, don’t understand simple statistical concepts, and frequently change the scope of the problem that needs solving.

Whether or not it’s admitted out loud, both sides are resigned to having unproductive interactions with the other. “That’s just the way they are,” members of both groups claim. Because of this attitude, neither side recognizes the need for a go-between. It also prevents them from realizing that, with a change of mindset and some training, the person filling this role doesn’t need to be an outside hire; it could be one of them.

It’s also a situation in which we don’t know what we don’t know. Unlike idling machines on a factory floor or shipped defective products, which are obvious problems, ineffective use of data isn’t apparent to anyone involved.

Organizations don’t think they need the role

It’s generally apparent who is responsible for decision making at companies. The most impactful decisions are made, or at least reviewed and approved, by the CEO. The next tier of decisions, still important and likely requiring domain expertise, are made by the rest of the C-suite. As we move down the chain, decisions get more specialized and less impactful. This makes sense; the more impactful a decision has the potential to be, the greater the resources and authority the decision maker needs to choose correctly. If somebody at the bottom is having a disproportionately large impact on the company due to savvy decision making, they’re overdue for a promotion!

What about the decision making process? Who’s responsible for that? Here I suspect most organizations are unquestioningly operating under the following assumption:

The people in charge of decisions *should also* be in charge of requesting data products that best help with making that decision.

By hiring data superlibrarians to determine the most appropriate data products to inform decisions, decision makers are giving up a responsibility they probably consider part of their jobs, and one they may not be doing particularly well, especially if they don’t have a strong background in responsible applications of data. These executives are unwilling to believe that anybody else, even a skilled specialist, can improve their decision making processes.

Icarus falling from the sky, a metaphor for hubris
By Jacob Peter Gowy — http://www.museodelprado.es/imagen/alta_resolucion/P01540_01.jpg, Public Domain, https://commons.wikimedia.org/w/index.php?curid=27493281

Organizations assume that their data team management is already doing the role

I’ve found that a binary attitude towards data among business leaders often underlies this assumption. In the third example in my first post in this series, I describe a hypothetical executive who views using data to support his decisions as an inconvenience. The quality of data relative to its intended purpose is unimportant as far as he’s concerned; this executive just needs to fill a deck with enough relevant-seeming data points to secure approval for his projects.

I’ve seen this same binary attitude applied to those who work with data. The assumption is that if you’re on the data team (which often only requires a basic knowledge of SQL, Python, R, Tableau, or Excel), you always know the appropriate or optimal data product to support a given task. We can see why business leaders might have this perception. If you don’t have much experience with effective data use yourself, yet you see it being applied almost universally, often with incredible success, it looks like a magic elixir for businesses. And if data products are magic elixirs, then those who create them must be sorcerers. You don’t meddle with their arcane potion making processes; you stand back, give them the resources they need, and unquestioningly quaff whatever they hand you.

Morgan le Fay
By William Henry Margetson — http://d.lib.rochester.edu/camelot/image/margetson-she-was-known-to-have-studied-magic (cropped), Public Domain, https://commons.wikimedia.org/w/index.php?curid=69727077

As much as all involved would love for this to be the case, I’m going to put my potion cookbook back on the shelf, remove my cloak, and explain why it isn’t. According to Angela Bassa, Head of Analytics, Data Science, and Machine Learning at iRobot, “[m]any managers of data science teams become managers because they were great individual contributors.” We can probably agree that these battlefield-promoted data science managers are strong thinkers (how else did they earn their promotions?), but that does not make them immune to bias (psychological, not statistical).

Most data scientists I’ve worked with developed their chops on internal, user-generated data, and for good reason. The details of how it’s collected are documented. There’s an abundance of it. Once the data pipelines are put in place, it’s practically free. And insights that can be gleaned from a set of user behaviors are directly applicable to the users that generated them.

But how does one judge the relative usefulness of other data? When it comes to developing new features, does one rely exclusively on user segmentation analysis of internal data to determine which behaviors should be encouraged to increase overall engagement? Does one survey users on potential features, knowing full well that the data collected can be unreliable? If one does both the internal and external analyses, how does one weigh the results of each? (The sketch below makes this weighting question concrete.) In a world in which most data science managers are most comfortable using large, internal, user-generated data to make recommendations, does the survey data receive appropriate weight? Is it even raised as an option? Data science managers aren’t incentivized to talk about survey data if they aren’t confident in how to integrate it effectively into their analyses. Advertising your weaknesses is not a winning strategy. Traditional data science managers are not the superlibrarians we’re looking for; they’re more familiar with some sections of the library than others, and therefore more likely to misinform their patrons.
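Here’s a minimal sketch of one way to blend the two signals, assuming hypothetical, already-normalized priority scores for three candidate features and assumed trust weights for each source. None of these names, numbers, or weights come from any real analysis; choosing and defending them is exactly the judgment call a data superlibrarian would own.

```python
# A minimal sketch (hypothetical names, numbers, and weights) of blending
# feature-priority signals from internal behavioral data with signals from
# a user survey, rather than letting the survey default to zero weight.

# Hypothetical priority scores for three candidate features, normalized to 0-1.
internal_scores = {"feature_a": 0.9, "feature_b": 0.4, "feature_c": 0.6}
survey_scores = {"feature_a": 0.3, "feature_b": 0.8, "feature_c": 0.7}

# Assumed trust weights: behavioral data is plentiful and well-documented,
# but for brand-new features the survey measures intent directly.
WEIGHT_INTERNAL = 0.6
WEIGHT_SURVEY = 0.4

def blended_priority(feature: str) -> float:
    """Weighted average of the two normalized signals for one feature."""
    return (WEIGHT_INTERNAL * internal_scores[feature]
            + WEIGHT_SURVEY * survey_scores[feature])

for feature in internal_scores:
    print(f"{feature}: {blended_priority(feature):.2f}")
```

The arithmetic is trivial; the point is that the weights are explicit, visible, and debatable, instead of an unexamined default buried in whichever data source the team happens to be most comfortable with.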

A Hopeful Conclusion

It turns out that the discipline we need our data superlibrarians to have exists! It’s called Decision Intelligence, and it has been developed at Google since 2018 under Cassie Kozyrkov, their Chief Decision Scientist. In doing research for this piece, I learned that she’s written about the issue of data science leadership (along with everyone else) not being adequately trained to improve decision making and maximize the effectiveness of data products. Because it’s a nascent discipline, developed internally (and named differently) at places like Google, McKinsey, and Instagram, there currently aren’t a lot of resources available to start learning it.

Hype cycle with “data science without decision intelligence” at the peak and “decision intelligence” at the beginning
By Jeremykemp at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=10547051. Edits made by author.

That’s demoralizing news if you want a neatly packaged solution to roll out at your company right now, but it presents a massive opportunity if you want to help develop this discipline and democratize its best practices. I, for one, am excited to view all my personal and work data projects through a decision intelligence lens, put extra effort into assessing the effectiveness of different techniques and frameworks, and share my findings. In doing so, I’m answering the challenge issued by Kozyrkov in 2018, to “join [her] in recognizing [decision intelligence] as a discipline in its own right and [generate as well as] shar[e] our wisdom as widely as possible.” I hope my series of articles provides sufficient data for you to make the same decision.
