To Make the Most of Your Data Scientists, Embed Them

"I have all of these data scientists," I hear you muse, "who I’ve spent gajillions to recruit, hire and retain, with fancy degrees in statistics, computer science, and one with a, ‘really quant-heavy master of public policy’ who absolutely crushed our take home coding exam. How do I find the best things for them to do? How can I best herd these nerds?"

Don’t worry, your favorite extroverted nerd’s got your back. No size fits all, but this is one way I’ve found to be very effective in sourcing high-value data science projects.

A lesson from wartime journalism

Many folks who see my résumé react with a sort of bemused puzzlement that I’m a working AI geek with a master’s degree in investigative journalism. "You must know how to write!"

🤔 🤞

The truth about an education in journalism, though, is that the primary skills it imparts are listening to learn information, and combining those pieces of learned information to derive narrative and context. Those skills are very helpful to data scientists, and the work of war reporters suggests a very helpful model for our work. (For clarity, while I am a combat veteran with an MS in journalism, those two worlds have never met in my case and I’ve never worked as a war reporter. If you have and you find that I’ve misrepresented your experience in the rest of this post, please do reach out and educate me.)

An embedded reporter serves a very helpful purpose in the service of truth: they are embedded in a unit (e.g., a deployed infantry platoon) in the sense that they follow them around, see what they see, hear what they hear, and observe their experience at close hand. They are free, within the bounds of certain agreements with their host, to go wherever and speak to whomever in support of that purpose. But importantly, they do not report to that unit in any meaningful managerial sense. That platoon leader is not their boss. The reporter and the unit share a very specific partnership, with specific responsibilities to one another. The unit is responsible for the reporter’s safety; the reporter is bound by agreements, for example not to disclose or publish information that would put their host unit or civilians at risk. Instead, the reporter reports, in a managerial sense, to some other professional journalist or editor, to whom they are ultimately accountable for the quality of their work.

To draw a helpful analogy to a business unit, consider two models: one where an individual contributor data scientist is directly assigned to report to a manager of some other job function, and another where they are embedded in that same business unit, but in a very specific sense: they are welcome and free to attend that business unit’s standing meetings, learn and ask questions, and come to a genuine understanding of that unit’s business problems. But the responsibility and accountability for choice of project remains in the hands of data science leadership.

Why might this be preferable?

First, it frees already busy business leaders from managing specialists with a set of skills they may not fully understand. No shade intended, but remember that many data scientists have invested several years cramming a graduate education in statistics, computer science or another quantitative field into their heads. It’s difficult for others to know what is easy, hard or impossible for them to do, or how specific a problem must be to be DS-ready. "Is ‘improve this KPI’ a specific enough ask for a data scientist?" becomes, "I’ve been curious how we can move this KPI, can I tell you about it and pick your brain on how we might leverage your skillset?"

Second, it makes data scientists accountable for understanding business problems. A DS lead goes from saying, "My expectation is that you deliver on whatever models the product owner requests," to "My expectation of you is that you come to understand your embedded partner business in its entirety, and source opportunities for us to add value, so that we can make wise choices among them. My expectation of them is that they be open, welcoming and transparent in bringing you up to speed on it. And my expectation of myself is that, at minimum, I clear any obstacles and support you in achieving those goals. What are your expectations of me?"

Finally, it empowers creativity and agency in your data scientists. In my experience, the magic often happens when a data scientist hears a problem and mentally connects it to a dataset she knows is available or a method she’s familiar with. This often creates opportunities a vertical business leader wouldn’t ever think to ask for, simply because their role doesn’t involve life in a firm’s data landscape. Data scientists do live there, giving us a unique position from which to judge feasibility and time to market.

Why might this be a terrible idea?

This aims for optimal – in the sense of fast, comprehensive, and transparent – opportunity sourcing. If your DS team(s) have no bandwidth for new items, then effort on sourcing could be a waste. Embedding solves for this by making data scientists free, but not obligated, to attend those standing team meetings where they’d listen and source. No extra rocket surgery required, just old-fashioned, straightforward communication about which current priorities are keeping you from attending. Nevertheless, big, long-timeline projects aren’t the best fit for this method. If a full team is heads down for months collaborating on elaborate deep learning models for a planned product launch (been there), this framework will definitely optimize for the wrong criterion.

Finally, it isn’t for every data scientist. The extra stakeholder engagement isn’t what drives every DS to get in the game, and for those who want to be tackling thorny problems on eight daily hours of deep work, putting scattered invites in the middle of it is a recipe for poor retention.

How can I get started?

If you’re a data science stakeholder:

Invite data scientists you already work with to any meetings, whether standing or one-off problem-solving sessions, to listen or contribute.
Create an atmosphere that welcomes their pitched ideas, and resist the urge to respond, "This is not what I asked for."
In the same way that an agile product owner consults technical partners about the LOE for a story before setting priorities, approach your data science partners with curiosity about how you could make their work easier and better.

If you are a data science lead:

Make sure your team has a place to record, with comprehensive transparency, all of the opportunities that they’ve sourced, and a lightweight, recurring way for your team to choose among them.
Be upfront and transparent with your partners from other functions about what you’re asking them for, and what you expect of them. And it’s often helpful to say, "My expectation of an embedded data scientist is not that they do everything your function needs, but that they understand all your needs and do the work that is to our comparative advantage."
Be upfront and direct about which sourced opportunities your team will action, or will not.

If you’re an individual contributor data scientist:

Bring that same spirit of curiosity to any business folks you collaborate with. Share with them your desire to understand their business, so that you can better add value to it.
Fight any residual imposter syndrome, any little voices telling you that "understanding business isn’t for us data people." Again, you’ve squeezed the central limit theorem and two or three programming languages’ worth of syntax into your head: You can grok revenues and expenses just fine.