5 common causes of friction between data scientists and the rest of the stakeholders

Understanding the root causes of friction can help data science teams become their best

Overview:

As a data scientist, have you ever been frustrated that your stakeholders don’t see the value that you bring to the table? You may ask yourself, "How far should I go in explaining the work I do or what my models are doing?" If that sounds like you, then pay close attention to both parts of this post, as they are all about improving collaboration between data scientists and other stakeholders.

This is a two-part post: Part 1 covers the underlying assumptions and gaps (in understanding) that cause friction between data scientists and stakeholders; Part 2 offers concrete steps for better collaboration and improving productivity.

Part 1:

Photo by Sandeep Singh on Unsplash

Comprehensibility:

Machine Learning (ML) models are inherently complex and hard to explain. Data scientists know that what happens between data input and output generation does not map easily to an explainable process. Before the adoption of ML techniques, the world was much simpler; it was easier to explain how an output was generated. Back then, decision science operated on rules-based systems; in fact, it still does, even though the rules are now more complex.

Governance:

More importantly, in the world before ML models, the rules were governed by stakeholders throughout the process. (Note that my usage of the word stakeholder is broad: this could be a general manager, business owner, marketing lead, product manager, etc.) While this is no longer the case, stakeholders can still have a lot of say in the ML process. For example, in many environments, stakeholders still hold the keys to the input data. Sometimes they even own much of the process that leads to the data output.

Investment:

Further, a lack of understanding can lead to miscommunication, which damages trust between the parties. This lack of trust can seriously impede a data scientist’s ability to provide the right level of support throughout the end-to-end process of model building (i.e., collecting data, then building, deploying, and iterating on models).

Oftentimes, management support in the form of resources, time, and capital is needed to get better outcomes from ML models. Without such investment, the results of ML models are subpar or, even worse, a waste of time. Remember, a binary classifier that is right only 50% of the time is no better than a coin flip.
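To make that concrete, here is a minimal sketch, using scikit-learn on synthetic data, of how a model’s accuracy can be checked against a naive baseline before claiming it adds value; the dataset and the choice of logistic regression are purely illustrative:

```python
# Minimal sketch: compare a model's accuracy against a naive baseline.
# The synthetic dataset and model choice are illustrative only.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Naive baseline: always predict the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("Baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
print("Model accuracy:   ", accuracy_score(y_test, model.predict(X_test)))
# A model that cannot beat the naive baseline adds no value.
```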

The above are some of the primary gaps, from the stakeholders’ point of view (POV), that result in misaligned expectations and a lack of trust. While these gaps affect all parties involved, here are the implications from a data scientist’s point of view.

Lack of support and guidance:

I have heard data scientists say that when their company started setting up a Data Science team or hiring new data scientists, there was plenty of excitement; however, the support and enthusiasm they experienced at the beginning faded after a few months or quarters. Not all companies are alike, of course, but this pain is most often felt when companies are just starting to incorporate data science principles and techniques into their products and processes. Again, if stakeholders feel they are not getting what they want because of communication issues, data scientists end up feeling let down and/or neglected.

Misaligned expectations:

Companies often hire the wrong type of data scientist or the wrong level of seniority for a project. This usually happens when a company is just getting started with data science and has no clear understanding of what it wants from the team, role, or person. This misalignment further sours relationships while wasting time and effort across the board.

The gaps listed above are indeed fixable. It takes time, effort, learning, and setting up some frameworks so that both parties (data scientists and stakeholders) can foster better collaboration and achieve more together. Check out some proposed solutions in Part 2.

Part 2:

This part offers five meaningful ways to improve the effectiveness of the data science team and the collaboration between data scientists and stakeholders for better outcomes.

Photo by Perry Grone on Unsplash

Aligning goals:

Regardless of the specific project, agreeing on the expected outcomes and goals before beginning the work is a best practice. With the advent of machine learning (ML) models, it’s essential for both sides to discuss the critical measures of success for the given project. Using a framework such as Objectives and Key Results (OKRs) is a great way to approach this process of aligning goals and expectations.
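As a purely hypothetical illustration, a project-level OKR for a churn-modeling effort might look something like the following; every objective, key result, and number here is invented for the example:

```
Objective: Reduce churn among trial users next quarter.
  KR1: Ship a churn-risk model with precision >= 0.7 at 30% recall.
  KR2: Deliver weekly top-decile risk scores to the retention team.
  KR3: Lift trial-to-paid conversion by 2 percentage points.
```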

Understanding the problem/business context:

Some companies do this organically, while others don’t. If the data science team is new to the company, they need to build a basic understanding of the problem space. Stakeholders must explain what they want to achieve and why it is critical now. The specific circumstances around how the output will be consumed are a major part of this discussion. The following are two common output-consumption scenarios:

a. Consumed by external parties (e.g., an investor making credit decisions, clients feeding your fraud predictions into their own products for end-users, or recommendations on websites/e-commerce where the end-user interacts with the result).

b. Consumed by internal customers (e.g., your sales team could use a lead-scoring model to decide whom to call next; you could decide how much bonus to pay each sales associate, improve call routing for customer support, or drive the right content to the right user on the right channel to improve user engagement).

In all the above circumstances, it’s critical to understand what is possible, what the output looks like, and how the output will be consumed by the recipient stakeholder/team. Stakeholders need to take the time to educate the data teams on the problem context as needed. This step avoids mishaps later: knowing the context helps the data team ask the right question behind the question and get answers quickly without spinning their wheels.

Sharing sample outcome and usage:

Once the goals/metrics are identified and agreed upon, it’s best to get stakeholders acclimated to a sample outcome before diving into the project. This is not the time to discuss the nitty-gritty details of the model; it’s too early. At this point, you have only understood the problem context and have not yet figured out which ML models work best for your scenario. Additionally, spending too much time ironing out the finer details could mean wasted time and opportunity.

If the output is a probability (a value between 0 and 1), consider providing a range of such values and observing how each translates to a business decision. When you share that the output will be a value between 0 and 1, does that make your stakeholders uncomfortable, or do they understand? If they jump out of their seats, do they ask you more questions or simply disagree? If it’s the latter, go back to the previous step; if it’s the former, help them understand how the outputs could be used by offering options comparable to what others in the industry do. If that’s too much for you, this is a great time to seek help from a more experienced professional.
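One concrete way to run this conversation is to show a simple, editable mapping from score bands to actions. The sketch below is hypothetical; the thresholds, the churn scenario, and the actions exist only to give stakeholders something tangible to react to:

```python
# Hypothetical mapping from a model's probability output to business actions.
# Thresholds and actions are illustrative and should be set with stakeholders.
def action_for_score(churn_probability: float) -> str:
    if churn_probability >= 0.8:
        return "offer retention discount"
    elif churn_probability >= 0.5:
        return "schedule a check-in call"
    else:
        return "no action"

# Walk stakeholders through a range of scores and the decisions they imply.
for p in (0.15, 0.55, 0.92):
    print(f"score={p:.2f} -> {action_for_score(p)}")
```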

Model comprehension:

There are several ways to educate stakeholders on your model without directly explaining its intricate processes. You don’t need to dumb it down, but it’s also fair to say that even you may not be sure which specific path a model takes to derive its outcomes. If you are using an ensemble of models, the situation is even more complicated.

So, what can help here? There are, of course, many kinds of models and forms of output: linear regression vs. logistic regression vs. binary classifiers vs. RNNs vs. CNNs, and so on. What matters more is having some go-to ways of explaining the model that build trust and improve mutual collaboration through a confidence-building process. Here are some examples:

  1. Share a well-known use case outside (or inside) your company: How is your use case, and therefore your model, different from a familiar one such as an Amazon product recommendation system? Grounding the discussion in a well-known example gives stakeholders a reference point and helps them take a different perspective, since they are not judging you on your specific business context.
  2. Explain the inputs and data sources: Bring visibility into what has gone into building the outputs. This is also beneficial for two other reasons: (a) it supports asking for further investment to refine the inputs, such as collecting more data, new data, or higher-quality data sources; (b) it sets expectations about the work that goes in beyond model building; it is sometimes forgotten that data gathering, not the model itself, can be the most important part (depending on the context).
  3. Remove chances for misinterpretation: A probability value is not the same as a ranking score. Help stakeholders understand not just the output itself but also the error and variance that come with it.
  4. Run simulations on the historical dataset: Before pushing models into production, you can build trust by comparing the inputs/outputs of the old process with the outputs of the new process (i.e., the ML models) on the same historical data; see the sketch after this list. If the results are comparable or better, confidence grows.
  5. Host periodic roadshows: It’s good practice to share the improvements you and your team are making via internal roadshows and presentations. This builds visibility and transparency into your process and increases the chances of collaboration with other departments.
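For point 4, a backtest can be as simple as scoring the same historical records with both the legacy rule and the new model. Here is a minimal sketch, assuming a churn use case; the column names, the 60-day rule, and the 0.5 threshold are all invented for illustration:

```python
# Minimal backtest sketch: compare the legacy rules with the new model
# on the same historical data. Column names and the rule are hypothetical.
import pandas as pd
from sklearn.metrics import accuracy_score

history = pd.DataFrame({
    "days_since_last_order": [3, 40, 90, 7, 120, 15],
    "model_score":           [0.2, 0.6, 0.9, 0.3, 0.95, 0.4],
    "actually_churned":      [0, 1, 1, 0, 1, 0],
})

# Legacy rule: flag anyone inactive for 60+ days.
old_flags = (history["days_since_last_order"] >= 60).astype(int)
# New model: flag anyone scored above 0.5.
new_flags = (history["model_score"] > 0.5).astype(int)

print("Old process accuracy:", accuracy_score(history["actually_churned"], old_flags))
print("New model accuracy:  ", accuracy_score(history["actually_churned"], new_flags))
```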

Continuous collaboration and improvement:

Model building is not a one-time effort but an iterative one. If the model isn’t performing as expected from the get-go, keeping stakeholders in the know helps you get the much-needed feedback loop going. For your model to do its job, and to validate whether you picked the right model, you need to feed the learnings from your production systems back in. This may require help from your end-users, internal customers, and stakeholders. There may also be critical budget constraints in the company, so how much further improvement the model is worth is a decision to be made jointly by the data science team and the stakeholders.
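As one illustration of such a feedback loop, you could track a rolling accuracy over the model’s logged predictions as true outcomes arrive from production. The logging format, the window size, and the example outcomes below are assumptions made for the sketch:

```python
# Illustrative feedback-loop check: once true outcomes arrive from production,
# compare them with the model's logged predictions to track live performance.
from collections import deque

# Rolling window of the most recent (prediction == outcome) results.
window = deque(maxlen=500)

def record(prediction: int, outcome: int) -> float:
    """Log one resolved prediction and return the current rolling accuracy."""
    window.append(prediction == outcome)
    return sum(window) / len(window)

# Example: as outcomes trickle in, watch whether accuracy drifts downward.
for pred, actual in [(1, 1), (0, 0), (1, 0), (0, 0)]:
    print(f"rolling accuracy: {record(pred, actual):.2f}")
```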

Hopefully, the above offers some useful principles and steps to help data science teams align with the rest of the stakeholders and start impacting key business metrics. Happy collaboration!

