
Managing Data Science as Products

How data science teams can apply product management practices to solve their biggest challenges

According to an article published by Wharton, the number of data science jobs has experienced "massive growth – 15 times, 20 times" over the last few years. However, that does not mean data scientists have it easy. With the October 2019 report from MIT Sloan and Boston Consulting Group finding that "seven out of ten companies report minimal or no impact from AI so far", data science teams are under tremendous pressure to overcome hurdles and demonstrate impact.

In the 2017 Kaggle State of Data Science and Machine Learning Survey, 16,000 respondents identified the 7 biggest barriers for data scientists at work as:

  1. Dirty data (49.4%)
  2. Lack of data science talent in the organization (41.6%)
  3. Company politics / Lack of management/financial support for a data science team (37.2%)
  4. The lack of a clear question to be answering or a clear direction to go in with the available data (30.4%)
  5. Unavailability of/difficult access to data (30.2%)
  6. Data Science results not used by business decision makers (24.3%)
  7. Explaining data science to others (22.0%)

Aside from dirty data, all of the hurdles faced by data science teams relate to organizational and stakeholder management issues.

No amount of Python or R code will solve such challenges. Instead, data science teams need to employ product management practices to engage stakeholders effectively and demonstrate their value to the organization. Below are some simple steps data science teams can put into practice.

Understand the "Job to be Done"

Image by Author

Every process or methodology for data science or software development requires an understanding of stakeholders’ requirements. With today’s rapid pace of change in the business landscape, data science teams must go beyond taking requirements and become thought partners for the business.

By employing Harvard Business School professor Clayton Christensen’s "job to be done" framework, data science teams can gain a better understanding of the full business application. This starts by learning from stakeholders what situation they are in, what is motivating their needs, and ultimately, what they want to accomplish and what their desired (and measurable) outcomes are. Such understanding can be reached through discovery conversations and, potentially, other user research methods.

As an example, a B2B sales or marketing team may ask a data science team to create a lead scoring model to identify top leads based on the firmographic, demographic and behavioral data the company captures. For many data science teams, this would be an ideal request: there is a clear business use case, and the project could be executed with measurable outcomes (quantity of leads, lead conversion, lift, etc.). However, after conducting stakeholder interviews, the data science team may learn that what the business really wants is to expand margin this year.

Though a lead scoring model could achieve higher margins through better pipeline conversion, there may be other strategies the firm could utilize, such as optimizing pricing. By understanding what the business objectives are, data science teams can become true partners, providing valuable insight to their organization and suggesting alternative solutions that deliver higher value at lower cost while still achieving the intended goals.

By adopting the strategic lens that product managers use to understand customer challenges, data science teams can become far more effective at addressing the organizational engagement challenges that the Kaggle survey respondents highlighted, such as a lack of organizational support, adoption and understanding.

Agile Data Product Development

Photo by Bonneval Sebastien | Unsplash

Once data science teams have a better understanding of their stakeholders’ needs, they can start applying Agile methodologies to managing data science as data products. Defining data products is less about building the best machine learning model, and more about defining how the model will be used. By utilizing the "job to be done" framework and becoming thought partners to the business, data science teams can begin to develop a strong vision for what data products need to be built.

Data products can materialize as algorithmic features of a website, an interactive dashboard, or other analytics-driven solutions. For example, data products on eCommerce sites may include a product recommendations carousel on a product detail page, an up-sell banner on a checkout page, personalized offer recommendations, etc. Behind each data product could be different iterations of models or analytics that the data science team will continue to improve over time.

Once a vision of what needs to be built has been established, the next question is how to maintain that engagement with the business or product teams and demonstrate value continuously as you move into development. This is where Agile development is key.

Agile development focuses on iterative development and on adapting to changing requirements and solutions, while following the principles of the Agile Manifesto. There are a tremendous number of resources available on Agile methodology, so it will not be covered in detail in this article. However, as it applies to data products, there are 4 key things data science teams must value to be Agile:

1. Individuals and interactions over processes and tools

Many values of the Agile Manifesto are highly applicable when it comes to data science. Interactions and communication between individuals across business, data science, technology and other teams are ultimately more important than any standardized tool or process. Understanding the "job to be done" for business stakeholders and being thoughtful partners to other internal teams are crucial to successfully developing data products.

Aside from business stakeholders, data scientists must also engage and communicate with data engineers, product managers, DevOps, MLOps and other stakeholders to form relationships with their peers and understand what is required to accomplish their objectives. By developing relationships across the organization, data scientists will gain a bigger-picture view of all the tasks needed to operationalize and deploy their models.

2. Model iteration over exhaustive model development

Once a data scientist understands the requirements, their mandate is usually to develop the best possible model. For traditional machine learning models used for prediction, this usually means engineering a broad and diverse set of model features, collecting, cleansing and aggregating the data, then training and validating the model in hopes of a highly predictive model.

However, this process is extremely time intensive, and sometimes business stakeholders don’t see any results of model performance until months have passed. In situations where the models need to be deployed in real time by the technology team, it may also turn out that the features are extremely costly to compute in real time.

Instead, data science teams can employ an iterative design approach. In practical terms, this means first working with the business and technology teams to decide on a subset of model features to utilize. Especially in cases where no model has been developed yet, releasing a Minimal Viable Data Product quickly matters more than trying to improve R² by a few points.
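As a rough illustration of agreeing on a feature subset, the sketch below picks lead scoring features that fit within a real-time latency budget. Everything in it is an assumption made for illustration: the `CandidateFeature` class, the feature names, the value and latency estimates, and the greedy value-per-millisecond rule are hypothetical and are not a method prescribed by this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateFeature:
    name: str
    expected_value: float  # the team's rough usefulness estimate (hypothetical units)
    latency_ms: float      # estimated serving-time compute cost; assumed to be > 0


def select_features(candidates, latency_budget_ms):
    """Greedily keep the features with the best value per millisecond
    until the real-time latency budget is used up."""
    ranked = sorted(
        candidates,
        key=lambda f: f.expected_value / f.latency_ms,
        reverse=True,
    )
    chosen, spent = [], 0.0
    for feature in ranked:
        if spent + feature.latency_ms <= latency_budget_ms:
            chosen.append(feature)
            spent += feature.latency_ms
    return chosen


# Illustrative numbers only; in practice these would come out of
# conversations with the business and technology teams.
candidates = [
    CandidateFeature("company_size", expected_value=3.0, latency_ms=1.0),
    CandidateFeature("pages_viewed_last_7d", expected_value=4.0, latency_ms=5.0),
    CandidateFeature("email_engagement_score", expected_value=2.0, latency_ms=4.0),
    CandidateFeature("third_party_intent_score", expected_value=5.0, latency_ms=50.0),
]

first_release = select_features(candidates, latency_budget_ms=12.0)
print([f.name for f in first_release])
```

With these made-up estimates, the expensive third-party score is left for a later iteration, which is the kind of trade-off the team would want to make explicitly with its partners rather than discover at deployment time.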

This iterative mentality can also be applied when there is not enough data to properly train a model. Don’t be blocked by a lack of data. Instead, do a simple analysis with the data available and start influencing the thinking of stakeholders with those insights. Afterwards, it’s much easier to build on top of that and to communicate that machine learning models sometimes require more data before their value can truly be realized.

3. Stakeholder collaboration over data ownership

As highlighted in the first section of the article, the largest hurdles for data science teams can largely be overcome with better stakeholder collaboration. And as discussed, the solution to this includes:

  • Being a thought partner for the business,
  • Co-designing models with the business and technology partners, and
  • Getting a minimal viable data product released and used so that the entire cross-functional team can start learning from usage metrics and user feedback

One additional challenge data science teams must help overcome arises when ownership of data becomes a matter of office politics. Sometimes different teams will view their ability to control "their data" as their source of power. Unfortunately, when this mentality is shared by multiple business, data science, analytics and technology teams, it results in data hoarding. This creates silos of data and limits the data science team’s ability to develop great models. The end result is data science or AI projects that create no value. The only way to overcome such a situation is to make sure all the stakeholders are aligned and collaborating from Day 1, and that senior sponsors are collectively behind the success of the data products being developed.

Being able to reach across the aisle and align stakeholder needs sometimes also means data scientists need to step into roles where their stakeholders lack certain skills. This can mean training business stakeholders on data science basics, creating visualizations to demonstrate insights from models, prototyping predictive model endpoints that other engineering teams can consume, etc.

In summary, collaboration means breaking down hurdles, doing whatever is necessary to unblock a project, and empathizing and working with all stakeholders to deliver a successful product.

4. Responding to change over following a plan

The pace of change is ever increasing in the business and competitive landscape. If your company is already a laggard in utilizing data and machine learning as a means to compete, it is even more important that the data science team be able to respond to change instead of following a plan.

Let’s go back to our prior example where the data science team recommended implementing a pricing optimization algorithm to help the business achieve higher margins.

Assume the data science team has finished developing a model and is two weeks away from deploying it with the engineering team. However, based on recent shifts in the market, the executive team decides to change strategy by moving into a broader market with more affordable products at lower prices.

Should this data science team go ahead with the release, knowing that the model they developed tends to increase pricing? No! Unfortunately, some companies may move ahead because in their view, "the ship has sailed." However, a data science team that is truly a partner for the business will be transparent about their findings, and be able to ask for more time to assess how best to respond.

If it turns out pricing optimization is no longer a viable route, the data science team should already have, through the earlier ideation and consultation process, a portfolio of other data products that could be deployed. By following an iterative design and Minimal Viable Data Product release mentality, they would lose less time and be able to use the same process to quickly release a different data product.

3 Ways to Start Applying Product Management Practices Today

Photo by Braden Collum | Unsplash

Changing process and culture is never simple or quick. However, there are three product management practices that data science teams can adopt right away:

  1. Capture User Stories – Product managers often capture product requirements in the form of User Stories, which generally follow the format of "as a [user], I want to be able to [do something], so that I can [benefit]." This format is fairly aligned to the "job to be done" framework, which captures customers’ desires as "when [situation/context], I want to [motivation], so I can [expected outcome]." Regardless of the format or methodology in which data science teams capture the needs of their stakeholders, what is important is to have a very clear focus on the benefit the stakeholders can achieve.
  2. Create a Backlog – Regardless of which specific Agile methodology you end up using (Scrum, Kanban, etc.), you should start creating a backlog of User Stories. Aside from the practical benefit of capturing notes, a well-maintained backlog will start surfacing recurring themes that stakeholders want to achieve. Once certain patterns emerge in the types of products asked for, or in the outcomes desired, the data science team can begin to develop data products with multiple use cases in mind, along with a prioritized roadmap that addresses the most common or highest-value needs first.
  3. Iterative Releases – As mentioned before, data scientists may have a desire to gather the most data possible, engineer the broadest, most representative set of model features, and develop a model with the highest predictive power. However, this will be all for nothing if the business changes during the time it takes to develop the model. Instead, partner with all stakeholders to define a Minimal Viable Data Product that will get to market sooner, so that the company can learn from actual usage metrics and user feedback. Before doing official customer releases, data science leaders should provide opportunities for their team to demonstrate models or analytics findings to business and internal stakeholders to maintain visibility and gather feedback.
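The first two practices can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a prescribed tool: the `UserStory` class, the `theme` tag, the 1 to 5 `value` scale, and the example stories are all hypothetical, and the prioritization rule (most frequently requested theme first, then highest value) is just one simple reading of "most common or highest-value needs first."

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    user: str
    action: str
    benefit: str
    theme: str  # hypothetical tag used to spot recurring requests
    value: int  # assumed relative business value, 1 (low) to 5 (high)

    def as_text(self) -> str:
        """Render the story in the 'as a [user] ...' format."""
        return (
            f"As a {self.user}, I want to be able to {self.action}, "
            f"so that I can {self.benefit}."
        )


def recurring_themes(backlog):
    """Return (theme, story_count) pairs, most frequent theme first."""
    return Counter(story.theme for story in backlog).most_common()


def prioritize(backlog):
    """Order stories by how often their theme recurs, then by value."""
    counts = Counter(story.theme for story in backlog)
    return sorted(
        backlog,
        key=lambda story: (counts[story.theme], story.value),
        reverse=True,
    )


# Made-up stories, loosely based on the lead scoring and pricing examples.
backlog = [
    UserStory("sales manager", "see which leads are most likely to convert",
              "focus my team on the best opportunities", "lead prioritization", 3),
    UserStory("marketing analyst", "rank inbound leads by expected conversion",
              "target campaigns more precisely", "lead prioritization", 4),
    UserStory("pricing lead", "see recommended prices per segment",
              "expand margin", "pricing", 5),
]

for story in prioritize(backlog):
    print(story.as_text())
print(recurring_themes(backlog))
```

A spreadsheet or an existing ticketing tool would serve the same purpose; the point is only that tagging stories makes recurring themes visible and the roadmap ordering explicit.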

Richard Sheng is the Global Director of Data Science & Analytics at Z-Tech, part of Anheuser-Busch InBev, bringing data-driven technology solutions to small businesses around the world. Richard has 12+ years of experience developing data products for startups and Fortune 500 companies.

