Advice from a product manager to product managers working with data scientists.

Lessons learned in the trenches and vetted by data scientists

Shaw Li
Towards Data Science


Data-driven decisions: the buzzword of the day, used by many product managers and companies. To prove their chops, companies sometimes pair PMs with data scientists to figure out ways to improve a product and make “data-driven decisions”. But how does a PM work with data science?

Source: Unsplash

What is data science?

Data science is an interdisciplinary field that uses techniques learned from computer science, statistics, and scientific research to extract actionable information via experimentation and predictive insights by analyzing structured and unstructured data. That’s a mouthful so let’s unpack it.

interdisciplinary field […] computer science, statistics, and scientific research

Data scientists need a combination of skills. For example, processing and organizing large amounts of data requires computer science skills. Creating mathematical models to predict something requires an understanding of statistics. Generating hypotheses and testing them uses the scientific research method.

extract actionable information via predictive insights

It’s all about taking data and figuring out a way to make automated actions or decisions. Data scientists accomplish this by creating statistical models that have errors “less [than] or on par [with] human decision making, [which] can be [implemented] at scale,” says Arvind Venkataraman Ganesan at Facebook.

experimentation

Set up tests to prove causation, not correlation. Data scientists use scientific research methods to help prevent drawing the wrong conclusion.

Got to destroy those pirate ships to stop global warming. Source: Author
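To make the experimentation point concrete, here is a minimal sketch of how the result of a randomized A/B test might be checked. It assumes users were randomly assigned to a control and a variant, and it uses a standard two-proportion z-test. The function name and the conversion numbers are hypothetical and only for illustration. The random assignment is what supports a causal claim; the test itself only tells you whether the difference is likely to be noise.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 4,000 users per group, 200 vs. 260 conversions.
z, p_value = two_proportion_z_test(200, 4000, 260, 4000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the variant's lift is unlikely to be chance alone. Your data scientist will likely also want to agree on the sample size and the success metric before the test starts.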

structured and unstructured data

Obtaining, organizing, and preparing data for analysis is an important part of the data scientist’s role. It is even more important for unstructured data (e.g., video, images, or text), which requires specific data engineering skills to collect, process, and store.

The difference between data science and data analytics?

A common confusion occurs for PMs when a company has both data science and data analytics roles. What’s the difference?

  • Data analytics has been around since at least the 1960s, expanding with the introduction of personal computers; data science evolved from data analytics starting in the 2000s, made possible by the explosion of data and by the tools, techniques, and cheap processing power needed to analyze and manipulate data at scale.
  • Data analytics traditionally looks at the available data to tell you the what (e.g., % change in sales YoY); data analysts are responsible for generating and maintaining standard reporting, performing ad-hoc business or data analysis, and cleaning and normalizing structured data. Data analytics answers known or prompted business questions using structured data.
  • Data science looks at the data to tell you what to do (e.g., run more of this type of ad) given the what; data scientists also clean and normalize data, but they handle larger sets of structured and unstructured data and are responsible for defining and creating experiments to prove causation, building statistical models to automate decisions, and providing insight into questions the business may not know to ask.

Both roles are important, and neither should be viewed as better than the other.

Source: Unsplash

Tips When Working with Data Science

Don’t work with your data scientist in the same way as a data analyst. This doesn’t mean that a data scientist can’t do data analysis, but if you’re trying to create predictive insights, then you need to work with the data scientists to shape the goal, problem, and hypothesis. Just as product managers don’t want to be treated solely as project managers, you don’t want your data scientists to be there just for running SQL queries or reporting.

Understanding the background of your data scientist helps you build a product team that complements his or her weaknesses. Data science is a relatively new field, and data scientists come from different backgrounds, each with its own set of strengths and weaknesses.

  • Those with stronger computer science backgrounds may be great at data engineering, but have less statistics or applied math skills and less familiarity with data visualization and presentation.
  • Those with stronger statistics and scientific research backgrounds may be great at designing and running rigorous experiments, but may struggle to adjust to the timing constraints of a business or to extract actionable information for decision-making and use in production. For example, Instagram has decision scientists (i.e., data analysts) who support data scientists.
  • Those with stronger data analytics skills and good business context may lack the computer science and statistics background to know how to build predictive models and properly test hypotheses.

Determine if your problem can be solved by data science within your given timeframe.

Source: https://xkcd.com/

Some business problems can’t or shouldn’t be solved by data scientists due to a variety of issues:

  • data issues (e.g., lack of training data, not the right data, poor data quality),
  • poor data infrastructure or tools, or a lack of people with the right skills,
  • ethical issues (i.e., should we do something or not),
  • legal or regulatory prohibitions, or
  • inability to clearly explain the model.

Even if you solve all of the above, you might have a time limitation. To create predictive insights that can then be acted on automatically, you also need high degrees of confidence and consistency. That may require pre-work to collect, clean, and organize the data, adding time to the project. You may not have this time to invest.

Good data is imperative. As a fellow data scientist told me, “Typically the [lack of] training data is what causes a lot of issues of whether a problem is solvable to an acceptable quality.” For example, if you’re building a chatbot but you don’t have any chat logs, how are you going to train your bot to respond to different human requests? How are you going to collect or create this data? You may have to start by figuring out how to obtain training data before you can even tackle your business problem.

Use hypotheticals to define what’s acceptable. Assuming you have good data in sufficient quantity, there still isn’t a guarantee that the data scientist can build a predictive model that meets an acceptable quality. For example, if the problem is how to improve the speed of loan approvals, would the business be okay with decreasing the approval time from 2 days to 5 minutes, but at the expense of incorrectly rejecting 10% of the loans that a human would previously have approved? Thus, define what’s an acceptable quality level upfront. Work with a data scientist to identify some hypothetical outcomes for the performance of the predictive model and which trade-offs are acceptable to the business.
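One way to run this conversation is to write the hypotheticals down as numbers. The sketch below is only an illustration: the 2 days, 5 minutes, and 10% figures come from the loan example, while the class and function names, the monthly loan volume, and the acceptance limits are assumptions you would replace with your own.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    approval_minutes: float   # time to reach a decision
    false_reject_rate: float  # share of loans a human would approve that get rejected


def good_loans_lost(scenario: Scenario, approvable_loans: int) -> int:
    """Loans a human would have approved that this scenario rejects."""
    return round(approvable_loans * scenario.false_reject_rate)


def is_acceptable(scenario: Scenario, max_minutes: float,
                  max_false_reject_rate: float) -> bool:
    """Check a scenario against limits the business agreed on upfront."""
    return (scenario.approval_minutes <= max_minutes
            and scenario.false_reject_rate <= max_false_reject_rate)


# From the loan example: 2 days by hand versus 5 minutes with 10% false rejections.
human = Scenario("human review", approval_minutes=2 * 24 * 60, false_reject_rate=0.0)
model = Scenario("model", approval_minutes=5, false_reject_rate=0.10)

# Hypothetical inputs: 1,000 approvable loans per month, and limits of
# 60 minutes and a 5% false-rejection rate.
for s in (human, model):
    print(s.name, good_loans_lost(s, 1000), is_acceptable(s, 60, 0.05))
```

With these assumed limits, neither option passes: the human process is too slow and the model rejects too many good loans. That is exactly the kind of gap worth surfacing before any modeling work begins.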

Incorporate your data scientists into the product team. Too often, data scientists are treated as a shared resource among multiple product teams and given a problem to solve on their own. Only once a model is developed is engineering brought in to put it into production. This may work at larger organizations. However, incorporating a person doesn’t mean he or she just joins your standups. If the data scientist needs help with data organization or cleaning, have him or her work with another engineer on the team. If the data scientist needs to ideate on the problem space, you should be first in line to discuss.

Special thanks to the following for reading, editing, and providing feedback.

[Originally published on 11/5/2020 on my Substack, The Elements of Product Management. Follow me there to get a weekly newsletter delivered to your mailbox.]
