Avoiding the “Automatic Hand-off” Syndrome in Data Science Products

Simple evolution guidelines for data science products

Moussa Taifi PhD
Towards Data Science

--

A view on the evolution of data science products

Evolving data science products for new teams can be a daunting task. There are conflicting requirements embedded in the nature of data science products. First constraint: product teams want to move proofs of concept to market as fast as possible. Second constraint: data science (DS) teams need a growing infrastructure for efficient experimentation, yet they usually lack adequate infrastructure to meet those growing needs. Third constraint: engineering (ENG) teams value system stability and reliability, which makes it hard for them to keep pace with the “idea generation” coming from the research teams. The combination of these constraints leads to routine delays in machine learning pipelines. Delays can have a serious impact on morale and business value creation, and they show up most visibly at model creation and model monitoring time. As a community, we need to identify the components and workflows that are prone to delays and failures, so that we can raise the current state of Data Science/Machine Learning engineering to higher standards.

In this trifecta of teams (Product, DS, and ENG), a central question gets asked very early on.

Who should be the primary owner of the data science product computational pipelines?

As the trifecta dwells on this question, the ownership discussion becomes more refined, and questions such as the following start appearing:

  • Where is the current data science project in the product life-cycle?
  • What is the project trade-off for engineering reliability versus data science flexibility?
  • How much business value is this project creating, and how mission-critical is it?

These questions can help guide the data science and engineering groups. They can help them decide when and how to divide the work of exploration versus exploitation.

Is It Mission Critical Yet?

To answer this, the trifecta needs to think about the worst case. Imagine we have a week-long snafu and we can’t fix anything in the model or the data pipeline supporting it. What would be the impact? Here is a ranked list of scenarios:

  1. The model is returning bad predictions but there is no loss of business value.
  2. The predictive power of the model is becoming stale. This leads to a slight business value loss.
  3. The predictions of the model are so wrong that we are losing a major amount of business value.
  4. The predictions of the model are damaging our customer relationships. The model is triggering major business value loss and is impacting the brand.

With this simple list, the teams can give a Mission Criticality score to their projects. This score can help them weigh their options for the DS-to-ENG interaction.
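As a rough illustration, here is a minimal Python sketch of how a team might encode that ranked list as a score and map it to an ownership strategy. The scenario names and the cutoff are hypothetical, not prescriptions from the original post.

    from enum import IntEnum

    class MissionCriticality(IntEnum):
        """Illustrative scores for the ranked worst-case scenarios above."""
        NO_VALUE_LOSS = 1      # bad predictions, no business value lost
        SLIGHT_VALUE_LOSS = 2  # stale model, slight business value loss
        MAJOR_VALUE_LOSS = 3   # wrong predictions, major business value loss
        BRAND_DAMAGE = 4       # customer relationships and the brand at risk

    def suggest_ownership(score: MissionCriticality) -> str:
        """Map a criticality score to an ownership strategy (hypothetical cutoff)."""
        if score <= MissionCriticality.SLIGHT_VALUE_LOSS:
            return "experimental: full DS ownership of the pipeline"
        return "experimental+production: ENG hardens a parallel production pipeline"

    print(suggest_ownership(MissionCriticality.SLIGHT_VALUE_LOSS))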

Without this Mission Criticality Score evaluation, risk-averse engineering groups commonly suggest a full hand-off of the project. Right after the first model iteration. Too soon.

Let’s call this the “Automatic Hand-off” Syndrome in data science products.

From Mission Criticality score to DS pipeline

Here is a pattern to use to avoid this automatic hand-off. I have seen it work across the AI projects I worked on in the recent past. After determining the mission criticality, the team can choose one of the two paths below.

Two different methods of ML product management based on mission criticality

For less mission-critical products, the DS project owner should promote full data science team ownership. That way, the data scientists who are leading the product can own the whole DS pipeline:

  • Data preparation
  • Feature engineering
  • Model training
  • Model serving
  • Model monitoring

To succeed at this, a data science platform team can provide light “platform”-level support, and the DS team can depend on the upstream data pipelines that are usually already available for reporting functions in the parent business.
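As a minimal, self-contained sketch of these five DS-owned stages, here is what a toy batch pipeline could look like. The column names (spend, impressions, converted) and the model choice are invented for illustration, not taken from any real pipeline.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    def run_ds_pipeline(upstream: pd.DataFrame) -> dict:
        clean = upstream.dropna()                                # data preparation
        features = clean[["spend", "impressions"]]               # feature engineering
        target = clean["converted"]
        model = DecisionTreeClassifier().fit(features, target)   # model training
        predictions = model.predict(features)                    # model serving (batch)
        accuracy = float((predictions == target).mean())         # model monitoring
        return {"accuracy": accuracy}

    # Usage with synthetic upstream data standing in for the reporting tables
    upstream = pd.DataFrame({
        "spend": range(100),
        "impressions": range(100, 200),
        "converted": [0, 1] * 50,
    })
    print(run_ds_pipeline(upstream))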

On the other side of the equation, the mission criticality of a project can become high enough that significant business value is lost whenever there is in-flight turbulence. In this case, it is advisable to adopt a complementary scheme: the engineering team builds a hardened, “stability-first” pipeline for production usage that covers data preparation, feature engineering, and model training/serving/monitoring, while the DS team keeps developing new feature engineering and models in parallel.

It is useful to keep the serving/monitoring layer decoupled from the rest of the production pipeline. That way, the experimental DS pipelines can share the serving/monitoring infrastructure used by production workloads, which also prevents the occasional divergence in model serving compatibility.

This simplifies future “hand-offs” when the product grows to be mission-critical enough. The “pseudo-invariant” serving layer encourages the team to seek models compatible with it, which is a great opportunity to apply the Dependency Inversion Principle. The serving layer sits between the consumer/client queries and the predictive services, but it can also act as a contract between teams: both API contracts and DB table schemas can play this role, keeping the experimental and production pathways in sync. Both the experimental and production pipelines can share this service. The drawback of this strategy is that it reduces the ML search space to models the shared serving layer can support.
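One possible shape for that contract, sketched in Python: a small abstract serving interface that both the experimental and the production models implement, so the serving layer depends on the abstraction rather than on any concrete model. The class and method names here are hypothetical.

    from abc import ABC, abstractmethod

    class PredictiveService(ABC):
        """Hypothetical serving contract shared by experimental and production models."""

        @abstractmethod
        def predict(self, features: dict) -> float:
            ...

    class ExperimentalModel(PredictiveService):
        def predict(self, features: dict) -> float:
            # The DS team can swap in any candidate model that honors the contract.
            return 0.42

    class ProductionModel(PredictiveService):
        def predict(self, features: dict) -> float:
            # The ENG-hardened replica implements the same interface.
            return 0.42

    def serve(query: dict, model: PredictiveService) -> float:
        """The serving layer depends on the abstraction, not on a concrete model."""
        return model.predict(query)

    print(serve({"spend": 1.0}, ExperimentalModel()))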

Two more questions stay unanswered:

  • How to divide work between DS and ENG team members to enable them to use their complementary skills?
  • When and how to evolve this interaction between the experimental and production levels?

Dividing work between DS and ENG

Here is one method to model the DS vs ENG interaction, split into two complexity axes:

  • Computation Performance Complexity
  • Machine Learning Complexity

“Computation Performance Complexity” (CPC) is a wide abstract term. I will use it here to describe any processing that does not “fit” on a data scientist’s laptop.

“Machine Learning Complexity” (MLC) is also a wide and abstract term. I’ll use it for any processing that needs extensive knowledge of machine learning.

The goal is to help DS teams decide how to divide the various project components. DS products can shed some unnecessary couplings by decoupling the CPC from the MLC. Below, we explain how to split the components based on the specialization of the DS and ENG teams.

Possible Data/DS/ML Component refinement directions

The idea with this split is to break the pipeline apart into meaningful components. Implementing the components concurrently has sped up the time to production.

Early in a project, there is often not enough ML or performance complexity in the problem at hand, yet everyone gets involved by default, which adds a good deal of communication overhead inside the team. Instead, data engineers can manage the specialized low-latency and big data stacks, while data scientists and ML engineers manage the machine learning components.

The separation of concerns using this framework is an art. This framework can guide the initial attempts to build modular machine learning pipelines.
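To make the split concrete, here is a small sketch that separates the two concerns: a pure scoring function (Machine Learning Complexity, owned by DS/ML engineers) and a parallel runner (Computation Performance Complexity, owned by data engineers). The process pool and the feature names are stand-ins assumed for illustration; the real CPC layer would be whatever low-latency or big data stack the team uses.

    from concurrent.futures import ProcessPoolExecutor
    from typing import Callable, Iterable, List

    def score_record(record: dict) -> float:
        """MLC: pure model logic, easy for DS/ML engineers to iterate on."""
        return 0.3 * record["spend"] + 0.7 * record["recency"]

    def run_at_scale(fn: Callable[[dict], float], records: Iterable[dict]) -> List[float]:
        """CPC: parallel execution; a process pool stands in for the real big data stack."""
        with ProcessPoolExecutor() as pool:
            return list(pool.map(fn, records))

    if __name__ == "__main__":
        data = [{"spend": 1.0, "recency": 0.5}, {"spend": 2.0, "recency": 0.1}]
        print(run_at_scale(score_record, data))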

When and how to evolve experimental work to production levels

ML Evolution pathways

Monitoring is key to data science product evolution. DS owners pay close attention to the business value creation of live models. The diagram above proposes a simple feedback loop: a map for deciding when to make the jump to full production support.

DS teams can use the decision chart above to select a strategy. The diagram helps them determine the mission criticality of the pipeline over time and choose between the experimental pipeline and full production support. In the experimental pipeline, the data science team owns the complete pipeline.

The experimental pathway provides full flexibility. This includes generating many versions of the feature engineering and model training steps. At the egress of the pipeline, there are two types of DS monitoring: “predictive performance” and “business value” monitoring. The predictive performance guides the next iterations of the DS team and allows them to improve their model and feature engineering techniques. The business value monitoring guides the evolution from an experimental to a production mode. The DS team can keep iterating while they build a solid case for full production support.

In experimental+production mode, engineering owners replicate the best-performing ML recipe from the experimental pathway and harden the modules that run into performance or reliability constraints.

This dual mode allows the DS team to keep iterating. Modifications to the existing pipelines produce the next best recipe, which aims to become the production pipeline. As in the experimental mode, the DS team uses the predictive performance to iterate, while the business value monitoring determines the need for further production hardening. DS projects are full life-cycle projects, so we can also use the business value metrics to decide on any sun-setting, which makes room for future DS products.
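Here is a minimal sketch of that feedback loop, with invented thresholds: predictive performance feeds the next DS iteration, while business value drives the choice between staying experimental, requesting production hardening, or sun-setting the product.

    from dataclasses import dataclass

    @dataclass
    class MonitoringReport:
        predictive_performance: float  # e.g. AUC; guides the next DS iteration
        business_value: float          # e.g. incremental weekly revenue

    def next_pipeline_mode(report: MonitoringReport,
                           production_threshold: float = 10_000.0,
                           sunset_threshold: float = 100.0) -> str:
        """Hypothetical rule mapping business value monitoring to a pipeline mode."""
        if report.business_value < sunset_threshold:
            return "sunset: retire the product to make room for future DS work"
        if report.business_value >= production_threshold:
            return "experimental+production: request ENG hardening"
        return "experimental: keep iterating under full DS ownership"

    print(next_pipeline_mode(MonitoringReport(predictive_performance=0.81, business_value=2_500.0)))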

Conclusions

Delays and painful debugging of ML pipelines are costly. Part of the problem lies in the Product + Data Science + Engineering trifecta workflows: each part works well in isolation, but the communication overhead slows down projects. This blog post explained the forces that lead to the “Automatic Hand-off” syndrome, in which DS teams hand projects over to engineering for implementation from day one. This approach has drawbacks for the evolution of the machine learning product.

To counteract this, we described three simple guidelines to help deal with this kind of premature optimization. First, we described how to approximate the mission criticality of a project, and how DS pipelines can adapt to support both ends of the spectrum. Second, we described guidelines on how to divide up a single complex DS application along two axes: computation performance complexity and machine learning complexity. Finally, we described a method to track the experimental+production mode and, using business value monitoring, adjust the ML pipeline’s mode of operation.

That’s all folks.

I hope you enjoyed this post. It aims to help you grow your machine learning project architecture design skills.

Please note that the opinions expressed in this post are my own and not necessarily those of my employer.

We are hiring! If this interests you please check out our open positions at Xandr Data Science Platform Engineering:

https://xandr.att.jobs/job/new-york/data-science-platform-engineer/25348/12859712
