
A view on machine learning operations infrastructure

The reality after the notebook: How to develop a robust framework for ensuring control over machine learning operations

Generating a working (value-generating) machine learning model is not an easy task. It usually involves advanced modelling techniques and teams with hard-to-find skills. However, this is only the first step of an even more complex task: deploying the model into production and preventing its degradation.

Even though it has been alleviated by the shift to the cloud, at least two-thirds of IT spend is still concentrated on maintenance-mode tasks. There is still little research on whether this split holds for ML-related projects, but my take is that this percentage will increase significantly, because an ML workload has more "liquid" inputs and fewer control levers, as shown below:

Figure 1 – Impact of variability and control in ML workloads maintenance

In essence, maintenance is mainly driven by the level of variability and control we have over the different components of the system. As shown in the diagram, it is reasonable to conclude that machine learning workloads are more prone to maintenance tasks. To make things even worse, the evolution paths of data and code (business rules) do not necessarily need to be aligned. This is explained in depth in Hidden Technical Debt in Machine Learning Systems.

It is reasonable to conclude that machine learning workloads are more prone to maintenance tasks

It is absolutely necessary, then, to develop a robust framework for ensuring control over machine learning operations once our model is deployed into production, while at the same time ensuring that the quality of the models and their evolution is not compromised.

The science (art) of developing models is a well-studied field, and there are even industry reference frameworks for model development such as CRISP-DM and specific EDA methodologies, so for the rest of the article we will assume we already have a trained model with acceptable performance.

What infrastructure do we need for running Machine Learning at scale?

In a nutshell, there are three big platforms we need to engineer, apart from, of course, the development platform where the initial model is built and experiments are run, and other cross-functional platforms such as code repositories, container registries, schedulers or monitoring systems. The platforms are depicted in the following diagram:

Figure 2 – The fundamental ML platform components

Inside the feature store

In essence, a feature store decouples the feature engineering process from its usage. This is especially useful in situations where the input data is subject to complex feature transformation logic or where one feature is used by many models; in those scenarios a feature store is an excellent component to engineer since it hides complexity and promotes reusability. However, there are certain scenarios where we can skip this component, for example where the data used for training the model is in its natural state or where the model itself incorporates feature generators (e.g. convolutional, bidirectional or embedding layers).

Figure 3 – Components inside the feature store

The feature store is comprised of a number of elements:

  • Ingestion: This component is responsible for loading the raw data into the feature store storage. Both batch and online ingestion paths should be supported.
  • Feature transformation: This component is responsible for actually computing the features; again, both batch and online processing should be supported. Compute-time performance is critical when designing this component.
  • Feature serving layer: The component that actually serves the features for downstream processing. Again, features can be retrieved online or in batches, as sketched below.
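
To make the serving layer more concrete, here is a minimal sketch of how features could be retrieved from a feature store, using a recent version of the Feast Python SDK as an example. The repository path, the customer_profile feature view and the customer_id entity are hypothetical, and argument names differ across Feast versions.

    from feast import FeatureStore
    import pandas as pd

    # Point the SDK at a (hypothetical) local feature repository.
    store = FeatureStore(repo_path=".")

    # Online path: low-latency lookup of already-computed features for one entity.
    online_features = store.get_online_features(
        features=[
            "customer_profile:avg_basket_value",        # hypothetical feature view:feature
            "customer_profile:days_since_last_order",
        ],
        entity_rows=[{"customer_id": 1001}],
    ).to_dict()
    print(online_features)

    # Batch path: point-in-time correct join against an entity dataframe,
    # typically used to build training sets without label leakage.
    entity_df = pd.DataFrame(
        {
            "customer_id": [1001, 1002],
            "event_timestamp": pd.to_datetime(["2021-01-01", "2021-01-02"]),
        }
    )
    training_df = store.get_historical_features(
        entity_df=entity_df,
        features=["customer_profile:avg_basket_value"],
    ).to_df()

The key point is that downstream consumers (training and inference alike) ask for features by name and never re-implement the transformation logic.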

Inside the training rig

The objective of the training rig is to find and produce the best model (at a specific point in time) given: (i) an initial model architecture, (ii) a set of tunable hyperparameters and (iii) a historical labelled feature set.

The next diagram highlights the main components that I believe should be present to ensure a smooth and effective training operation.

Figure 4 – Components inside the training rig

The output of the training rig is what I call the "golden model": in other words, the architecture, weights and signature that will be deployed on the inference platform. In order to generate these assets, several components must intervene.

  • Re-train checker: This component's mission is to detect when the current golden model needs to be re-trained; there are many situations in which a re-train event must be raised. I propose deploying a number of evaluators to check for re-train conditions (see the sketch after this list). Some examples are changes in the features (addition or deletion), statistical divergence within the training set (data drift) or between the training data and the serving data (skew), or simply a fall in accuracy metrics. The model generated should be passed to the promoter component, which will have the last word on whether and how to deploy it to production.
  • Golden model loop: This is probably the most critical step, as it actually performs the training. Therefore, performance considerations should be taken into account when engineering the system (e.g. distributed infrastructure and access to hardware accelerators such as ASICs). Another responsibility is to generate the model signature, clearly defining the input and output interfaces as well as any initialisation task (e.g. variable loading).
  • Next golden model loop: This component aims to discover potential new models by continuously optimising (or attempting to optimise) the current golden model. There are two sub-loops: one for optimising the hyperparameters (e.g. learning rate, optimisers) and another for launching searches for a new model architecture (e.g. number of layers). Although these are two separate loops, the new model architecture candidates can be further refined in the hyperparameter loop. This component can be resource-intensive, particularly if the search space is big and the optimisation algorithm (e.g. grid search, Hyperband) is greedy. From an engineering perspective, techniques such as checkpointing for resumable operations and job prioritisation mechanisms should be taken into account. The outputs of this component should be further evaluated before taking any additional action.
  • Model promoter: This component is responsible for issuing models ready for production work; therefore, extensive testing should be performed at this step. In any case, as we will examine in the inference rig, no new model will be deployed openly to its entire potential user base.
  • Metadata store: This component centralises all the metadata associated with the training phase (model repository, parameters, experiments, etc.).
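
As an illustration of the re-train checker, below is a minimal sketch of a data-drift evaluator that compares the distribution of one feature in the training set against the values recently seen at serving time, using a two-sample Kolmogorov–Smirnov test. The threshold, the single-feature scope and the synthetic data are simplifying assumptions, not a production-ready implementation.

    import numpy as np
    from scipy.stats import ks_2samp

    def drift_evaluator(train_values: np.ndarray,
                        serve_values: np.ndarray,
                        p_value_threshold: float = 0.01) -> bool:
        """Return True when a re-train event should be raised for this feature.

        A small p-value means the serving distribution has diverged from the
        training distribution (data drift / skew).
        """
        statistic, p_value = ks_2samp(train_values, serve_values)
        return p_value < p_value_threshold

    # Example usage with synthetic data: the serving distribution has shifted.
    rng = np.random.default_rng(42)
    train = rng.normal(loc=0.0, scale=1.0, size=5_000)
    serve = rng.normal(loc=0.5, scale=1.0, size=5_000)

    if drift_evaluator(train, serve):
        print("Data drift detected: raise a re-train event for the golden model")

In a real deployment one evaluator of this kind would run per monitored feature, and the resulting events would be recorded in the metadata store before triggering the golden model loop.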

Inside the prediction rig

The main goal of the prediction platform is to execute inferences. The next diagram presents a set of components for achieving that.

Figure 5 – Components inside the prediction rig

A few components are present in the inference stage:

  • Feature transformer: Even with a feature store in place to decouple features from data production systems, I think a feature transformer still has its place at inference time, applying low-level, specific operations to a potentially reusable and more abstract feature. For online systems, latency requirements are critical.
  • Dispatcher: The dispatcher's objective is to route requests to a particular prediction endpoint. I believe that every single request should be subject to an experiment; that's why the dispatcher should be able to redirect the call to one or many live experiments, to the golden model, or to both (see the sketch after this list). Each request not subject to experimentation is an improvement opportunity lost.
  • Predict backbone: The horsepower of the prediction rig resides in this component; hence, from an engineering standpoint, it is critical to design for classical non-functional requirements such as performance, scalability or fault tolerance.
  • Cache layer: A low-latency key-value store to quickly respond to repeated queries. It must implement the classical cache mechanisms (invalidation, key computation based on feature hashing, LRU eviction, etc.).
  • Golden promoter/de-promoter: As A/B tests take place, we could reach a point where one of the live experiments is actually more performant than the current golden model. This component's mission is to analyse metadata, and particularly ground-truth data in the feature store, to suggest replacing the golden model with one of the experiments.
  • Model warmers: A component to ensure cache and memory warm-ups when a cold-start situation happens (e.g. a new model promotion).
  • Explainer: A component that implements model explainability logic (e.g. Anchors, CEM) and returns an explanation for a given request.
  • Metadata store: This component centralises all the metadata associated with the prediction phase (live experiment performance, prediction data stats, etc.).
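
To illustrate the dispatcher, here is a minimal sketch of weighted routing between the golden model endpoint and live experiments. The endpoint names, weights and the dispatch function are hypothetical; in a real system the assignment would typically be sticky per user and every routing decision would be logged to the metadata store.

    import random
    from typing import Dict

    # Hypothetical routing table: endpoint name -> share of traffic.
    ROUTES: Dict[str, float] = {
        "golden-model-v7": 0.90,
        "experiment-wide-and-deep": 0.05,
        "experiment-new-embeddings": 0.05,
    }

    def dispatch(request_id: str) -> str:
        """Pick a prediction endpoint for this request.

        Every request takes part in the experimentation scheme: most traffic
        goes to the golden model, the rest to live experiments.
        """
        endpoints = list(ROUTES)
        weights = [ROUTES[name] for name in endpoints]
        choice = random.choices(endpoints, weights=weights, k=1)[0]
        # In a real dispatcher the (request_id, choice) pair would be written to
        # the metadata store so the promoter/de-promoter can evaluate experiments.
        return choice

    print(dispatch("req-123"))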

Some user journeys enabled by the platform

There are several journeys that this architecture can support; my goal is not to mention all of them, but I would like to highlight a few interesting ones:

At feature generation time

  • Compose a complex feature and serve it in real time
  • Compose a complex feature by kicking off a long-running operation (LRO) and use it consistently across many models
  • Change/update a feature information producer without affecting the transformation and serving logic

At training time

  • Launch a re-training (distributed) job triggered by the inclusion of a new feature in the training set
  • Run model evaluators based on data (feature) dependencies
  • Discover a new and more performant architecture for a current DNN model
  • Optimize the learning rate for an already deployed model

At prediction time

  • Gradually roll out a new golden model by increasingly expanding its reach to the whole population
  • Query a prediction along with its black-box explanation
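
As a sketch of that last journey, the snippet below shows how a prediction could be returned together with an Anchors explanation using the alibi library. The scikit-learn model stands in for the golden model behind the predict backbone, and the exact attribute names may differ across alibi versions.

    from alibi.explainers import AnchorTabular
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    # Illustrative model standing in for the golden model.
    data = load_iris()
    clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

    explainer = AnchorTabular(clf.predict_proba, feature_names=list(data.feature_names))
    explainer.fit(data.data)

    x = data.data[0]
    prediction = clf.predict(x.reshape(1, -1))[0]
    explanation = explainer.explain(x, threshold=0.95)

    print("prediction:", data.target_names[prediction])
    # Attribute names such as `anchor` may vary between alibi versions.
    print("anchor rule:", explanation.anchor)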

Manageability

  • Inspect model, feature and signature versions

Available technology for deployment

There is a great array of open source software we can use to build a platform with all the components described above, but before thinking about designing each component independently, wouldn't it be great if we could solve non-functional requirements such as scalability, security or portability in a standard and unified way?

Luckily enough, we can rely on Kubernetes as the main platform on which we deploy our components. The following diagram shows a proposal mapping the components to open source products/projects*.

Figure 6 – ML platform open source instantiation

*Feast and Kubeflow integration is currently a work in progress

To make things even easier, Kubeflow already packages all those components nicely, so much of the integration is already done.
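
As a taste of what that packaging looks like in practice, here is a minimal sketch of a Kubeflow Pipelines definition chaining a re-train check and a training step. The container images, arguments and parameter names are placeholders, and the snippet uses the kfp v1-style DSL, which may differ from newer SDK versions.

    import kfp
    from kfp import dsl

    @dsl.pipeline(
        name="golden-model-retrain",
        description="Re-train check followed by training of the golden model",
    )
    def retrain_pipeline(feature_set: str = "customer_profile"):
        # Placeholder images: each step runs as its own container on Kubernetes.
        checker = dsl.ContainerOp(
            name="retrain-checker",
            image="example.registry/retrain-checker:latest",
            arguments=["--feature-set", feature_set],
        )
        trainer = dsl.ContainerOp(
            name="golden-model-loop",
            image="example.registry/trainer:latest",
            arguments=["--feature-set", feature_set],
        )
        trainer.after(checker)  # only train once the checker has run

    if __name__ == "__main__":
        kfp.compiler.Compiler().compile(retrain_pipeline, "retrain_pipeline.yaml")

Running the compiled pipeline on a Kubeflow installation gives us scheduling, retries, artifact tracking and the Kubernetes-level non-functional requirements discussed above without custom glue code.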

Conclusion

Even though one might think that running a machine learning operation is fundamentally different from a traditional one, most software engineering principles actually hold; they are simply applied in a different context. In this article, we have presented a logical high-level architecture that can be easily deployed using open source components such as Kubeflow.

I am publishing some components and example notebooks on this topic here.

