Figuring out what people mean when they say "MLOps" is hard. Figuring out how to properly do MLOps, even for a technical person, is perhaps even more difficult. How difficult must doing MLOps be, then, for a citizen data scientist who knows nothing of web technologies, Kubernetes, monitoring, cloud infrastructure, etc.? Here I continue exploring how to set up an open-source MLOps framework for this purpose: specifically, I outline and show how a combination of Databricks, mlflow, and BentoML can potentially provide a compelling, extensible, and easy-to-use MLOps workflow for end-users.
I have previously discussed that an MLOps framework must come with batteries included and support an extensive list of features: model lifecycle management and governance, monitoring, A/B testing, interpretability, drift/outlier detection, and so on. However, as the end-user:
I just want to define a given python interface, push a big green button, and get back a REST endpoint URL where I can interact with my deployed model.
In the first part of this series of blog posts, I explored how Databricks in combination with Seldon-core checks off most of what I see as requirements for deploying and running MLOps; however, the open-source offering of Seldon-core is quite cumbersome for the end-user and does not align with the vision of simplicity: write a python class, push button, get REST endpoint. This post builds on my previous post, so I recommend checking that out:
What tools are we looking at?
Databricks covers everything from experimentation, tracking, versioning, registry, and governance for lifecycle management of ML models, while being pay-as-you-go with no upfront costs and a minimal barrier for the average data scientist to get started. In this post, we will focus on the next step, where BentoML will help us deploy anything registered in the mlflow model registry as a REST endpoint – we will have to do a bit of hacking to get these tools to play nicely together 😏. Again, check part one for the details on MLOps in Databricks.
Step 1: What the end-user sees
Picking up from the first blog post, let us say we have trained:
- a standard Keras model for MNIST classification
- an algorithm for detecting feature drift in the input data
- an algorithm for detecting outliers in the input data
- an integrated gradients algorithm for explaining our predictions.
Most data scientists will be familiar with at least one of the above points. Let us save all these models to our local disk in an `artifacts/mnist-keras` folder:
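For illustration, a minimal sketch of that saving step could look like the snippet below. Here `model`, `drift_detector`, `outlier_detector`, `explainer`, and `save_detector` are placeholders for whatever you actually trained and whichever library (e.g. alibi-detect style) you used; adjust the save calls accordingly.

```python
# Illustrative sketch only: `model` is the trained Keras classifier, and the
# detectors/explainer are assumed to come from a library with its own
# save helpers (save_detector / explainer.save are placeholders here).
from pathlib import Path

artifact_dir = Path("artifacts/mnist-keras")
artifact_dir.mkdir(parents=True, exist_ok=True)

# Keras MNIST classifier
model.save(str(artifact_dir / "model"))

# Drift and outlier detectors (library-specific save helper)
save_detector(drift_detector, str(artifact_dir / "drift"))
save_detector(outlier_detector, str(artifact_dir / "outlier"))

# Integrated-gradients explainer (saving depends on the explainer library)
explainer.save(str(artifact_dir / "explainer"))
```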
We want to put all these models into a simple python class, which we want to register into the mlflow model registry. Essentially, we want this class to do everything: return predictions, return explanations, define metrics to monitor, etc. Therefore, we tell the end-user to put the model into an interface like the following:
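As a concrete illustration (a sketch, not the exact class from this post), such an interface could look roughly like this. It assumes BentoML 0.13-style `@api` decorators with a `JsonInput` adapter and a `prometheus_client` counter for the monitoring metrics; `MnistService`, `load_detector`, and `load_explainer` are placeholder names standing in for whatever library the detectors and explainer were built with, so the exact result formats will differ.

```python
import numpy as np
import mlflow.pyfunc
from bentoml import api
from bentoml.adapters import JsonInput
from prometheus_client import Counter

DRIFT_COUNTER = Counter("mnist_drift_batches", "Number of batches flagged as drifted")
OUTLIER_COUNTER = Counter("mnist_outlier_samples", "Number of samples flagged as outliers")


class MnistService(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import tensorflow as tf

        # Load everything we saved under artifacts/mnist-keras
        self.model = tf.keras.models.load_model(context.artifacts["model"])
        self.drift_detector = load_detector(context.artifacts["drift"])      # placeholder helper
        self.outlier_detector = load_detector(context.artifacts["outlier"])  # placeholder helper
        self.explainer = load_explainer(context.artifacts["explainer"])      # placeholder helper

    def prepare_input(self, json_list):
        # Micro-batching: BentoML hands us a list of individual requests,
        # which we stack into a single batch for one forward pass
        return np.stack([np.asarray(request["image"]) for request in json_list])

    @api(input=JsonInput(), batch=True)
    def predict(self, json_list):
        x = self.prepare_input(json_list)
        # Log drift/outlier results as Prometheus metrics on every call
        if self.drift_detector.predict(x)["data"]["is_drift"]:
            DRIFT_COUNTER.inc()
        OUTLIER_COUNTER.inc(int(np.sum(self.outlier_detector.predict(x)["data"]["is_outlier"])))
        return self.model.predict(x).argmax(axis=1).tolist()

    @api(input=JsonInput(), batch=True)
    def explain(self, json_list):
        x = self.prepare_input(json_list)
        return self.explainer.explain(x).data["attributions"]

    @api(input=JsonInput(), batch=True)
    def reward(self, json_list):
        # Feedback endpoint: clients can report how predictions performed
        return ["ok" for _ in json_list]
```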
There are a few things to note here:
- The class inherits from `mlflow.pyfunc.PythonModel` and can therefore be logged into the mlflow model registry, where we can control governance, have multiple versions, transition the versions into different stages, etc.
- The `predict`, `explain`, and `reward` methods are decorated with BentoML endpoint definitions to indicate that these are API endpoints we wish to expose. Additional endpoints could, of course, be added. Note that mlflow has no clue what these BentoML decorators do.
- The method `prepare_input` defines how to handle input data; this is needed since BentoML supports micro batching; that is, if a million individual requests come in simultaneously, it creates batches to pass through the model at the same time for efficiency.
- Every time `predict` is called, the method automatically calculates and logs drift and outlier results for those samples into Prometheus metrics – of course, it would be ideal if we could instead calculate and log these asynchronously, but in a large majority of use cases, these calculations are not horribly expensive, and we win a lot of simplicity by keeping them in the `predict` method.
Now that we have our model defined, we want to log it into the mlflow model registry alongside the local directory where we saved all our models (`artifacts/mnist-keras`) and a dictionary describing our python environment & packages. The mlflow UI can help do some of this, but we can also do it entirely in code:
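For example, the logging and registration could look something like the sketch below. The model name "mnist-keras", the conda environment dictionary, and the artifact keys are illustrative, and `MnistService` is the sketch class from above.

```python
import mlflow
from mlflow.tracking import MlflowClient

# Illustrative environment description; pin whatever your model actually needs
conda_env = {
    "name": "mnist-keras-env",
    "channels": ["conda-forge"],
    "dependencies": ["python=3.8", {"pip": ["mlflow", "bentoml", "tensorflow"]}],
}

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=MnistService(),
        artifacts={
            "model": "artifacts/mnist-keras/model",
            "drift": "artifacts/mnist-keras/drift",
            "outlier": "artifacts/mnist-keras/outlier",
            "explainer": "artifacts/mnist-keras/explainer",
        },
        conda_env=conda_env,
        registered_model_name="mnist-keras",
    )

# Transition the newly registered version to the "Production" stage
client = MlflowClient()
latest = client.get_latest_versions("mnist-keras", stages=["None"])[0]
client.transition_model_version_stage("mnist-keras", latest.version, stage="Production")
```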
That’s it – now we have put everything related to our MNIST model (predictor, explainer, drift, and outlier detector) into one mlflow model, and we have registered this model into our registry and transitioned it to the "Production" stage.
But nothing is deployed yet. So, in the next section, we’ll see if we can prototype some code for automatically deploying everything in the "Production" stage in mlflow using BentoML.
Step 2: Behind the Scenes
Behind the scenes, we want a service to be running that constantly checks whether new models have gone into the "Production" stage in the model registry and ensures that all these models are actually deployed. To perform the deployment, we will be using BentoML, which allows us to wrap our mlflow model objects and containerize them into docker images. Thus, all things in this section should be completely generic and happen on the backend without any user interaction.
Step 2.1: Wrapping mlflow with BentoML
To get started, we first have to define an "artifact" for BentoML, the artifact, in this case, being our mlflow model. Assuming we have downloaded a model from the mlflow model registry, we can create an artifact that lets BentoML copy contents of a local directory into a BentoML container (`save`), and instructs BentoML on how to load the model (`load`):
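A rough sketch of such an artifact is shown below, assuming the BentoML 0.13 `BentoServiceArtifact` base class; the `MlflowArtifact` name and its internal layout are our own choices, not something BentoML prescribes.

```python
import shutil
from pathlib import Path

import mlflow.pyfunc
from bentoml.service.artifacts import BentoServiceArtifact


class MlflowArtifact(BentoServiceArtifact):
    """Bundles a local mlflow model directory with a BentoML service."""

    def __init__(self, name):
        super().__init__(name)
        self._model_dir = None
        self._pyfunc_model = None

    def pack(self, mlflow_model_dir, metadata=None):
        # Remember where the downloaded mlflow model lives on disk
        self._model_dir = str(mlflow_model_dir)
        return self

    def save(self, dst):
        # Copy the mlflow model directory into the saved BentoML bundle
        shutil.copytree(self._model_dir, str(Path(dst) / self.name))

    def load(self, path):
        # When the bundle is loaded, point back at the copied model directory
        return self.pack(str(Path(path) / self.name))

    def get(self):
        # Lazily load the pyfunc model (which wraps the user's PythonModel)
        if self._pyfunc_model is None:
            self._pyfunc_model = mlflow.pyfunc.load_model(self._model_dir)
        return self._pyfunc_model
```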
Now that we have an artifact, we next need to define a BentoML "service," which takes all the endpoints we exposed on our mlflow model and exposes them on BentoML as well. To do this, we hack into the `_config_artifacts` method, which gets called on every instantiation of the BentoML service, and then dynamically add any API methods from the mlflow model to the BentoML service.
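A hedged sketch of what that service could look like, building on the `MlflowArtifact` above and assuming BentoML 0.13-style class APIs. The `conda_env_yml_file` kwarg, the way the mlflow `python_model` is reached, and the hard-coded method names are assumptions for illustration, not documented BentoML behaviour.

```python
import bentoml


@bentoml.env(conda_env_yml_file="environment.yml")  # assumption: the exact kwarg may differ by BentoML version
@bentoml.artifacts([MlflowArtifact("mlflow_model")])
class MlflowBentoService(bentoml.BentoService):
    """Exposes the BentoML-decorated methods of a packed mlflow model as inference APIs."""

    def _config_artifacts(self):
        # Called on every instantiation: let BentoML wire up the artifact first
        super()._config_artifacts()

        try:
            # Reach the user's PythonModel inside the loaded pyfunc model
            # (mlflow keeps it on the private _model_impl wrapper)
            python_model = self.artifacts.mlflow_model._model_impl.python_model
        except Exception:
            # Artifact not packed/loaded yet (e.g. at build time) -- nothing to expose
            return

        # Dynamically attach the model's API methods to this service class, then
        # re-run API discovery so BentoML exposes them as endpoints
        for name in ("predict", "explain", "reward"):
            if hasattr(python_model, name):
                setattr(self.__class__, name, getattr(python_model, name))
        self._config_inference_apis()
```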
Notes for this class:
- It has a decorator detailing how it relies on an `MlflowArtifact`
- It also relies on an `environment.yml` to describe the conda environment
- It calls `_config_inference_apis` to add mlflow methods to the API
With this, we can now create a simple function, which takes any mlflow model in the model registry, downloads it, and packages it into a BentoML service, i.e., something along the lines of:
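A hedged sketch of such a function, reusing the `MlflowArtifact` and `MlflowBentoService` sketches above; resolving the `models:/` URI via `mlflow.artifacts.download_artifacts` assumes a reasonably recent mlflow client.

```python
import mlflow


def package_mlflow_model(model_name: str, stage: str = "Production") -> str:
    """Download a registered mlflow model and pack it into a local BentoML bundle."""
    # Resolve "models:/<name>/<stage>" to a local directory containing the model files
    local_dir = mlflow.artifacts.download_artifacts(artifact_uri=f"models:/{model_name}/{stage}")

    # Pack the downloaded directory into the service and save the bundle,
    # which ends up as MlflowBentoService:latest in the local BentoML store
    service = MlflowBentoService()
    service.pack("mlflow_model", local_dir)
    return service.save()
```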
This will save a `MlflowBentoService:latest` BentoML service locally. We can then run the following commands to put the BentoML service into a docker image:
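For example (the image name is a placeholder, and the exact CLI flags may vary between BentoML versions):

```bash
# Build a docker image from the saved BentoML bundle
bentoml containerize MlflowBentoService:latest -t bentoml-mlflow-mnist-keras:latest
```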
We now have a `bentoml-mlflow-mnist-keras` docker image, which contains our model and is ready for deployment. We can test that it works all right by running the docker image locally:
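For instance, assuming the default BentoML serving port:

```bash
docker run --rm -p 5000:5000 bentoml-mlflow-mnist-keras:latest
```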
Going to `localhost:5000`, we should now see a Swagger API with all our endpoints exposed, ready to accept requests. All fine so far.

Step 2.2: Automatic Deployments
The next step is to automatically deploy any mlflow models registered in the registry to some cloud infrastructure. BentoML allows you to deploy on many different infrastructures, but we will assume we have already set up a Kubernetes cluster; see the example in the first blog post.
Essentially we need a service that constantly keeps the mlflow model registry in sync with what is deployed to Kubernetes. We can envision this either as:
- A service that every X minutes runs an API request to list models in the mlflow registry and deploys them (a minimal polling loop is sketched after this list).
- Using CI/CD webhooks in the Databricks model registry (currently private preview) to deploy models as soon as they are registered.
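A minimal sketch of the first, polling-based option, assuming a recent mlflow client; `deploy_model_version` is a hypothetical helper that performs the steps listed next.

```python
import time

from mlflow.tracking import MlflowClient


def sync_registry_with_cluster(poll_interval_seconds: int = 300) -> None:
    """Poll the mlflow registry and deploy any new 'Production' model versions."""
    client = MlflowClient()
    deployed = {}  # model name -> version currently deployed

    while True:
        for model in client.search_registered_models():
            for version in client.get_latest_versions(model.name, stages=["Production"]):
                if deployed.get(model.name) != version.version:
                    deploy_model_version(model.name, version.version)  # hypothetical helper
                    deployed[model.name] = version.version
        time.sleep(poll_interval_seconds)
```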
Once a new model is found in the registry, the following needs to happen:
- Download the mlflow model and pack it to a BentoML docker image, similar to what was shown in the previous section.
- Push the docker image to a repository (e.g., DockerHub, ACR, etc.)
- Create a Kubernetes deployment `yaml` and apply it
The details of such an implementation depend on the specific infrastructure we would be deploying to. Still, we can quickly write up an overly simplified example of a python function that fetches a template for a Kubernetes deployment, fills out the model details, and then applies the deployment.
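Such a function could look roughly like this; the template file, the image naming, and the use of `kubectl` are all assumptions about the surrounding infrastructure.

```python
# Overly simplified sketch: deployment_template.yaml (with {model_name}, {image},
# {namespace} placeholders) and a configured kubectl are assumed to exist.
import subprocess
from pathlib import Path


def deploy_model(model_name: str, image: str, namespace: str = "models") -> None:
    """Render a Kubernetes deployment template for a model and apply it."""
    # Fetch the deployment template and fill in the model details
    template = Path("deployment_template.yaml").read_text()
    manifest = template.format(model_name=model_name, image=image, namespace=namespace)

    # Apply the rendered manifest to the cluster
    subprocess.run(
        ["kubectl", "apply", "-f", "-"],
        input=manifest.encode(),
        check=True,
    )
```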
A proper implementation would also add a specific ingress route so that the model would serve the Swagger UI API at a given custom domain route, e.g. www.myexample.com/model-name/
Results
The BentoML documentation demonstrates how we can install Prometheus in our cluster and automatically scrape the `metrics/` endpoints for all our deployed models. Based on this, if we also have Grafana installed on the cluster, we can have our automatic deployments create dashboards like the following:

Final Thoughts
In this post, we hacked together a simple python class using mlflow and BentoML. This class allows full flexibility for implementing custom models with custom metrics for drift and outliers to be monitored and custom endpoints for explaining or rewarding the model predictions. The beauty of this class is that regardless of the model specifics, we can register it into the mlflow model registry and then create a service that automatically deploys this to production.
The outlined solution is easily extended to include additional features. E.g., we could create extra "config" options on the mlflow model that determine which base docker image to use for deployment, which infrastructure to deploy on (in case of multiple clusters), when to send re-training requests to Databricks (e.g., when drift or reward crosses a threshold), or even how to set up A/B testing.
One important component that would still have to be added to this setup is the synchronization of permissions from mlflow to the Kubernetes cluster; i.e., only the users who have read permissions for the model in the mlflow registry (as can be controlled through the UI) should have permission to call the final endpoint.
A framework like this is compelling. It allows data scientists to focus on model development and then rapidly push models into production with monitoring, governance, etc., without consulting data/cloud engineers in the process.