Simplified MLOps for Kubeflow with Kfops

This article introduces Kfops, a tool built on top of Kubeflow that can be plugged into your MLOps lifecycle.

Bart Grasza
Towards Data Science

--

Image by Author

What is Kfops?

The project’s primary goal is to simplify and standardize Kubeflow pipeline compilation, pipeline execution and model deployment.
It has two “components”: a dedicated Python package and chatops commands. The Python package is used during the development/experimentation phase, and the chatops commands are used directly in Pull Request comments.

How does it fit into the MLOps lifecycle?
Let’s use an example. The code of your ML model is stored in a GitHub repository. Your team uses GitHub Issues to discuss features, improvements, and bug fixes of the ML model and the (Kubeflow) pipeline.
In one of the Issues, the team plans to improve the pipeline’s preprocessing step. Your task is to test how the changes will influence the model.
As a first step, you want to experiment in your development environment. You make the changes in the code and use the Kfops CLI (the kfc build_run command) to execute the pipeline directly in Kubeflow. Kfops outputs the link to your pipeline run.
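In practice, that step might look roughly like this (a sketch; the printed link is illustrative, not Kfops’ exact output format):

    # Compile the pipeline and run it on Kubeflow in one step
    kfc build_run
    # Kfops then prints a link to the run, along the lines of:
    #   https://<your-kubeflow-host>/#/runs/details/<run-id>
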
You make more changes until the results are ready to be shared with the team.
You push the changes to the repository and create a Pull Request. In the Pull Request, you run the pipeline again with the /build_run chatops command. Kfops runs the pipeline and prints the results directly in the PR. Now you can share the results with the team by referencing your PR in the Issue.
The team is happy with the results and decides they should be deployed to production. You go back to the PR and issue the /deploy command.
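The whole PR-side exchange boils down to two comments (a sketch; only the commands themselves are literal):

    # Typed as Pull Request comments, not in a terminal:
    /build_run   # Kfops runs the pipeline and posts the results in the PR
    /deploy      # once the team agrees, Kfops deploys the model to production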

What did you accomplish?

  • You created a reproducible experiment in which only the preprocessing step changed and everything else was kept intact.
  • Because of that, you can easily measure the improvement caused by your modifications.
  • The experiment has been effortlessly “documented” in the Issue and PR and linked to a particular Kubeflow pipeline run and its results.
  • Your improvements are now in production!

Here is a visual representation of the process (see the full-screen image here):

Image by Author

It might take you some time to digest that figure. Here are the key takeaways:

  • Both the kfc and “chatops” commands aim to hide the underlying complexity of Kubeflow.
  • As a Data Scientist/Engineer, you’d expect a simple, flexible, and scalable experimentation/development phase. The already-mentioned kfc command controls it. Depending on your setup, the command can be run directly in-cluster (e.g., a Jupyter notebook preinstalled with Kubeflow or any other Kubernetes Pod configured to access Kubeflow Pipelines) or from a local environment (outside of your Kubernetes cluster) connected to a remote Kubeflow instance (see the port-forward sketch after this list).
  • Model deployment is not allowed with the kfc command because it doesn’t record who executed it and when.
  • Model deployment is purposefully removed from the Kubeflow Pipeline itself. Instead, you should use /deploy to publish only those models that fulfill your “production” requirements.
  • Commands like /build_run (or /build and /run) and /deploy (or /staging_deploy) are executed in the context of a Pull Request.
  • The team uses the repository’s Issues and Pull Requests to discuss planned experiments, improvements, fixes, etc. Commands executed in Pull Requests link this discussion with the results of the experiment in the Kubeflow Pipelines UI.
  • A pipeline run (/build_run or /run) prints the Kubeflow Pipelines input parameter values directly in a PR comment.
  • At the moment, Kfops can deploy the ML model only to the same Kubernetes cluster where your Kubeflow and Kfops are installed.
  • Not all changes made in the repository are related to the ML model, and model training can be time-consuming or costly. Therefore, when a PR is created or updated, the Kubeflow Pipeline isn’t started automatically; it has to be executed manually with the “run” command.
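Regarding the local setup mentioned above: the exact way kfc is pointed at a remote Kubeflow endpoint is a Kfops configuration detail not covered here, but a common way to reach a remote Kubeflow Pipelines instance in the first place is to port-forward its service (a sketch; the service and namespace names assume a default Kubeflow install):

    # Expose the remote cluster's Kubeflow Pipelines UI/API on localhost:8080
    kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
    # With the endpoint reachable, run kfc locally as usual, e.g.:
    kfc build_run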

For brevity, the figure hides details like the container image builder. Check out more figures and details in the documentation.

Main configuration file

Another important Kfops feature is the main configuration file. It centralizes the project’s pipeline, container image builder, and deployment settings.
You can override the pipeline-related settings during development with the kfc command. Parameters can be overridden with kfc’s --set flag and/or a separate override file (e.g., kfc build_run --config-override override.yaml).
Notice that because the config file is part of the repository, you can easily track every change ever made to it.
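For example, a development-time override might look roughly like this (a sketch; the YAML keys and the value passed to --set are hypothetical and depend on your project’s configuration schema):

    # override.yaml: temporary development-time settings (hypothetical keys)
    cat > override.yaml <<'EOF'
    pipeline:
      experiment_name: preprocessing-experiment
    EOF

    # Apply the override file, and/or override single values inline with --set
    kfc build_run --config-override override.yaml --set <parameter>=<value>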

Kfops example configuration file (Image by Author)

Refer to config file documentation for more details.

Conventions, restrictions, and more features

You can find the complete list in the documentation; here are the most important ones:

  • Conventions (e.g., centralized configuration) enforce standardization across all of your ML projects.
  • Only a single ML model per repository is allowed.
  • Kfops can be plugged into any SCM system (though currently, only GitHub is supported).
  • During /deploy, the “canary” stage is performed by default, followed by an optional inference endpoint check (HTTP return code 200); see the sketch after this list.
    If the ML model is successfully deployed, Kfops will label the PR (e.g., Deployed-to-production), merge it with HEAD, and close the PR.
    All failures are logged as PR comments.
  • If your ML model repository is open to public contributions, you can restrict who can execute chatops commands (feature in progress).
  • Outputs of various warnings and errors are “logged” directly in the PR context:
Example pipeline error (Image by Author)
Example pipeline warning (Image by Author)
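The endpoint check mentioned above boils down to calling the deployed model’s inference endpoint and expecting an HTTP 200 response. Conceptually (a sketch; the URL follows the KServe-style v1 path convention and is not necessarily Kfops’ literal request):

    # Probe the deployed model's endpoint; 200 means the deployment is healthy
    STATUS=$(curl -s -o /dev/null -w '%{http_code}' \
      "https://<model-endpoint-host>/v1/models/<model-name>")
    [ "$STATUS" = "200" ] && echo "model is up" || echo "check failed ($STATUS)"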

What’s next?

Despite being fully functional, the project is still at an early stage of development. Its long-term goal is to provide options for companies at various stages of MLOps maturity.
Head over to the documentation for more details.
