Deploy Any ML Model to Any Cloud Platform

Introducing Truss, an open-source library for model packaging and deployment

Tuhin Srivastava
Towards Data Science

--

Truss is an open-source Python library for ML model serving | Photo by Joshua J. Cotten on Unsplash

Model serving isn’t just a hard problem; it’s a hard problem that constantly demands new solutions.

Model serving, as part of MLOps, is the DevOps challenge of keeping a complicated, fragile artifact (the model) working in multiple dynamic environments. As frameworks are built and updated for training models, and production environments evolve for new capabilities and constraints, data scientists have to reimplement model serving scripts and rebuild model deployment processes.

Data scientists working in large, well-resourced organizations can hand off their models to specialized MLOps teams for serving and deployment. But for those of us at start-ups and newer companies, as I was for the first decade of my career, the ML deployment challenge lands squarely on our own plates. The problem: serving and deploying a model requires an entirely different set of skills and technologies than training it does.

A quick introduction: I’m Tuhin Srivastava, CEO at Baseten, where Truss was first developed. While working to figure out what data scientists need to make MLOps happen, we talked to data science leaders, and heard things like:

  • “We wanted to avoid any kind of custom DevOps around hosting models ourselves. If doing it ourselves, we would likely need to spin up our own Docker on VMs or Kubernetes clusters, and then we’d have to take care of all the DevOps around that stuff too.” — Faaez Ul Haq, Head of Data Science @ Pipe
  • “Our team is made up of mostly Data Scientists and Linguists — we’re not DevOps experts. We can write Python, but we don’t want to be in the business of writing YAML config all day.” — Daniel Whitenack, Data Scientist @ SIL

A data scientist’s working environment is the Jupyter notebook, a flexible and permissive system designed for iterative experimentation. The Jupyter notebook is a great tool for training models, but as an impermanent, development-oriented environment, it isn’t great for model serving. Model serving requires technologies like Docker to provide a stable, predictable environment.

How data scientists handle serving models today

Serving a model in production generally comes down to a few steps:

  1. Serialize the model
  2. Put the model behind a web server such as Flask
  3. Package the web server into a Docker image
  4. Run the Docker image in a container

Within these steps lurk additional complications. The model needs to accept input and produce output in appropriate formats, transforming from a Python-first interface to a web-first interface. And some models need to access GPU hardware to make predictions, or securely access secret values, or import Python and system packages.
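
To make those steps concrete, here is a minimal sketch of the manual workflow for a toy scikit-learn model behind Flask. It is illustrative only — not the Truss approach — and the file name, route, and model are arbitrary choices for this example.

import pickle

from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Step 1: train and serialize a toy model
X, y = load_iris(return_X_y=True)
with open("model.pkl", "wb") as f:
    pickle.dump(LogisticRegression(max_iter=1000).fit(X, y), f)

# Step 2: put the model behind a web server
app = Flask(__name__)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"inputs": [[5.1, 3.5, 1.4, 0.2]]}
    inputs = request.get_json()["inputs"]
    return jsonify({"predictions": model.predict(inputs).tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

# Steps 3 and 4 happen outside Python: write a Dockerfile that installs Flask and
# scikit-learn, copies this file and model.pkl, and starts the server, then build
# and run the image with docker build / docker run -p 8080:8080.

Nearly every line of that boilerplate has to be revisited for the next framework; that repetition is what Truss is meant to absorb.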

Navigating the deployment maze | Photo by Robert Linder on Unsplash

But the bigger issue is that even the basic steps vary for every framework, and sometimes for different models built with the same framework. So even if you know how to serve a TensorFlow model, you’ll have to re-learn how to serve a PyTorch model, and go through the process again when you try a model from Hugging Face.

“Well, that’s ok,” you might say, “I’m just going to use one modeling framework. I’ll be a TensorFlow ML engineer.” The thing is, we don’t have different frameworks because data scientists are bad at agreeing. It’s because different problems require different approaches. Each popular modeling framework excels at different kinds of underlying algorithms and structures. But model serving technologies don’t need to be disparate.

Open source packages like Cog, BentoML, and MLflow help to simplify the model deployment process. We wanted to expand upon these ideas and develop an open source library specifically with the data scientist at a start-up in mind. Our two key beliefs:

  1. Build for Python users: As data scientists, we live in Python. We wanted a model deployment library that could be managed entirely in Python.
  2. Work with every model and platform: We wanted an open source package that could handle model deployment regardless of model framework and cloud platform.

Guided by these ideas, we built and open-sourced Truss.

How model serving works with Truss

Steps to serve and deploy a model | Image by Author

Step 1: Standardized model packaging

We’re in a Jupyter notebook on our local machine, the data scientist’s home field. Using Hugging Face transformers, we’ll bring in the t5-small model as a pipeline for this example.

from transformers import pipeline
import truss

# Load t5-small as a Hugging Face pipeline, then package it as a Truss in my_model/
pipe = pipeline(tokenizer="t5-small", model="t5-small")
scaf = truss.mk_truss(pipe, target_directory="my_model")

Hugging Face is one of the many popular frameworks — also including LightGBM, PyTorch, scikit-learn, TensorFlow, and XGBoost (with more coming soon) — that Truss supports out of the box. As such, all we need to do is run mk_truss on the model and everything will be serialized and packaged, ready for use.
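
The target directory is just plain files: a serving configuration, the serialized model, and the Python code that loads and invokes it. The layout below is approximate — assumed from a typical Truss, and the exact contents vary by framework and Truss version:

my_model/
├── config.yaml   # serving configuration: requirements, resources, etc.
├── data/         # serialized model artifacts
└── model/
    └── model.py  # code that loads the model and serves predictions

Because nothing is hidden in a notebook or an opaque binary, the packaged model can be inspected, edited, and versioned like any other code.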

Step 2: Solid local development

With our Truss, we can invoke the model in our Jupyter environment:

print(scaf.server_predict({"inputs" : ["translate: hello world in german"]}))
# Expected result is {'predictions': [{'translation_text': 'Übersetzen: Hallo Welt in deutsch'}]}

But Truss goes beyond in-code model invocation. There are a variety of local development options including running the model in a Docker container and making API requests.

To start the Docker container:

truss run-image my_model

To make a request:

curl -X POST http://127.0.0.1:8080/v1/models/model:predict -H "Content-Type: application/json" -d '{"inputs": ["translate: hello world in german"]}'

Local development is about more than just testing. You can update your model to integrate better with other systems using pre- and post-processing functions, create sample inputs to document test cases, and configure every aspect of your Truss to match your needs. With the invocation options above, you can quickly get your model ready for production with a tight dev loop.
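
As a sketch of what pre- and post-processing can look like, the example below wraps the t5-small pipeline so callers send bare strings and receive bare translations. The class and method names are illustrative assumptions; the model file Truss generates in your target directory defines the actual interface.

# Illustrative only: the generated Truss model file defines the real hook names.
class ModelWrapper:
    def __init__(self, pipe):
        self._pipe = pipe

    def preprocess(self, request):
        # Prepend the task prefix the t5-small pipeline expects
        return {"inputs": ["translate English to German: " + text for text in request["inputs"]]}

    def predict(self, request):
        return {"predictions": self._pipe(request["inputs"])}

    def postprocess(self, response):
        # Keep only the translated strings
        return {"predictions": [p["translation_text"] for p in response["predictions"]]}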

Step 3: Seamless production deployment

Thanks to Docker, the development environment we have been working in closely matches the eventual production environment. Depending on where you want to deploy your model, follow specific deployment instructions for platforms like AWS ECS, Baseten, and Google Cloud Run. Your model can also be deployed anywhere that you can run a Docker image.

Invoking the deployed model varies slightly by environment, but it should be an API request matching the one in step 2, just targeting the production domain instead of localhost.
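
For example, with a placeholder hostname standing in for whatever URL your platform assigns:

curl -X POST https://your-model-host.example.com/v1/models/model:predict -H "Content-Type: application/json" -d '{"inputs": ["translate: hello world in german"]}'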

Step 4: Sharing and iteration

Your serialized model and associated code and config files comprise the entirety of your Truss. As such, packaging your model as a Truss makes it portable, unlocking two key use cases: version control and sharing.

Everything is in files, so you can commit your model in Git. This way, you can implement model versioning, test different iterations of your model, and push your model to a repository on GitHub or similar.
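
A minimal versioning loop might look like this (the remote URL is a placeholder):

cd my_model
git init
git add .
git commit -m "Package t5-small as a Truss"
git remote add origin https://github.com/<your-org>/my_model.git
git push -u origin main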

The most frustrating part of trying to run someone else’s model is replicating their environment. But with your model now saved as a Truss, all anyone needs to do to run it is download it from GitHub or another repository source, install Docker and the Truss Python package, and serve the model locally. We’re excited to enable more collaboration and iteration in open-source modeling.
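
Concretely, reproducing a shared Truss might look like this (again, the repository URL is a placeholder):

git clone https://github.com/<your-org>/my_model.git
pip install truss
truss run-image my_model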

Truss logo | Image by author

Machine learning models are only going to get more sophisticated and capable. In turn, serving these models reliably — locally and in production — becomes more essential than ever. We are committed to long-term support and development for Truss, so let us know what we should add to our roadmap.

Truss offers a unified approach to model serving across model frameworks and deployment targets. Get started by starring the repo and working through the end-to-end deployment tutorial for your favorite framework and platform.

--

Tuhin is a co-founder of Baseten, the ML Application Builder. He was previously a Data Scientist at Gumroad and a co-founder of Shape (acquired).