5 Different Ways to Deploy your Machine Learning Model with AWS

On the pros and cons of different approaches to getting your model out into the world.

Kyle Gallatin
Towards Data Science

As a data science mentor, I get a lot of questions about the infrastructure side of machine learning. Many newcomers build models they want to expose to the internet via an API, but even with the wealth of available resources they still have trouble doing so. One of the primary reasons for this is that it’s often unclear what the “best” way to accomplish that goal is.

As with all things in software engineering, there are a ton of different ways to accomplish the same thing. However, different approaches have different pros and cons. The more managed the service, the more it provides out of the box — but sometimes at the cost of higher prices or less flexibility. Less managed services are sometimes cheaper or provide greater flexibility, but could require larger amounts of time and expertise to configure.

In this post, I’ll highlight 5 common methods you can use to deploy a simple, real-time machine learning API (such as a Flask app) to the internet. While this list isn’t exhaustive, it should point you toward the solution that’s right for you.

Deploy it on an EC2 Instance

In my opinion, one of the simplest (but least robust) ways to deploy your model to the internet is to run it on an EC2 instance. This is as simple as acquiring a virtual machine in the cloud, making it accessible to the internet, and starting your application on it. For a walkthrough on this solution, see one of my previous posts here.

I recommend this solution to users who need to do quick demos or just showcase something temporarily. If your app is small enough, it’ll cost you literally nothing to host, and you can have it up in as little as 5 minutes. You don’t even need familiarity with tools like Docker, just the Linux command line. For such use cases, this is the cheapest and easiest option.

However, this approach requires lots of hacky manual setup not suitable for real production deployments. If you’re an individual who wants to show your app off to some friends, this solution is great. If you’re looking for something sustainable long-term — keep reading.
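To make this concrete, below is a minimal sketch of the kind of Flask app you might run on an instance like this. The model path and request shape are my own assumptions for illustration, not a prescribed setup.

```python
# app.py -- a minimal Flask API of the kind you might run on an EC2 instance.
# Assumes a pickled scikit-learn model saved as model.pkl (hypothetical path).
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the model once at startup rather than on every request
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[1.0, 2.0, 3.0]]}
    features = request.get_json()["features"]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the app is reachable from outside the instance;
    # for anything beyond a demo you'd put gunicorn/nginx in front of this.
    app.run(host="0.0.0.0", port=5000)
```

Once the instance’s security group allows inbound traffic on that port, starting the app is as simple as running python app.py over SSH.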

Pros:

  • Quick and dirty
  • Cheap (potentially free)
  • Easy setup/teardown
  • Little to no infrastructure/networking experience required

Cons:

  • Not very scalable
  • Not production grade
  • Little to no automation
  • Not robust to errors

Create an AWS Lambda Function

AWS Lambda is a service for deploying serverless functions. “Serverless” doesn’t mean there is no server; it just means that you don’t care about the underlying infrastructure for your code, and you only pay for what you use. This is often preferable to provisioning and managing your own machines, which is what you’d have to do with the EC2 approach above.

While Lambda may not satisfy some more complex use cases, it is ideal for simple and repeatable code. It’s scalable, extremely cheap and pretty easy. You’ll need to work with some other services such as API Gateway, but the setup will be far more robust than deploying your app to a standalone EC2 machine. For production, this would probably be the cheapest option.

However, Lambda functions are less flexible than the purely containerized solutions you’d build on ECS or EKS. Features such as supported programming languages are limited. I recommend this solution to users with simple ML code who don’t want to think about infrastructure at all.
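For illustration, here’s a sketch of what a Python Lambda handler behind API Gateway might look like. The packaged model file and the request format are assumptions for the example, not requirements of the service.

```python
# lambda_function.py -- a sketch of a Lambda handler fronted by API Gateway.
# Assumes a model.pkl small enough to ship inside the deployment package.
import json
import pickle

# Loaded once per container (cold start), then reused across invocations
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

def lambda_handler(event, context):
    # With API Gateway's proxy integration, the request body arrives
    # as a JSON string under event["body"]
    body = json.loads(event["body"])
    prediction = model.predict(body["features"]).tolist()
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction}),
    }
```

Wire an API Gateway route to this function and you have a public prediction endpoint without a single server to manage.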

Pros:

  • Production grade
  • Great for small simple apps/functions
  • Serverless (extremely cheap)

Cons:

  • Less flexible than other solutions
  • Requires knowledge of additional AWS services

Containerize it and Deploy it on Kubernetes (EKS)

Kubernetes is one of the go-to options for managing and scaling containerized applications nowadays. The declarative nature of Kubernetes helps automate many production-level concerns such as load balancing or autoscaling. Unlike more managed container orchestration solutions like ECS, Kubernetes provides granular control of your machine learning app.

Simple, stateless machine learning applications are often a good fit for Kubernetes. In addition, there are lots of mature open source solutions for ML on Kubernetes (such as Seldon) which provide domain-specific support and further abstract the complexities of infrastructure.

However, it comes at a cost. Unlike provisioning a single instance to deploy your app on, you now have to manage an entire Kubernetes cluster, and deploying an application while operating a cluster is no simple task for new users.

Kubernetes networking is complex, and requires lots of experience to understand and operate in depth. While a Kubernetes cluster may also seem cheaper than a more managed ML solution, a poorly managed cluster can lead to even worse unexpected monetary costs. I recommend this approach to users who absolutely need a production grade solution, want fine-grained application controls, or just want to get experience with Kubernetes.
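As a rough sketch of what that fine-grained control looks like, here’s how you might create a Deployment for a containerized model API using the official kubernetes Python client. The image URI and replica count are placeholders, and this assumes you already have kubeconfig access to an existing EKS cluster.

```python
# deploy.py -- a sketch that creates a Kubernetes Deployment for a
# containerized model API; image name and counts are hypothetical.
from kubernetes import client, config

# Uses your local kubeconfig (e.g. set up via `aws eks update-kubeconfig`)
config.load_kube_config()

container = client.V1Container(
    name="ml-api",
    image="123456789.dkr.ecr.us-east-1.amazonaws.com/ml-api:latest",
    ports=[client.V1ContainerPort(container_port=5000)],
)
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="ml-api"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "ml-api"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "ml-api"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)
client.AppsV1Api().create_namespaced_deployment(
    namespace="default", body=deployment
)
```

Note that you’d still need a Service and an Ingress (or load balancer) on top of this to actually expose the app to the internet, which is exactly the kind of extra setup the cons below refer to.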

Pros:

  • Very scalable
  • Good amount of automation
  • Production grade
  • Lots of community support
  • Highly flexible
  • Experience with a popular framework and lower-level CS!

Cons:

  • Potentially lots of work
  • Risky for beginners
  • In some cases, just straight up unnecessary
  • Lots of setup required for feature parity with managed services

Containerize it and Deploy it with Elastic Container Service (ECS)

Like Kubernetes, ECS is a container orchestration service for deploying applications. The difference is in the distribution of responsibility: instead of handling the lower-level infrastructure concerns you’d have to take care of yourself in Kubernetes, you have AWS do it for you. ECS is similar to Lambda in that it abstracts away infrastructure concerns; in terms of flexibility, it sits between Lambda and the highly flexible Kubernetes.

I recommend this solution to individuals without Kubernetes experience who want to deploy containerized applications. If you’re working alone, it can be tough to manage your app on Kubernetes along with all of the concerns of managing a cluster. It’s a much better use of your time to relinquish those responsibilities so you can focus on the application itself. If you aren’t using Docker, I’d default to Lambda.
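As a hedged sketch of what this can look like with boto3, the snippet below registers a Fargate task definition for a containerized model API and creates a service from it. All names, ARNs, and subnet IDs are placeholders; a real setup would also involve a load balancer and security groups.

```python
# ecs_deploy.py -- a sketch of deploying a container to ECS on Fargate
# with boto3; every identifier below is a hypothetical placeholder.
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Register a task definition describing the container to run
task_def = ecs.register_task_definition(
    family="ml-api",
    networkMode="awsvpc",
    requiresCompatibilities=["FARGATE"],
    cpu="256",
    memory="512",
    executionRoleArn="arn:aws:iam::123456789:role/ecsTaskExecutionRole",
    containerDefinitions=[{
        "name": "ml-api",
        "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/ml-api:latest",
        "portMappings": [{"containerPort": 5000}],
    }],
)

# Create a long-running service that keeps 2 copies of the task alive
ecs.create_service(
    cluster="ml-cluster",
    serviceName="ml-api",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    desiredCount=2,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-abc123"],
            "assignPublicIp": "ENABLED",
        }
    },
)
```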

Pros:

  • Significantly easier setup than Kubernetes
  • More features out of the box
  • Easier to manage as an individual developer (with container experience)
  • First-class support for containerized applications

Cons:

  • Less granular controls
  • Potentially more expensive

Create a SageMaker Endpoint

AWS SageMaker is a first-class suite of ML tools for the cloud. From hosted Jupyter notebooks to easy model endpoints, the experience with SageMaker will probably feel much like creating deployments locally on your own machine. Its machine-learning-specific support comes with a whole suite of services that empower users to build and deploy production-ready ML apps with all the bells and whistles you’d have to configure manually with the other options.

Of course, the highly specialized and managed nature of this solution can make it look more expensive on paper, and it gives the user less exposure to lower-level infrastructure concerns. However, what you pay in dollars you’ll probably get back in time, as SageMaker not only makes it easy to deploy models but also to create production-grade ML pipelines.

I recommend this solution to new cloud users who only want to learn one new technology to deploy their models, and who want all other ML infrastructure considerations handled from the get-go. Unless you already have experience with some of the aforementioned technologies, investing in SageMaker will likely let you develop quickly with little to no cloud experience.
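To show how little code an endpoint can take, here’s a sketch using the SageMaker Python SDK to deploy a scikit-learn model artifact. The S3 path, IAM role, inference script, and instance type are all placeholders for the example.

```python
# sagemaker_deploy.py -- a sketch of deploying a scikit-learn model as a
# managed SageMaker endpoint; paths, role, and versions are hypothetical.
from sagemaker.sklearn import SKLearnModel

model = SKLearnModel(
    model_data="s3://my-bucket/model.tar.gz",   # packaged model artifact
    role="arn:aws:iam::123456789:role/SageMakerRole",
    entry_point="inference.py",                 # your custom inference script
    framework_version="1.2-1",
)

# deploy() provisions the managed endpoint and returns a predictor
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.medium",
)
print(predictor.predict([[1.0, 2.0, 3.0]]))
```

Behind that one deploy() call, SageMaker handles the container, hosting instances, and HTTPS endpoint that the other approaches made you assemble yourself.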

Pros:

  • First-class machine learning support
  • Managed infrastructure and environments
  • Production grade, scalable

Cons:

  • Potentially more expensive than some other solutions
  • Potentially less flexible

And more…

The full list of options goes well beyond these five, and it gets more complex when you consider other use cases such as batch prediction. However, these approaches and their analogues on other cloud platforms are among the most common. Any one of them could be the most “correct” for you depending on your use case, application, and other existing infrastructure.

In addition, while I work a lot in the cloud, I admit I’m not an expert on each and every one of these services. If I’ve missed anything or made assumptions you think should be rectified, please comment below and I’ll address it! These tradeoffs are highly subjective, and machine learning deployments vary in both complexity and implementation from use case to use case.

Follow me on LinkedIn and Twitter if you’re into being disappointed.
