Scaling Machine Learning models using Tensorflow Serving & Kubernetes

A developer’s guide to putting ML models into production and scaling them

bendang
Towards Data Science



TensorFlow Serving is an amazing tool for putting your models into production, from handling requests to using a GPU efficiently across multiple models. The problem arises when the number of requests grows beyond what a single instance can keep up with. This is where Kubernetes helps, by orchestrating and scaling multiple Docker containers.

Outline:

  1. Setup Docker
  2. Get Model
  3. Containerize model
  4. Setup Google Cloud Clusters
  5. Deploy models with Kubernetes

Let’s jump into it:

1. Setup Docker

What is Docker? — Docker provides the ability to package and run an application in a loosely isolated environment called a container. (Details)

Docker supports multiple platforms; to install it you can check here.

If you are using Ubuntu, you can install it with:

# Install docker community edition
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce

To use Docker on Ubuntu we usually need to prefix commands with sudo. If you do not want to type sudo every time, add your user to the docker group:

# Remove sudo access needed for docker
sudo groupadd docker
sudo gpasswd -a $USER docker
# Log out and back in (or run 'newgrp docker') for this to take effect

We will need a Docker Hub account later so that we can push our Docker image. Do create an account if you don’t have one.

# Once your Dockerhub account is setup, login to docker
docker login

2. Get Model

TensorFlow Serving supports only the SavedModel format, so we will need to convert any TensorFlow or Keras model to SavedModel format first. Here is an example of how to export a SavedModel: https://www.tensorflow.org/guide/saved_model

For the simplicity of this blog, we will download a pre-trained ResNet SavedModel from TensorFlow/models.

# Downloading ResNet saved models
mkdir /tmp/myresnet
curl -s https://storage.googleapis.com/download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v2_fp32_savedmodel_NHWC_jpg.tar.gz | tar --strip-components=2 -C /tmp/myresnet -xvz
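For reference, TensorFlow Serving expects the model directory to contain a numbered version subdirectory holding a saved_model.pb and a variables/ folder; that is what the extracted archive looks like under /tmp/myresnet. A small sketch of that layout (the helper function is purely illustrative, not part of any library):

```python
import pathlib
import tempfile

def make_serving_layout(root: str, name: str, version: int) -> pathlib.Path:
    """Create the directory layout TensorFlow Serving expects:
    <root>/<name>/<version>/saved_model.pb plus a variables/ folder."""
    version_dir = pathlib.Path(root) / name / str(version)
    (version_dir / "variables").mkdir(parents=True, exist_ok=True)
    (version_dir / "saved_model.pb").touch()
    return version_dir

# Mirror the extracted ResNet archive's structure in a temp directory
root = tempfile.mkdtemp()
version_dir = make_serving_layout(root, "resnet", 1538687457)
print(version_dir.name)
```

TF Serving automatically loads the highest-numbered version directory, which is how new model versions can be rolled out without restarting the server.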

3. Containerize serving model


Next, we will need a TensorFlow Serving image. Luckily, TensorFlow Serving images are pre-built and available on Docker Hub, in both CPU and GPU versions. Let’s download one.

# Downloading the CPU version
docker pull tensorflow/serving:2.1.0
# To download the GPU version you can just
# docker pull tensorflow/serving:2.1.0-gpu

The default entry point for the ‘tensorflow/serving:2.1.0’ container image (or any other TF Serving image) is ‘/usr/bin/tf_serving_entrypoint.sh’. We will be creating our own tf_serving_entrypoint.sh, and I will explain why below:

tf_serving_entrypoint.sh

The above script runs TensorFlow Serving, loads the model from ‘/models/resnet/’, and opens port 8500 for gRPC and port 8501 for the REST API.
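If you’d rather see what you are downloading first, the script is essentially a thin wrapper around tensorflow_model_server. A sketch consistent with the description above (the exact flags in the gist may differ slightly):

```shell
#!/bin/bash
# Serve the model under /models/resnet:
# gRPC on port 8500, REST API on port 8501
tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=resnet \
  --model_base_path=/models/resnet
```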

# Download the tf_serving_entrypoint.sh
curl -s https://gist.githubusercontent.com/bendangnuksung/67c59cdfb2889e2738abdf60f8290b1d/raw/918cfa09d6efcc200bb2d617859138fd9e7c2eb4/tf_serving_entrypoint.sh --output tf_serving_entrypoint.sh
# Make it executable
chmod +x tf_serving_entrypoint.sh

Creating our own serving script is recommended because it gives you control over the model name, ports, and model path. If you have multiple models, the default ‘tf_serving_entrypoint.sh’ will throw an error. In order to serve multiple models, you will need to create a models.config for them and update your script. The serving command will then look somewhat like this:

# Just an example of running TF serving with a models.config
# tensorflow_model_server --port=8500 --rest_api_port=8501 \
#   --model_config_file=/path/to/models.config
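For reference, a models.config serving two models might look like this (protobuf text format; ‘resnet’ and ‘mymodel’ are placeholder names):

```
model_config_list {
  config {
    name: "resnet"
    base_path: "/models/resnet"
    model_platform: "tensorflow"
  }
  config {
    name: "mymodel"
    base_path: "/models/mymodel"
    model_platform: "tensorflow"
  }
}
```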

To understand more about TF serving with docker refer tfx/serving/docker.

Move the ResNet SavedModel and tf_serving_entrypoint.sh into the Docker image and run it:

# Run the tf/serving containerimage
docker run -d --name=servingbase tensorflow/serving:2.1.0
# copy the saved model
docker cp /tmp/myresnet servingbase:/models/resnet
# copy tf_serving_script.sh
docker cp tf_serving_entrypoint.sh servingbase:/usr/bin/tf_serving_entrypoint.sh
# commit
docker commit servingbase myresnet:latest
# kill the container
docker kill servingbase
# running new created image
docker run -d --name=myresnet -p 8500:8500 -p 8501:8501 myresnet:latest
# list the running containers and check that it is running
docker ps

Let's test whether the container responds to our requests. We will download a client script that runs inference over both gRPC and the REST API.

# Download the ResNet client script
curl https://gist.githubusercontent.com/bendangnuksung/8e94434a8c85308c2933e419ec29755a/raw/0a52618cdce47d16f2e71c900f2a1ee92063933f/resnet_client_restapi_grpc.py --output resnet_client_restapi_grpc.py
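Under the hood, the REST half of the client simply POSTs a JSON body to TF Serving’s predict endpoint. A minimal sketch of building that body (assuming the model’s serving signature takes a base64-encoded JPEG, as this ResNet export does):

```python
import base64
import json

def make_predict_body(image_bytes: bytes) -> str:
    """Build the JSON body for TF Serving's REST predict API.
    Binary inputs are passed as {"b64": <base64 string>} objects."""
    payload = {
        "instances": [{"b64": base64.b64encode(image_bytes).decode("utf-8")}]
    }
    return json.dumps(payload)

# A stand-in for real JPEG bytes, just to show the shape of the request
body = make_predict_body(b"\xff\xd8\xff fake-jpeg-bytes")
# This body would be POSTed to:
#   http://localhost:8501/v1/models/resnet:predict
print(body[:60])
```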

Test the client script:

# Test using GRPC
python resnet_client_restapi_grpc.py -p 8500 -ip localhost
# Test using RESTAPI
python resnet_client_restapi_grpc.py -p 8501 -ip localhost
# We will see that gRPC has a faster response time than the REST API
# Once deployed in the cloud, the difference is even larger
# Stop running container
docker stop myresnet

Push the image to Dockerhub:

# Push image to Dockerhub | replace DOCKERHUB_USERNAME with your account name
docker tag myresnet:latest DOCKERHUB_USERNAME/myresnet:latest
docker push DOCKERHUB_USERNAME/myresnet:latest

4. Setup Google Cloud Clusters

We will be using Google Cloud Platform (GCP) as it provides a Kubernetes engine and gives $300 as a free trial for one year, which is an amazing resource for getting started. You can make your free trial account here. You will need to enable billing to activate the free $300 trial.

GCP allows you to manage resources through the CLI using the gcloud SDK. Install the gcloud SDK for Ubuntu or Mac.

kubectl is also needed to control Kubernetes clusters. Install kubectl from here.

Setup GCloud project and instantiate clusters:

# Proceed once gcloud and kubectl are installed
# gcloud login
gcloud auth login
# Create unique Project name | Replace USERNAME with any unique name
gcloud projects create USERNAME-servingtest-project
# Set project
gcloud config set project USERNAME-servingtest-project

Activate the Kubernetes Engine API: https://console.cloud.google.com/apis/api/container.googleapis.com/overview?project=USERNAME-servingtest-project (replace USERNAME in the link with the unique name you provided earlier).

Create and connect Cluster:

# Creating a cluster with 2 nodes
gcloud beta container --project "USERNAME-servingtest-project" clusters create "cluster-1" --zone "us-central1-c" --disk-size "30" --num-nodes "2"
# You can change the zone and disk size. More details at
# https://cloud.google.com/sdk/gcloud/reference/container/clusters/create
# Connect to the cluster
gcloud container clusters get-credentials cluster-1 --zone us-central1-c --project USERNAME-servingtest-project
# Check whether the 2 Nodes are ready:
kubectl get nodes
# Sample output:
# NAME               STATUS   ROLES    AGE   VERSION
# gke-cluster-1-...  Ready    <none>   74s   v1.14.10-gke.36
# gke-cluster-1-...  Ready    <none>   75s   v1.14.10-gke.36

5. Deploy Models with Kubernetes

What is Kubernetes? — It’s a container orchestrator. You can think of Kubernetes as a good Tetris player: whenever a new container of a different shape and size comes in, Kubernetes finds the best slot to place it.

Do check this video to have a better understanding.

We deploy the containerized model with Kubernetes in two stages:

  1. Deployment: A Deployment is responsible for keeping a set of Pods running. We define the desired state for the Pods, and the Deployment controller changes the actual state to the desired state.

  2. Service: An abstract way to expose an application running on a set of Pods as a network service. We define things such as which port it should listen on and which app it should route traffic to.

service.yaml
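If you want to know what you are applying before downloading, the two manifests look roughly like this (a sketch inferred from the rest of the post; the downloaded files are authoritative and may use older apiVersions):

```yaml
# deployment.yaml: run 2 replicas of the serving image
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myresnet-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myresnet
  template:
    metadata:
      labels:
        app: myresnet
    spec:
      containers:
      - name: myresnet
        image: bendang/myresnet:latest
        ports:
        - containerPort: 8500   # gRPC
        - containerPort: 8501   # REST API
---
# service.yaml: expose the pods behind one external LoadBalancer IP
apiVersion: v1
kind: Service
metadata:
  name: myresnet-service
spec:
  type: LoadBalancer
  selector:
    app: myresnet
  ports:
  - name: grpc
    port: 8500
  - name: restapi
    port: 8501
```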

Let's download both files:

# Download deployment & service yaml files
curl https://gist.githubusercontent.com/bendangnuksung/f1482aa9da7100bc3050616aaf503a2c/raw/7dc54db4ee1311c2ec54f0f4bd7e8a343d7fe053/deployment.yaml --output deployment.yaml
curl https://gist.githubusercontent.com/bendangnuksung/5f3edd5f16ea5bc4c2bc58a783b562c0/raw/f36856c612ceb1ac0958a88a67ec02da3d437ffe/service.yaml --output service.yaml

We will need to make one change in ‘deployment.yaml’:

line 16: change bendang to your Docker Hub account name
from: image: bendang/myresnet:latest
to:   image: DOCKERHUB_USERNAME/myresnet:latest

Start Deployment:

kubectl get deployment
# Output: No resources found in default namespace.
kubectl apply -f deployment.yaml
# Output: deployment.extensions/myresnet-deployment created
# Wait until the deployment is ready: 2/2
kubectl get deployment
# Output:
# NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
# myresnet-deployment   2/2     2            2           1m

This will load the ‘myresnet:latest’ image to two pods as defined in the ‘deployment.yaml’ file.

Start Service:

kubectl get service
# Output:
# NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
# kubernetes   ClusterIP   10.7.240.1   <none>        443/TCP   28h
kubectl apply -f service.yaml
# Output: service/myresnet-service created
# Wait until an external IP is allocated for the LoadBalancer
kubectl get service
# Output:
# NAME          TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)
# kubernetes    ClusterIP      10.7.240.1     <none>          443/TCP
# myresnet-s.   LoadBalancer   10.7.252.203   35.192.46.666   8501 & 8500

After applying ‘service.yaml’ we get an external IP, in this case 35.192.46.666. This IP is now our single access point for calling the models, and all load balancing is handled internally.

Testing:

We still use the same ‘resnet_client_restapi_grpc.py’ script; the only change is passing the external IP of the service we created.

# Test using GRPC 
python resnet_client_restapi_grpc.py -p 8500 -ip 35.192.46.666
# Test using RESTAPI
python resnet_client_restapi_grpc.py -p 8501 -ip 35.192.46.666

If you have any questions please let me know in the comments below.

Stay tuned for my next post on Deploying Machine Learning models on Google Cloud Functions.
