
This article focuses on deploying machine learning models, using MNIST handwritten digit recognition implemented in TensorFlow 2 as the base example. At the end, we will cook up a small web app in React to test our model. If you are a machine learning enthusiast, you already know that MNIST digit recognition is the hello-world program of deep learning; by now you have seen far too many digit-recognition articles on Medium and have probably implemented it yourself. That is exactly why I won't focus too much on the problem itself and will instead show you how to deploy your models and consume them in production. To see the end result, you can view the deployed app here
Before jumping into deployment, I will quickly give you a brief walkthrough of the model and show you how to save it so you can consume it in production later on. If, like me, you are tired of reading about handwritten digit recognition, you can skip this portion and grab the model from the github-repo so you can follow the rest of the guide.
Libraries used:
- TensorFlow 2
- Matplotlib
- NumPy
For pre-processing, I simply normalize the pixel values to the 0–1 range by dividing them by 255.
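As a rough sketch (assuming the standard Keras MNIST loader; the variable names are just illustrative), loading and normalizing the data looks something like this:
import numpy as np
import tensorflow as tf

# Load the MNIST dataset bundled with Keras
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale pixel values from 0-255 down to the 0-1 range
x_train = x_train.astype(np.float32) / 255.0
x_test = x_test.astype(np.float32) / 255.0

# Add a channel dimension so the images fit a conv-net input of shape (28, 28, 1)
x_train = x_train[..., np.newaxis]
x_test = x_test[..., np.newaxis]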
Once the data is loaded and pre-processed, I use a convolutional net with two conv layers, each followed by a max-pooling layer. The output is then flattened and passed to a dense layer with 128 units, and finally the output layer has 10 units with softmax as the activation function, which turns the result into probabilities distributed over our classes.
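A minimal Keras sketch of that architecture could look like the following; the number of filters and the kernel sizes are assumptions on my part, only the 128-unit dense layer and the 10-unit softmax output come from the description above.
from tensorflow.keras import layers, models

model = models.Sequential([
    # two conv layers, each followed by max-pooling
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # flatten before the dense layer
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    # 10 output units with softmax to turn the result into class probabilities
    layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])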
For training the model I use the TensorBoard, EarlyStopping, and ModelCheckpoint callbacks to gather information during the training process. TensorBoard is a very handy tool in the tensorflow ecosystem: it lets you visualize the training process and model graphs, and provides useful metrics that make it much easier to represent and share your findings and to experiment quickly. EarlyStopping is also a very useful callback that monitors a specific metric and stops training if that metric is not improving over time, which saves us from over-fitting the model.
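Hooking those callbacks into training looks roughly like this; the log directory, checkpoint path, monitored metric, patience, and epoch count are placeholder choices, not necessarily the ones used in the repo.
import tensorflow as tf

callbacks = [
    # write metrics and graphs that TensorBoard can visualize
    tf.keras.callbacks.TensorBoard(log_dir='logs'),
    # stop training when the validation loss stops improving
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3),
    # keep the best weights seen so far on disk
    tf.keras.callbacks.ModelCheckpoint('checkpoints/mnist.h5', save_best_only=True),
]

model.fit(x_train, y_train,
          validation_split=0.1,
          epochs=20,
          callbacks=callbacks)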
Now that we have trained the model, we will save it in the SavedModel (protobuf) format, which is the default format in TensorFlow 2. Note that we are saving the model in a sub-folder named 1. The reason is that tensorflow-serving uses this convention to load the model version you specify; by default, the server will serve the model with the largest version number.
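Saving into the versioned sub-folder is a one-liner; the base directory name below is just an example.
# the trailing /1 is the version number tensorflow-serving will pick up
model.save('saved_models/mnist-digit-recog/1')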
The last part is to convert our model to work with tensorflow-js in the browser. For that purpose, we will use the tensorflowjs converter, which converts pre-trained tensorflow models from Python so they can run in the browser with tensorflow-js. Install and run the tool using the commands below and provide the path to your saved model directory when the wizard asks.
pip install tensorflowjs[wizard]
tensorflowjs_wizard
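If you prefer to skip the wizard, the tensorflowjs package also exposes a converter from Python; the output directory below is an assumption.
import tensorflowjs as tfjs

# convert the in-memory Keras model straight to the tf-js format
tfjs.converters.save_keras_model(model, 'tfjs_model')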
If you have followed all the steps up to here and have successfully saved a model, congratulations, you are halfway through the journey. You now have a model that you can deploy in the cloud and use from various applications.
Before we begin with the deployment part, let's quickly go through an alternative approach that is also widely used: serving the model via a Flask app, an express server, or any other API framework. This is not a very efficient way of serving models in production, because those frameworks are primarily designed for HTTP request/response handling and do not account for your machine's hardware capability when making inferences. Similarly, it can be hard to standardize the way you load and serve a model via a REST endpoint when you have multiple models and projects being worked on. One of the biggest advantages of tensorflow-serving over Flask is that it is built specifically for serving flexible, scalable ML models in production and has been battle-tested. It also supports model versioning, can serve many models with many versions, and scales really well. However, Flask or any other API framework can still come in very handy as a middle layer between the client and tensorflow-serving, handling the routing for different tf-servers and preprocessing the data before sending it to the model for prediction, as sketched below. You can visit this article for a much better understanding of this.
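To make the middle-layer idea concrete, here is a minimal, purely illustrative Flask sketch that preprocesses an incoming image and forwards it to a tensorflow-serving endpoint; the route, port, and endpoint URL are assumptions.
import numpy as np
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# assumed tensorflow-serving REST endpoint running alongside this app
TF_SERVING_URL = 'http://localhost:8501/v1/models/mnist-digit-recog:predict'

@app.route('/predict', methods=['POST'])
def predict():
    # expect a flat list of 784 pixel values from the client
    pixels = np.array(request.json['image'], dtype=np.float32)
    image = (pixels / 255.0).reshape(1, 28, 28, 1)

    # forward the preprocessed tensor to tensorflow-serving for inference
    response = requests.post(TF_SERVING_URL, json={'instances': image.tolist()})
    predictions = response.json()['predictions']
    return jsonify({'digit': int(np.argmax(predictions[0]))})

if __name__ == '__main__':
    app.run(port=5000)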
Now we can get on with the deployment part. I will be using Google Cloud to deploy the models and will go step by step through the various cloud services available to you. Google Cloud and tensorflow integrate quite well, and we will look in particular at three options.
In the first part of this series we will use TensorFlow Serving, followed by AI Platform and Cloud Functions in the next two parts.
TensorFlow Serving
Tensorflow serving is part of the TensorFlow Extended ecosystem. One of its key benefits is that it is highly scalable and has low latency. It can also serve multiple models and versions, which is a common need in production, where you may have to update your models and serve several versions at once.
We will use tensorflow-serving with Docker, which is very easy and quick to get started with. Before following the next steps, head over to the Docker website and install the Docker Desktop app for your OS.
// from the terminal, pull the tensorflow-serving docker image
docker pull tensorflow/serving
The tensorflow/serving image exposes port 8500 for gRPC and port 8501 for the REST API. Now, to serve with docker we need:
- A saved model that we want to serve. (We have already saved our model.)
- We need to open up a port on our host on which we will be serving the model.
- A name for our model which the client applications will use to refer to the model.
Next, in the terminal, run the command below.
docker run -p 8501:8501 --mount type=bind,source=<path-to-your-model>,target=/models/mnist-digit-recog -e MODEL_NAME=mnist-digit-recog -t tensorflow/serving
Please note the path should be the root folder of your model, not the sub-folder that specifies the model version. With the above command we started a docker container and bound the REST API port to host port 8501. Next, we mounted our saved model path onto the model path inside the container, i.e. /models/mnist-digit-recog, and set the MODEL_NAME environment variable to mnist-digit-recog. That is it: if you have followed the steps properly, you should have a docker container running tensorflow-serving with your model served on port 8501.
Note: if you want to expose the gRPC port too, you can run the command below.
docker run -p 8501:8501 -p 8500:8500 --mount type=bind,source=<path-to-your-model>,target=/models/mnist-digit-recog -e MODEL_NAME=mnist-digit-recog -t tensorflow/serving
If everything ran successfully, you should see output similar to the below on your terminal.

If you want to explore tensorflow-serving with docker further, I suggest you look into the official docs.
Now that tensorflow-serving is serving our model, let's quickly test it by making a POST request and verifying that everything works correctly.
The REST URL structure is given below:
HOST = localhost
PORT = 8501
MODEL_NAME = mnist-digit-recog
MODEL_VERSION = 1
// Default
http://{HOST}:{PORT}/v1/models/{MODEL_NAME}:predict
// Specific Model-Version
http://{HOST}:{PORT}/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}:predict
We can use the piece of code below to quickly check that the model is being served properly. Load up the notebook from the github-repo attached above, paste the code into a new cell, and run it to see the results.
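A request along these lines should do the trick; the exact snippet in the repo may differ, and x_test / y_test here are the pre-processed test arrays from earlier.
import json
import numpy as np
import requests

# wrap a few test images in the JSON format tensorflow-serving expects
data = json.dumps({'signature_name': 'serving_default',
                   'instances': x_test[0:3].tolist()})
headers = {'content-type': 'application/json'}

response = requests.post('http://localhost:8501/v1/models/mnist-digit-recog:predict',
                         data=data, headers=headers)
predictions = response.json()['predictions']

for i, prediction in enumerate(predictions):
    print(f'Predicted digit: {np.argmax(prediction)}, actual label: {y_test[i]}')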
Similarly, you can now call this endpoint from any client-side application, for example with a simple fetch POST in JavaScript, and voilà, you have the predicted digit.
Awesome. Now, if we want to create our own docker image with the model built in, we can do that by first running the serving image as a daemon, copying our saved model into the container's model folder, and committing the container as a new image. For that, quickly follow the steps from the official tensorflow-serving guide here. You can name the image whatever you want; I have called mine mnist-digit-container. Now we don't have to bind our model path or do any extra config, we can just run the command below to get our model served.
docker run -p 8501:8501 <your-image-name>
And that's it, we have an image with our model built in. Before we move on to the next step, please give this link a visit; it discusses how you can specify model configuration and versioning, which can be very handy.
Deployment on GCP
Now we will deploy the docker image on Google Cloud using Kubernetes. Before we move on with the setup, let's have a quick glance at Kubernetes. In short, Kubernetes automates the process of deploying, scaling, and managing containerized applications. In a production environment you need to manage the containers and make sure they are always up and available with no downtime; this is where Kubernetes comes into play and helps scale your containerized apps. It offers load balancing, self-healing, and efficient resource utilization. Do give the official Kubernetes docs a visit to get a deeper understanding of how it works.
When you deploy Kubernetes you get a cluster and every cluster will have at least one worker node.
A few terminologies to keep in mind when using Kubernetes.
- Node – A node is basically a worker machine that runs a containerized app.
- Pod – The worker nodes host Pods; a Pod is a group of one or more containers, and each Pod has a unique IP address and a namespace.
- YAML – A file format for configuring Kubernetes.
- Deployment – Specifies the number of replicas of a Pod you want to run; the Deployment then ensures that those replicas are up and running in the cluster.
- Service – An abstraction that defines a policy for how to access the Pods, and is connected to the Deployment.
For this, you first need to set up an account on Google Cloud. Once you have an account, install the Google Cloud SDK from this link. Next, head over to the Google Cloud Console and create a new project; I have called mine tensorflow-training.
Now head over to the terminal and run gcloud init
This will ask you a couple of straightforward questions, and once you are done you will be authenticated with your Google Cloud account.
Next, run gcloud config set project [your-project-id]
to set the project you just created above. To get the project id, go to the GCP console and select it from the dropdown next to the search bar.
Now we are all ready and set up to create a Kubernetes Engine cluster for the service deployment. For this, follow the steps below.
- Enable the Kubernetes Engine API in the cloud console by going into API & Services -> Enable API -> Kubernetes Engine API
- Create a cluster by running
gcloud container clusters create mnist-digit-cluster --num-nodes 2 --zone <specify-the-zone>
- Next, we will set the default cluster for the gcloud container command by running
gcloud config set container/cluster mnist-digit-cluster
- Now we will pass cluster credentials to kubectl
gcloud container clusters get-credentials mnist-digit-cluster --zone <specify-the-zone>
After this, we will upload our docker image to the Google Container Registry so that we can run it on GCP.
- First we will tag our image using Container Registry format and our project name by running
docker tag <container-name> gcr.io/<project-id>/<image-name>
- Note that the image name above can be different from the image name on your local machine.
- Next we configure docker to use gcloud as credential-helper by running
gcloud auth configure-docker
- Now we are all set to push our docker image to the registry.
docker push gcr.io/<project-id>/<image-name>
We are almost there. Now we need to create a YAML config file for the deployment, so head into your favorite text editor, paste the following content into a file, and save it on your local disk with a .yaml extension. Please update the file with your project-id and image-name, and I also suggest that you look into the available configuration options.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mnist-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mnist-server
  template:
    metadata:
      labels:
        app: mnist-server
    spec:
      containers:
      - name: <image-name>
        image: gcr.io/<project-id>/<image-name>
        ports:
        - containerPort: 8501
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: mnist-service
  name: mnist-service
spec:
  ports:
  - port: 8501
    targetPort: 8501
  selector:
    app: mnist-server
  type: LoadBalancer
Now you can run kubectl create -f <path-to-yaml-file>
and if everything goes well, you should be notified that both the deployment and the service were created. You can verify this by running the commands below in the terminal:
kubectl get deployments
kubectl get services
And now finally we can describe our service using kubectl describe service mnist-service
and note down the external IP address listed next to LoadBalancer Ingress. This is the IP we can use to query our deployed model from client applications.
The URL structure is the same as above, you just have to replace localhost with this IP address, and that's it. To quickly verify, you can use the Python snippet above to make a prediction, or just open the URL below in the browser to make sure it all went well.
url = http://{ip}:8501/v1/models/<your-model-name>/versions/1
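Or, from Python, a quick status check might look like this; the IP and model name are placeholders you need to fill in.
import requests

# the model status endpoint should report the version state, e.g. "AVAILABLE"
ip = '<ip-from-loadbalancer-ingress>'
response = requests.get(f'http://{ip}:8501/v1/models/<your-model-name>/versions/1')
print(response.json())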

TF-JS Model
Now it's time to host our converted tensorflow-js model in a Cloud Storage bucket and load it in the React app using TensorFlow.js. For this, quickly head over to your Google Cloud console and navigate to Storage. Create a bucket, choose a name, and specify the location. Once the bucket is created, you can upload your tf-js model either from the UI or from the command line by running the command below in the terminal.
Make sure that you have the converted model.json file and the companion group1-shard1of1.bin file.
gsutil cp -r <path-to-tf-js-model-dir> gs://<bucket-name>/
Now that we have uploaded the model files to the bucket, we need to make them publicly accessible for the client apps and enable CORS on the bucket. For CORS, please read here for a detailed explanation.
Make the files public by navigating to the files in the bucket and clicking Edit Permissions, where you can add an entry for Public. Do this for both the model.json and group1-shard1of1.bin files. Once they are marked public, copy the public URL of the model.json file, which will be needed for loading the model.

For enabling CORS on the bucket, head over to the Google Cloud console and activate Google Cloud Shell.

Once you are in the shell, type:
//Allowing every domain to access your bucket.
echo '[{"origin": ["*"],"responseHeader": ["Content-Type"],"method": ["GET", "HEAD"],"maxAgeSeconds": 3600}]' > cors-config.json
gsutil cors set cors-config.json gs://<bucket-name>
Now we are all set to load the model in our client-app using tensorflow-js.
To explore the source-code head over to the github-repo.
Here’s our deployed model in action.
Happy Coding.