
Putting Your Models Into Production

A guide to getting your deep learning model into production with TensorFlow Serving.

Source: Unsplash

You’ve been slaving away for countless hours trying to get your model just right. You’ve diligently cleaned your data, painstakingly engineered features, and tuned your hyperparameters to the best of your ability. Everything has finally fallen into place and you’re ready to present your model to the world. There’s only one problem: your model lies trapped on your local machine with no access to the outside world.

Such is the fate of most Machine Learning models. In fact, around 87% of them never make it into production. A disproportionate amount of resources (not to mention hype) goes into the model building process, yet in reality it only takes up around 14% of a data scientist’s time. An often overlooked step is the actual deployment of these models into production. This article looks at TensorFlow Serving, a system for deploying TensorFlow models into production.

TensorFlow Serving

The easiest way to use TensorFlow Serving is with a Docker image. Docker is a platform for developing, shipping, and running applications, making it possible to deliver software quickly with little setup. If you have Docker installed on your local machine, you can download the image by running this command in your terminal:

docker pull tensorflow/serving

Otherwise, you can go to the Docker website to get it.

Since we will not be focusing on the model building process, we’ll just create a simple sequential model using Keras that we’ll train on the MNIST dataset.

Model Training
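The exact architecture isn’t the point here, so the following is only a minimal sketch of the kind of model this article uses — the data scaling and the specific layers below are illustrative assumptions, not requirements of TensorFlow Serving:

import tensorflow as tf

# Load MNIST and scale pixel values to the [0, 1] range
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0

# A simple sequential model: flatten the 28x28 images, one hidden layer, softmax output
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])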

After defining the model, let’s compile it using sparse_categorical_crossentropy as our loss, adam as our optimizer, and accuracy as our metric.
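In code, that compile step looks something like this:

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])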

Finally, let’s call the fit method on our model to fit it to our data, training it for 5 epochs and passing in the test images as validation data.
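Assuming the variable names from the sketch above, the training call is simply:

model.fit(train_images, train_labels,
          epochs=5,
          validation_data=(test_images, test_labels))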

Model training output.

After training our model, we’ll save it to a specified export path on our machine.
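A sketch of what that export step might look like — the directory name here matches the mnist_model/1 path described below:

import os

MODEL_DIR = "mnist_model"
version = 1
export_path = os.path.join(MODEL_DIR, str(version))

# Saving in the TensorFlow SavedModel format produces saved_model.pb and a variables/ folder
model.save(export_path, save_format="tf")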

Above we created a MODEL_DIR variable as well as a version number to keep track of our model version. If we want to update the version of our model in the future, we can change the version number when we save it. In this case, our model will be saved in a directory named mnist_model/1. If we inspect this directory, we will see a saved_model.pb file and a variables folder.

The saved_model.pb file is a protobuf file that stores the actual TensorFlow model. The variables directory contains a training checkpoint for reloading the saved model.

Running our Docker container

After this, we’ll run a docker command on our machine to serve the model.
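The command looks roughly like this — the source path is a placeholder for wherever the mnist_model directory lives on your machine, and mnist_model is simply the name we’ve chosen for the model:

docker run -p 8501:8501 \
  --mount type=bind,source=/absolute/path/to/mnist_model,target=/models/mnist_model \
  -e MODEL_NAME=mnist_model \
  -t tensorflow/serving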

A lot is going on here so let’s break this command down further.

  • docker – the command for interacting with Docker.
  • run – runs a command in a new container.
  • -p 8501:8501 – maps a port (in our case, port 8501 on the container is mapped to port 8501 on our host).
  • --mount type=bind,source=…,target=… – mounts the model from our local machine into the container. We specify a source, which is the absolute path of the model on our local machine, and a target, which is where we want the model to be placed within our container.
  • -e MODEL_NAME – sets an environment variable that we’ll use to refer to our model when we make REST requests. The model name can be anything you want.
  • -t tensorflow/serving – specifies the image that we want to use. Here we use the tensorflow/serving image we downloaded earlier.

Once you run this command, make sure to wait until you see a message like this on your terminal:

2021-04-13 02:25:32.041325: I tensorflow_serving/model_servers/server.cc:391] Exporting HTTP/REST API at:localhost:8501 ...

Making an Inference Request

The container with our model will then be running on port 8501 of our machine. At this point, we are ready to send requests to make predictions on new data. We can write a simple Python script to send POST requests and make predictions.
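A minimal version of such a script might look like the following — it assumes the test_images and test_labels arrays from the training sketch above and the mnist_model name used when starting the container, and it uses TensorFlow Serving’s standard /v1/models/<MODEL_NAME>:predict REST endpoint:

import json
import requests
import numpy as np

# Serialize the first 10 test images into the JSON format TensorFlow Serving expects
data = json.dumps({"signature_name": "serving_default",
                   "instances": test_images[0:10].tolist()})
headers = {"content-type": "application/json"}

# Send the POST request to the model's predict endpoint
response = requests.post("http://localhost:8501/v1/models/mnist_model:predict",
                         data=data, headers=headers)
predictions = json.loads(response.text)["predictions"]

# Compare the model's predictions with the ground-truth labels
print("Predicted labels:", np.argmax(predictions, axis=1))
print("Actual labels:   ", test_labels[0:10])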

We create a JSON variable that contains the first 10 images in our test data. After this, we use the requests library to send a POST request and get predictions for the data. We then compare the model’s predictions to the ground-truth labels and see that the model is performing as intended. We’ve successfully deployed our model using TensorFlow Serving!

Conclusion

We covered model deployment using TensorFlow Serving. By using a Docker image, we made it easy to deploy our model in a container instead of installing TensorFlow Serving locally. Once we trained our model and got our container up and running, we used a Python script to make POST requests and get predictions on new data.

Thank you for reading!

