Machine Learning Models as Micro Services in Docker

Sambit Mahapatra
Towards Data Science
6 min read · Mar 17, 2019


One of the most underrated challenges in machine learning development is deploying trained models in production, and doing so in a scalable way. One joke I have read about it is: "The most common way machine learning gets deployed today is PowerPoint slides :)".

Why Docker?

Docker is a containerization platform that packages an application and all its dependencies into a container.

Starting that container brings the application up along with everything it needs to run.

Docker shines when you have many services that work in isolation and act as data providers to a web application. Depending on the load, instances can be spun up on demand based on the rules you set.

Why Docker for Machine Learning models?

Production deployment of regular software applications is hard. If that software is a machine learning pipeline, it's even harder! And in today's scenario you can't get away from machine learning, as it is one of the strongest competitive edges a business can have. In production, a machine learning powered application may use several models for several purposes. Some major practical challenges in deploying machine learning models that Docker can handle are:

  1. Non-uniform environments across models.

There can be cases where one model needs LANG_LEVEL set to 'c' while another needs LANG_LEVEL set to 'en_us.UTF-8'. Putting different models in different containers gives each model its own isolated environment.

2. Non-uniform library requirements across models.

Say you have developed a text summarizer using TensorFlow 1.10, and now you want to add sentiment analysis with transfer learning that (suppose) is only supported by TensorFlow 2.0. Putting them in different containers keeps the conflicting dependencies from breaking the app.

Another major use case: you develop your ML models in Python, but the application you want to build is in Go (for some technical advantages). Exposing the ML model to the app as a service through Docker solves this.

3. Non-uniform resource requirements across models.

Say you have a very complex object detection model that requires a GPU, and five other neural networks that run fine on a CPU. Deploying the models in containers gives you the flexibility to assign resources per model as required.

4. Non-uniform traffic across models.

Suppose you have a question identifier model and an answer generation model. The former is called frequently while the latter is not, so you need more instances of the question identifier than of the answer generator. This is easy to handle with Docker.

Another scenario: at 10 am you get 10,000 requests for your model, whereas at 8 pm it is only 100. So you need to spin up more serving instances as demand requires, which is easier with Docker.

5. Scaling at the model level

Suppose you have a statistical model that can serve 100,000 requests per second alongside a deep learning model that can only serve 100 requests per second. Then, as traffic grows, you need to scale up only the deep learning model, and Docker makes that easy.
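For instance, if the two models run as separate services, something like Docker Compose (not used in this article's example; the service names below are hypothetical) lets you scale only one of them:

# Scale only the deep learning service; leave the statistical one alone.
docker-compose up -d --scale deep_model=5 --scale stats_model=1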

Now let's see how to create a container for a deep learning model. Here the model I have built is a question topic identifier trained on the question classification dataset available at http://cogcomp.org/Data/QA/QC/. Google's Universal Sentence Encoder is used for the embeddings.

While creating a container for a model, the workflow to follow is normally:

  1. Build and train the model.
  2. Create an API for the model (here, a Flask API).
  3. Create the requirements file containing all the required libraries.
  4. Create the Dockerfile with necessary environment setup and start-up operations.
  5. Build the Docker image.
  6. Run the container, and dance because you are done :)

Build and train the model.

To build and train the model, the basic workflow is to get the data, clean and process it, and then feed it to the model architecture to obtain a trained model.

For example, I have built a question intent classifier on the TREC dataset available at http://cogcomp.org/Data/QA/QC/. The training data has 6 intents, with the number of instances of each as follows:

Counter({'DESC': 1162,
'ENTY': 1250,
'ABBR': 86,
'HUM': 1223,
'NUM': 896,
'LOC': 835})

The model creation can be seen at https://github.com/sambit9238/QuestionTopicAnalysis/blob/master/question_topic.ipynb

The processing steps followed here are (a rough sketch follows this list):

  1. Dealing with contractions like I'll, I've, etc.
  2. Dealing with hyperlinks, mail addresses, etc.
  3. Dealing with numbers and ids.
  4. Dealing with punctuations.
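A rough sketch of what these steps might look like in Python (the helpers below are simplified stand-ins; the notebook linked above has the full versions):

import re

# An abridged contraction map; the real one covers many more cases.
CONTRACTIONS = {"i'll": "i will", "i've": "i have", "don't": "do not"}

def clean_text(text):
    text = text.lower()
    # 1. Expand common contractions.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # 2. Remove hyperlinks and mail addresses.
    text = re.sub(r"http\S+|www\.\S+|\S+@\S+", " ", text)
    # 3. Replace standalone numbers and ids with a placeholder token.
    text = re.sub(r"\b\d+\b", " num ", text)
    # 4. Strip punctuation.
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()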

For the embeddings, Google's Universal Sentence Encoder is used from tensorflow-hub.

The model architecture is a neural network with 2 hidden layers, each with 256 neurons. L2 regularization is used to avoid overfitting.

Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer) (None, 1) 0
_________________________________________________________________
lambda_1 (Lambda) (None, 512) 0
_________________________________________________________________
dense_1 (Dense) (None, 256) 131328
_________________________________________________________________
dense_2 (Dense) (None, 256) 65792
_________________________________________________________________
dense_3 (Dense) (None, 6) 1542
=================================================================
Total params: 198,662
Trainable params: 198,662
Non-trainable params: 0
_________________________________________________________________

The model is stored in an .h5 file for reuse, and the label encoder is stored in a pickle file for reuse.
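A minimal sketch of this architecture (the L2 factor, optimizer, and file names are illustrative, and saving a model that wraps a hub module in a Lambda layer can need extra care in practice; the exact code is in the notebook linked above):

import pickle
import tensorflow as tf
import tensorflow_hub as hub
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import layers, models, regularizers

# Universal Sentence Encoder from tensorflow-hub (TF 1.x style module).
embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/2")

def use_embedding(x):
    # The input arrives as a (batch, 1) tensor of strings; squeeze to (batch,).
    return embed(tf.squeeze(tf.cast(x, tf.string), axis=1))

inputs = layers.Input(shape=(1,), dtype=tf.string)
embedding = layers.Lambda(use_embedding, output_shape=(512,))(inputs)
hidden = layers.Dense(256, activation="relu",
                      kernel_regularizer=regularizers.l2(0.001))(embedding)
hidden = layers.Dense(256, activation="relu",
                      kernel_regularizer=regularizers.l2(0.001))(hidden)
outputs = layers.Dense(6, activation="softmax")(hidden)

model = models.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# After training, persist the model and the label encoder for the API.
label_encoder = LabelEncoder().fit(["ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM"])
model.save("question_topic.h5")
with open("label_encoder.pkl", "wb") as f:
    pickle.dump(label_encoder, f)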

Create an API for the model (here, a Flask API).

The stored model is wrapped in a Flask API so that it can be used in production (https://github.com/sambit9238/QuestionTopicAnalysis/blob/master/docker_question_topic/app.py).

The API expects a list of texts, since multiple sentences may come in at once in real-time use. Each text goes through cleaning and processing before being fed to the model for prediction. The predicted results are scaled to represent the confidence for each intent, and the scaled results are then returned in JSON format.

For example,

input: ["What is your salary?"]

output: {'ABBR': 0.0012655753, 'DESC': 0.0079659065, 'ENTY': 0.011016952, 'HUM': 0.028764706, 'LOC': 0.013653239, 'NUM': 0.93733364}

That means the model is 93% confident that the answer should be a number for this question.
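A condensed sketch of what such a Flask endpoint might look like (the file names and the clean_text helper are illustrative; the real app.py is linked above):

import pickle
import numpy as np
from flask import Flask, request, jsonify
from tensorflow.keras.models import load_model

app = Flask(__name__)

# Load the artefacts once at startup instead of per request.
model = load_model("question_topic.h5")
with open("label_encoder.pkl", "rb") as f:
    label_encoder = pickle.load(f)

def clean_text(text):
    # Stand-in for the contraction / hyperlink / number / punctuation handling.
    return text.lower().strip()

@app.route("/predict_topic", methods=["POST"])
def predict_topic():
    raw_texts = request.get_json()["rawtext_list"]
    cleaned = np.array([clean_text(t) for t in raw_texts]).reshape(-1, 1)
    probs = model.predict(cleaned)
    # Pair each probability with its intent label so the confidences are readable.
    results = [dict(zip(label_encoder.classes_, p.tolist())) for p in probs]
    return jsonify({"input": raw_texts, "output": results})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)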

Create the requirements file containing all the required libraries.

To build a Docker image that serves our API, we need a requirements file listing all the libraries used, along with their versions.
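It might look something like this (the versions shown are illustrative, not necessarily the exact ones in the repository):

flask==1.0.2
numpy==1.14.5
scikit-learn==0.19.2
tensorflow==1.10.0
tensorflow-hub==0.1.1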

Create the Dockerfile with necessary environment setup and start-up operations.

A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.

For our example, the pre-built python-3.6 image is used as the base image. The pre-trained Universal Sentence Encoder model files are then downloaded, followed by the installation of the required libraries. Port 5000 of the container is exposed, since that is the port the Flask app runs on in its default configuration.
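An illustrative Dockerfile along those lines (the exact file is in the linked repository; pre-downloading the encoder via TFHUB_CACHE_DIR is an assumption about how that step is done):

FROM python:3.6

WORKDIR /app

# Install the Python dependencies first so this layer is cached.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Pre-download the Universal Sentence Encoder into the image.
ENV TFHUB_CACHE_DIR /app/module_cache
RUN python -c "import tensorflow_hub as hub; hub.Module('https://tfhub.dev/google/universal-sentence-encoder/2')"

# Copy the Flask API and the trained model files.
COPY . .

# The Flask app listens on port 5000 by default.
EXPOSE 5000
CMD ["python", "app.py"]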

Build the Docker image and run the container

Now we have the Dockerfile, the Flask API, and the trained model files in one directory, so we need to create the Docker image from it. The command for that can be:

docker build -t question_topic .
# The last argument is the build context directory. Since I am already in that
# directory, '.' (the current directory in Unix) is used.

Now that the Docker image is created, we need to run it in a container:

docker run -p 8888:5000 --name question_topic question_topic

This runs the created Docker image. Port 5000 inside the container is mapped to port 8888 of the host machine, so the API will receive requests and respond on port 8888. If you want to run the container in the background, detached from the command prompt (which will be the case in production), run it with the '-d' option.
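For example:

docker run -d -p 8888:5000 --name question_topic question_topic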

To check the output, let's send a POST request to the container using curl.

input:

To supply the input:

curl --request POST \
  --url http://0.0.0.0:8888/predict_topic \
  --header 'content-type: application/json' \
  --data '{"rawtext_list": ["Where do you work now?", "What is your salary?"]}'

output:

{
  "input": "['Where do you work now?', 'What is your salary?']",
  "output": "[{'ABBR': 0.0033528977, 'DESC': 0.0013749895, 'ENTY': 0.0068545835, 'HUM': 0.7283039, 'LOC': 0.25804028, 'NUM': 0.0020733867},
              {'ABBR': 0.0012655753, 'DESC': 0.0079659065, 'ENTY': 0.011016952, 'HUM': 0.028764706, 'LOC': 0.013653239, 'NUM': 0.93733364}]"
}

It seems the container is running fine :)

Notes:

The example shown is not production ready, but it can get there by addressing a few things:

  1. At the model level, more processing and hyper-parameter tuning can be done to make the model better.
  2. At the API level, the session and variables need not be initialized on each request. Doing it once at app startup drastically reduces the API's response time.
  3. At the Docker level, the Dockerfile is not optimized, so the image may be bigger than necessary.
