Machine Learning Prediction in Real Time Using Docker and Python REST APIs with Flask

Xavier Vasques
Towards Data Science
6 min read · May 1, 2021

A quick example of a Docker container and REST APIs to perform online inference

The idea of this article is to quickly and easily build a Docker container that performs online inference with trained machine learning models using Python APIs with Flask. Before reading this article, do not hesitate to read Why use Docker for Machine Learning, Quick Install and First Use of Docker, and Build and Run a Docker Container for your Machine Learning Model, in which we learn how to use Docker to perform model training and batch inference.

Batch inference is great when you have time to compute your predictions. Now imagine you need real-time predictions. In this case, batch inference is no longer suitable and we need online inference. Many applications would not work, or would not be very useful, without online predictions: autonomous vehicles, fraud detection, high-frequency trading, applications based on localization data, object recognition and tracking, or brain-computer interfaces. Sometimes the prediction needs to be provided within milliseconds.

To learn this concept, we will implement online inference with Docker and Flask-RESTful, serving two trained models: Linear Discriminant Analysis and a Multi-layer Perceptron Neural Network.

To start, let’s consider the following files: Dockerfile, train.py, api.py, requirements.txt, train.csv, test.json.

train.py is a Python script that ingests and normalizes EEG data and trains two models to classify the data. The Dockerfile will be used to build our Docker image, requirements.txt (flask, flask-restful, joblib) lists the Python dependencies, and api.py is the script that will be called to perform the online inference using REST APIs. train.csv contains the data used to train our models, and test.json contains new EEG data that will be used with our inference models.
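
Based on that list, a minimal requirements.txt is just three lines (unpinned here; the project may pin specific versions):

flask
flask-restful
joblib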

You can find all files on GitHub.

Flask-RESTful APIs

The first step in building APIs is to think about the data we want to handle, how we want to handle it, and what output we want from our APIs. In our example, we will use the test.json file, which contains 1300 rows of EEG data with 160 features each (columns). We want our APIs to do the following:

- API 1: We will give a row number to the API, which will extract the data from the selected row and print it.

- API 2: We will give a row number to the API, which will extract the selected row, inject the new data into the models, and return the classification prediction (the Letter variable in the data).

- API 3: We will ask the API to take all the data in the test.json file and instantly print the classification score of the models.

In the end, we want to be able to access these processes by making HTTP requests.

Let’s have a look at the api.py file:

The first step, after importing the dependencies, including the open source web microframework Flask, is to set the environment variables that are written in the Dockerfile. We also need to load our serialized Linear Discriminant Analysis and Multi-layer Perceptron Neural Network models. We create our Flask application by writing app = Flask(__name__). Then, we create our three Flask routes so that we can serve HTTP traffic on them:

- http://0.0.0.0:5000/line/250: Gets data from test.json and returns the requested row, defined by the variable Line (in this example, we want to extract the data of row number 250)

- http://0.0.0.0:5000/prediction/51: Returns the classification prediction from both the LDA and Neural Network trained models for the requested row of data (in this example, we want to inject the data of row number 51)

- http://0.0.0.0:5000/score: Returns the classification score of both the Neural Network and LDA inference models on all the available data (test.json).

The Flask routes allow us to request what we need from the API by adding the name of our procedure (/line/<Line>, /prediction/<int:Line>, /score) to the URL (http://0.0.0.0:5000). Whichever row we request, api.py will always return the output we ask for.
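
Here is a minimal sketch of what api.py can look like; the environment variable names (MODEL_DIR, MODEL_FILE_LDA, MODEL_FILE_NN) and the exact response format are assumptions, and test.json is assumed to hold the 160 features plus the Letter target, preprocessed like the training data:

import os

import pandas as pd
from flask import Flask
from joblib import load

# Environment variables set in the Dockerfile (names are assumptions)
MODEL_DIR = os.environ["MODEL_DIR"]
MODEL_FILE_LDA = os.environ["MODEL_FILE_LDA"]
MODEL_FILE_NN = os.environ["MODEL_FILE_NN"]

# Deserialize the two trained models
clf_lda = load(os.path.join(MODEL_DIR, MODEL_FILE_LDA))
clf_nn = load(os.path.join(MODEL_DIR, MODEL_FILE_NN))

app = Flask(__name__)

@app.route('/line/<Line>')
def line(Line):
    # API 1: return the requested row from test.json
    data = pd.read_json('test.json')
    return data.iloc[[int(Line)]].to_json()

@app.route('/prediction/<int:Line>')
def prediction(Line):
    # API 2: inject the requested row into both models
    # (the letter labels are numeric codes in this dataset)
    data = pd.read_json('test.json')
    X = data.drop('Letter', axis=1).iloc[[Line]]
    return {'prediction LDA': int(clf_lda.predict(X)[0]),
            'prediction Neural Network': int(clf_nn.predict(X)[0])}

@app.route('/score')
def score():
    # API 3: score both models on all the rows in test.json
    data = pd.read_json('test.json')
    X = data.drop('Letter', axis=1)
    y = data['Letter']
    return {'score LDA': clf_lda.score(X, y),
            'score Neural Network': clf_nn.score(X, y)}

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)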

Machine Learning models

train.py is a Python script that ingests and normalizes the EEG data in a CSV file (train.csv) and trains two models to classify the data (using scikit-learn). The script saves two models: Linear Discriminant Analysis (clf_lda) and a Multi-layer Perceptron Neural Network (clf_NN):
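
A minimal sketch of what train.py can look like, assuming Letter is the target column in train.csv, that the models are written to the MODEL_DIR directory that api.py reads from, and that the hyperparameters shown are placeholders:

import os

import pandas as pd
from joblib import dump
from sklearn import preprocessing
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier

# Directory where the serialized models are written (set in the Dockerfile)
MODEL_DIR = os.environ["MODEL_DIR"]

# Ingest the EEG data and separate the features from the 'Letter' target
data = pd.read_csv('train.csv')
X = data.drop('Letter', axis=1)
y = data['Letter']

# Normalize the features
X = preprocessing.normalize(X)

# Train and serialize the Linear Discriminant Analysis model
clf_lda = LinearDiscriminantAnalysis()
clf_lda.fit(X, y)
dump(clf_lda, os.path.join(MODEL_DIR, os.environ["MODEL_FILE_LDA"]))

# Train and serialize the Multi-layer Perceptron Neural Network
clf_NN = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000)
clf_NN.fit(X, y)
dump(clf_NN, os.path.join(MODEL_DIR, os.environ["MODEL_FILE_NN"]))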

Docker Image for the Online Inference

We now have everything we need to build our Docker image. To start, we need our Dockerfile, with the jupyter/scipy-notebook image as our base image. We also need to set our environment variables and install joblib, to allow serialization and deserialization of our trained models, and flask (requirements.txt). We copy the train.csv, test.json, train.py and api.py files into the image. Then, we run train.py, which will fit and serialize the machine learning models as part of our image build process.

Here it is:
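
A minimal sketch of such a Dockerfile, assuming the environment variable names used in the scripts above (/home/jovyan is the default working directory of the jupyter/scipy-notebook base image):

FROM jupyter/scipy-notebook

# Install the Python dependencies (flask, flask-restful, joblib)
COPY requirements.txt ./requirements.txt
RUN pip install -r requirements.txt

# Environment variables shared with train.py and api.py (names are assumptions)
ENV MODEL_DIR=/home/jovyan
ENV MODEL_FILE_LDA=clf_lda.joblib
ENV MODEL_FILE_NN=clf_nn.joblib

# Copy the data and the scripts into the image
COPY train.csv ./train.csv
COPY test.json ./test.json
COPY train.py ./train.py
COPY api.py ./api.py

# Fit and serialize the models as part of the image build
RUN python3 train.py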

To build this image, run the following command:

docker build -t my-docker-api -f Dockerfile . 

We obtain the following output:

Running Docker Online Inference

Now the goal is to run our online inference, meaning that each time a client issues a GET request to the /line/<Line>, /prediction/<Line>, or /score endpoints, the server will return the requested data (row), the class predicted for the injected data by our pre-trained models, or the score of our pre-trained models on all the available data. To launch the web server, we will run a Docker container and run the api.py script:

docker run -it -p 5000:5000 my-docker-api python3 api.py

The -p flag publishes port 5000 of the container on port 5000 of our host machine, the -it flag lets us see the logs from the container, and we run python3 api.py in the my-docker-api image.

The output is the following:

You can see that the application is running on http://0.0.0.0:5000/, and we can now use our web browser or the curl command to issue a GET request to that address.

If we type:

curl http://0.0.0.0:5000/line/232

We will get row number 232 extracted from our data (test.json):

Same result using the web browser:

If we type the following curl command:

curl http://0.0.0.0:5000/prediction/232

We will see the following output:

The above output means that the LDA model classified the provided data (row 232) as letter 21 (U), while the Multi-layer Perceptron Neural Network classified the data as letter 8 (H). The two models do not agree.

If we type:

curl http://0.0.0.0:5000/score

We will see the score of our models on the entire dataset:

As we can read, we should trust the Multi-layer Perceptron Neural Network more, with an accuracy score of 0.59, even if that score is not very high. There is some work to do to improve the accuracy!

What’s Next?

I hope you can see the simplicity of containerizing your machine/deep learning applications with Docker and Flask to perform online inference. This is an essential step when we want to put our models into production. Of course, this is a simplified view: we would need to take into account many more aspects, such as networking, security, monitoring, infrastructure, and orchestration, or add a database to store the data instead of using a JSON file.


CTO and Distinguished Data Scientist, IBM Technology, France. Head of Clinical Neurosciences Research Laboratory, France.