Deploy Machine Learning Web API using NGINX and Docker on Kubernetes in Python

Ravi Ranjan
Towards Data Science
7 min read · Oct 21, 2019



A few days ago, I was deploying a machine learning prediction web API on the Google Cloud Platform (GCP) Kubernetes Engine and had to piece the process together from many technical blogs. So I decided to write a step-by-step guide to deploying a machine learning prediction web API built with Flask, served by Gunicorn behind NGINX, containerized with Docker, and run on the GCP Kubernetes Engine.

I have divided this step-by-step tutorial into 3 sections:

  1. In section 1, we will briefly discuss the components (Flask, Gunicorn, NGINX, Docker, and Kubernetes), the project scaffolding, and the setup on the development machine.
  2. In section 2, we will discuss the setup of Gunicorn, supervisord, and NGINX, and the creation of the Dockerfile and Docker image.
  3. In section 3, we will discuss Kubernetes and deploy the web API on Google Kubernetes Engine.

Section 1:

Components description

I assume that the reader has basic knowledge of Python, Flask, Gunicorn, NGINX, Docker, Kubernetes, and, last but not least, machine learning. Below is a brief introduction to each component used in this tutorial; for more details, please refer to the embedded links:

  • Flask is a micro web framework written in Python. It is classified as a micro-framework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions.
  • Gunicorn is one of many WSGI server implementations. It is a commonly used part of web app deployments and has powered some of the largest Python-based web applications in the world, such as Instagram.
  • NGINX is commonly used as a web server to serve static assets such as images, CSS and JavaScript to web browser clients. NGINX is also typically configured as a reverse proxy, which passes appropriate incoming HTTP requests to a WSGI server. The WSGI server produces dynamic content by running Python code. When the WSGI server passes its response, which is often in the HTML, JSON or XML format, the reverse proxy then responds to the client with that result.
  • Docker is an open-source project that automates the deployment of software applications inside containers by providing an additional layer of abstraction and automation of OS-level virtualization on Linux.
  • Kubernetes is an open-source container-orchestration system for automating deployment, scaling, and management of containerized applications. It is a portable, extensible platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.

Project scaffolding

Project folder structure to create a Docker image:
  • app folder: The main folder contains all the Python code, such as:

a. Machine learning logic, such as data preprocessing, loading the saved model into memory, and model prediction.

b. All the required Python packages in requirements.txt.

c. All the Flask code and routing in main.py.

  • Dockerfile: defines what goes on in the environment inside your container. Access to resources like networking interfaces and disk drives is virtualized inside this environment, which is isolated from the rest of your system, so you need to map ports to the outside world and be specific about what files you want to “copy in” to that environment. However, after doing that, you can expect that the build of your app defined in this Dockerfile behaves the same wherever it runs.
  • flask.conf: This configuration file will define one server block for the machine learning flask application.
  • gunicorn.conf: This configuration file will execute the command to run the Gunicorn application server in the background.
  • supervisord.conf: This configuration file will watch over the Gunicorn processes, make sure they are restarted if anything goes wrong, and ensure they are started at boot time.
  • Makefile: This Makefile will contain all the commands, such as creating and running the Docker image, creating the Kubernetes cluster, and deploying the web API.
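Based on the files described above, the project layout might look like the following sketch (the file names under the app folder beyond main.py and requirements.txt are assumptions):

```
.
├── app/
│   ├── main.py            # Flask app, routing, and ML inference
│   ├── requirements.txt   # required Python packages
│   └── ...                # preprocessing code, saved model, etc.
├── flask.conf             # NGINX server block
├── gunicorn.conf          # Gunicorn launch configuration
├── supervisord.conf       # process monitoring configuration
├── Dockerfile
└── Makefile
```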

Project set up on the development machine

  • Install the required package on the development machine:
$ sudo apt-get install -y python python-pip python-virtualenv nginx gunicorn supervisor
  • Create and activate a Python virtual environment:

A Python virtual environment helps us create an application-specific environment with only the required Python packages.

Install a virtual environment package:

$ pip install virtualenv

Create a virtual environment:

$ virtualenv mypython

Activate the Python virtual environment:

$ source mypython/bin/activate
  • Create a Flask app for machine learning prediction:

Install Flask and other dependencies:

$ pip install Flask
$ pip install -r requirements.txt

Create machine learning inference code:

Flask's built-in server is intended for development, so we can use it to test the Flask API on the development machine.

$ python main.py
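The inference code in main.py might look like the following minimal sketch. The stub model, the /predict route, and the payload shape are assumptions; replace them with your own trained model and preprocessing logic.

```python
# main.py - a minimal sketch of a Flask prediction API.
# In a real service, load the saved model once at startup, e.g.:
#   with open("model.pkl", "rb") as f:
#       model = pickle.load(f)
from flask import Flask, jsonify, request

app = Flask(__name__)


class StubModel:
    """Stands in for a trained model loaded into memory at startup."""

    def predict(self, rows):
        # Dummy "prediction": sum of each feature row
        return [sum(row) for row in rows]


model = StubModel()


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    features = payload["features"]  # e.g. [[1.0, 2.0, 3.0]]
    return jsonify({"prediction": model.predict(features)})


# When run directly (`python main.py`), start the development server:
# if __name__ == "__main__":
#     app.run(host="0.0.0.0", port=5000)
```

A POST to /predict with `{"features": [[1.0, 2.0, 3.0]]}` returns a JSON prediction for each row.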

It is not recommended to use the Flask development server in the production environment to handle concurrency and security. We will use Gunicorn as a Python HTTP WSGI server gateway interface.

Section 2

Architecture Diagram
  • Install the Gunicorn Python package:
$ pip install gunicorn

Configure the gunicorn.conf file to set up the Gunicorn application server.

We have configured Gunicorn to listen on port 5000 and serve the app defined in main.py in the app directory.
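One common pattern is to write gunicorn.conf as a supervisor program entry. A minimal sketch, assuming the code lives in /deploy/app and the WSGI callable is `app` in main.py:

```ini
; gunicorn.conf - supervisor program entry for the Gunicorn server (sketch)
[program:gunicorn]
directory=/deploy/app
command=gunicorn main:app --workers 2 --bind 0.0.0.0:5000
autostart=true
autorestart=true
```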

  • Configure supervisord as monitoring process:

Supervisord allows a user to monitor and control several processes on UNIX-like operating systems. Supervisor watches over the Gunicorn processes, makes sure they are restarted if anything goes wrong, and ensures they are started at boot time.

It also runs the NGINX reverse proxy server and keeps monitoring it; if NGINX fails, supervisord automatically restarts it.
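A minimal supervisord.conf sketch for the NGINX side; the binary path is the Ubuntu default and is an assumption:

```ini
; supervisord.conf - keep NGINX running in the foreground under supervisor (sketch)
[program:nginx]
command=/usr/sbin/nginx -g "daemon off;"
autostart=true
autorestart=true
```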

  • Set up NGINX server:

Open up a server block and set NGINX to listen on the default port 80. You can set the server_name to control which requests the block handles.
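A sketch of the flask.conf server block, assuming Gunicorn listens on 127.0.0.1:5000 as configured above:

```nginx
# flask.conf - reverse proxy from NGINX (port 80) to Gunicorn (port 5000), sketch
server {
    listen 80;
    server_name _;

    location / {
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```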

  • Create Dockerfile:

I have divided the Dockerfile into 5 sections. Let’s briefly go through each one:

  1. Create an Ubuntu environment: install Ubuntu, update the necessary packages, and install Python, pip, virtualenv, NGINX, Gunicorn, and supervisor.
  2. Set up the Flask application: make a “/deploy/app” directory, copy all files from the “app” folder to the “/deploy/app” folder, and install all the required Python packages.
  3. Set up NGINX: remove the default NGINX configuration, copy the Flask configuration into the NGINX configuration, and create a symbolic link for the NGINX Flask configuration.
  4. Set up supervisord: make a “/var/log/supervisor” directory and copy the Gunicorn and supervisord configurations into the newly-created directory.
  5. Start the supervisord process for monitoring.
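The five sections above might translate into a Dockerfile sketch like this; the base image, package names, and paths are assumptions:

```dockerfile
# 1. Create an Ubuntu environment and install the required packages
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y \
    python3 python3-pip python3-venv nginx gunicorn supervisor \
    && rm -rf /var/lib/apt/lists/*

# 2. Set up the Flask application
RUN mkdir -p /deploy/app
COPY app /deploy/app
RUN pip3 install -r /deploy/app/requirements.txt

# 3. Set up NGINX
RUN rm /etc/nginx/sites-enabled/default
COPY flask.conf /etc/nginx/sites-available/
RUN ln -s /etc/nginx/sites-available/flask.conf /etc/nginx/sites-enabled/flask.conf

# 4. Set up supervisord
RUN mkdir -p /var/log/supervisor
COPY gunicorn.conf supervisord.conf /etc/supervisor/conf.d/

# 5. Start supervisord in the foreground
EXPOSE 80
CMD ["/usr/bin/supervisord", "-n"]
```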
  • Build and run docker image to test the production-ready ML web API:

I have created a makefile to run all the commands.

The below Makefile performs two operations:

  1. Create a docker image
  2. Run docker image on port 80

Run the Makefile:

deploy-local:
	# build docker image
	docker build -t gcr.io/${project_id}/${image_name}:${version} .
	# run docker image
	docker run -d --name ${image_name} -p ${port}:${port} gcr.io/${project_id}/${image_name}:${version}

In the above section, we have learned how to build and run an ML web API Docker image on the development machine.

Section 3

In this section, we will learn how to deploy this NGINX containerized in Docker on the GCP Kubernetes engine and make API public for the production environment.

  1. Build the container image of the machine learning application and tag it for uploading
#Set variables value as per your project
$ docker build -t gcr.io/${project_id}/${image_name}:${version} .

2. Using the gcloud command-line tool, install the Kubernetes CLI. kubectl is used to communicate with Kubernetes, the cluster orchestration system of GKE clusters

$ gcloud components install kubectl

3. Configure the Docker command-line tool to authenticate to Container Registry

$ gcloud auth configure-docker

4. Use the Docker command-line tool to upload the image to your Container Registry

#Set variables value as per your project
$ docker push gcr.io/${project_id}/${image_name}:${version}

5. Use the gcloud command-line tool to set the project ID

$ gcloud config set project ${project_id}

6. Use the gcloud command-line tool to set the compute zone

$ gcloud config set compute/zone ${zone}

7. Create a one-node Kubernetes cluster on GCP

$ gcloud container clusters create ${cluster_name} --num-nodes=1

8. Deploy the machine learning application, listening on port 80

$ kubectl run ${image_name} --image=gcr.io/${project_id}/${image_name}:${version} --port 80

9. Expose the machine learning application to traffic from the Internet

$ kubectl expose deployment ${image_name} --type=LoadBalancer --port ${port} --target-port ${port}

Test the deployed machine learning web API using the curl command or Postman.
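For example, with curl; the external IP comes from `kubectl get service`, and the /predict endpoint and payload shape are assumptions based on the Flask app:

```shell
# Find the external IP assigned by the load balancer
$ kubectl get service ${image_name}

# Call the prediction endpoint (replace <EXTERNAL-IP> with the value above)
$ curl -X POST http://<EXTERNAL-IP>/predict \
    -H "Content-Type: application/json" \
    -d '{"features": [[1.0, 2.0, 3.0]]}'
```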

I hope this serves as an illustrative guide to deploying a machine learning web API using NGINX and Docker on Kubernetes in Python.

Sources:

  1. https://opensource.com/article/18/4/flask
  2. https://www.fullstackpython.com/green-unicorn-gunicorn.html
  3. https://www.fullstackpython.com/nginx.html
  4. https://docker-curriculum.com/
  5. https://www.digitalocean.com/community/tutorials/an-introduction-to-kubernetes
  6. https://cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps
  7. https://cloud.google.com/kubernetes-engine/docs/tutorials/hello-app

Happy Learning!!!
