How to do rapid prototyping with Flask, uWSGI, NGINX, and Docker on OpenShift

Scott Zelenka
Towards Data Science
11 min read · Sep 10, 2018


This post will detail the technical aspects (with reference code) of getting your prototype into a Docker container which can run on OpenShift using an arbitrary user id.

If you’re like me, you like to dive in headfirst and figure out how it works later. If so, feel free to look at the source code and image, then come back to understand how it works.

Many of the data science prototypes I throw together fall under the umbrella of a basic Flask application. Personally, I prefer Flask over Django, for the same reason I prefer individual LEGO bricks over BURP and LURP bricks when designing a LEGO sculpture. This blog on Flask vs Django does a good job of walking through the differences between the two web frameworks if you’re interested.

LEGO Brick, which do you prefer using when building?

To get a prototype up and running in an environment where we can solicit feedback from the stakeholders interested in the prototype’s output, we need to run the Flask application inside a real web server on the network. Enter NGINX and uWSGI! Other posts have detailed how these three components work together to serve your webapp; we’re just going to explore how to get them running inside Docker and OpenShift.


There are many Docker tutorials out there, but not many of them follow Docker best practices. To get a Docker image running within a Kubernetes environment like OpenShift, there are potentially stricter best practices to follow. The main best practice I ran into recently is that your container should launch as a “non-root” (or arbitrary) user!

For now, let’s just look at the bare minimum needed to get your prototype running in this type of environment. For this, we’re going to define a handful of configuration files, then merge them all together into a single Docker image.

Project Layout

$ tree .
.
├── deploy.sh
├── deployment
│   ├── docker-entrypoint.sh
│   ├── Dockerfile
│   ├── nginx.conf
│   ├── supervisord.conf
│   └── uwsgi.ini
└── src
    ├── __init__.py
    ├── static
    │   └── index.html
    └── wsgi.py

The above structure pushes all the container information into the “deployment” folder, while all the Python code and assets fall into the “src” folder as a pseudo module. Think of your data science code living under src, with the endpoints defined in the wsgi.py file. In a future post, I’ll cover how to standardize project layout with cookiecutter.

In this structure, to run Flask in debug mode, we can simply execute the following from the command line:

$ python ./src/wsgi.py

Using the above command, you should validate locally that your Flask application runs and operates as expected before attempting to place it inside uWSGI, NGINX, and a Docker image.
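For reference, here is a minimal sketch of what ./src/wsgi.py could contain. This is purely illustrative — the actual file in the repo may differ, and the /api/health route is my own invention. The important detail is the module-level app variable, which is what uWSGI gets pointed at later.

```python
# Hypothetical minimal ./src/wsgi.py -- illustrative only; the real file
# may differ. The module-level `app` variable is what uWSGI is later
# configured to import (module=wsgi, callable=app).
from flask import Flask, jsonify

app = Flask(__name__, static_folder="static")

@app.route("/api/health")
def health():
    # trivial endpoint to confirm the app is serving requests
    return jsonify(status="ok")

# To support `python ./src/wsgi.py` for local debugging, the real file
# would end with:
#   if __name__ == "__main__":
#       app.run(host="0.0.0.0", port=8080, debug=True)
```

The `if __name__ == "__main__"` guard is what lets the same file serve double duty: Flask’s debug server locally, and a plain importable module for uWSGI in the container.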

uWSGI Config

For this post, we want to run our Flask application behind NGINX and uWSGI on OpenShift. So we first need to tell uWSGI where it can locate the Flask application. This is done in the ./deployment/uwsgi.ini configuration file.

[uwsgi]
chdir = /opt/repo/src
chdir2 = /opt/repo/src
master = true
module = wsgi
callable = app
buffer-size = 65535
lazy = true
socket = /run/uwsgi.sock

Here we’re defining standard uWSGI Options. The main options to pay attention to are module and callable: these are the two parameters which instruct the uWSGI service where to look for the Python code to execute. In our case, we’re telling it to look in the “/opt/repo/src” folder for the “wsgi.py” file, and for the “app” variable within that file (our main Flask callable). We’re also explicitly specifying where the socket file for this service is located, which will need to match the NGINX configuration.
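Conceptually, those two settings amount to uWSGI doing `from wsgi import app` after changing into the chdir directory. A rough Python illustration of that resolution — this is not uWSGI’s actual code, just the equivalent import logic:

```python
# Rough equivalent of how uWSGI's module= and callable= settings are
# resolved: put the app directory on the import path, import the module,
# then grab the named attribute from it.
import importlib
import sys

def resolve_callable(app_dir: str, module: str, callable_name: str):
    sys.path.insert(0, app_dir)            # mimics uWSGI's chdir
    mod = importlib.import_module(module)  # e.g. imports wsgi.py
    return getattr(mod, callable_name)     # e.g. the Flask `app` object
```

With the uwsgi.ini above, this is equivalent to `resolve_callable("/opt/repo/src", "wsgi", "app")`.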

NGINX Config

Next, we’ll need to tell NGINX where it can locate the uWSGI socket file, so that when new requests come in on the port it’s listening on, it knows how to route them accordingly. This is done in the ./deployment/nginx.conf file.

pid /run/nginx.pid;
error_log /var/log/nginx/error.log;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    sendfile on;
    tcp_nopush on;

    client_body_temp_path /spool/nginx/client_temp 1 2;
    fastcgi_temp_path /spool/nginx/fastcgi_temp 1 2;
    proxy_temp_path /spool/nginx/proxy_temp 1 2;
    scgi_temp_path /spool/nginx/scgi_temp 1 2;
    uwsgi_temp_path /spool/nginx/uwsgi_temp 1 2;

    server {
        listen 8080;
        server_name localhost;

        access_log /var/log/nginx/access.log;

        location / {
            try_files $uri @app;
        }
        location @app {
            include uwsgi_params;
            uwsgi_pass unix:///run/uwsgi.sock;
        }
        location /static {
            alias /opt/repo/src/static;
            expires 1d;
        }
    }
}

Here we’re being explicit about where the NGINX service should attempt to create any logs and temp files, and what port to listen on. Because OpenShift will launch this image as an arbitrary user, we cannot use any port number below 1024 (i.e. privileged ports such as the standard HTTP 80 or HTTPS 443), so we tell the service to listen on port 8080. Then, within the location routing, we’re telling NGINX to route all requests to the uWSGI socket and Python application (i.e. “@app”), except for static files, which are served directly from that folder on disk.

Supervisord Config

One last thing we need to configure is a way to run all of this within a single image. Docker best practices strongly suggest one application per image, but in our case we need uWSGI to run our Python code and NGINX to route to uWSGI. So we’ll use the trick of launching Supervisord to manage the multiple concurrent services. This is done in the ./deployment/supervisord.conf file.

[unix_http_server]
file=/run/supervisor.sock
chmod=0770

[supervisord]
nodaemon=true
pidfile=/run/pid/supervisord.pid
logfile=/var/log/supervisor/supervisord.log
childlogdir=/var/log/supervisor
logfile_maxbytes=50MB
logfile_backups=1

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///run/supervisor.sock

[program:nginx]
command=/usr/sbin/nginx -g "daemon off;" -c /etc/nginx/nginx.conf
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

[program:uwsgi]
command=/usr/local/bin/uwsgi --ini /etc/uwsgi/apps-enabled/uwsgi.ini
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0

Here we’re being explicit about which command to execute for each “service”, along with the location of the configuration files we detailed earlier. Typically, you’d launch supervisord or systemd as root and switch users to execute specific services. However, with the arbitrary user id on OpenShift, we need to allow these services to launch as any user in the “root” group. This is why we don’t specify any user parameters in the configuration file, and why we route the logs to /dev/stdout (which allows them to show up in the Docker logs when the image is running).

Docker ENTRYPOINT

With the configuration files all set, we just need to tell Docker what to execute when the image is run. The use of an arbitrary user id with some Python applications throws a wrench into these plans. This problem is documented extensively in Issue 10496.

The good news is that there’s a simple workaround. Each time our Docker image is run, we just need to check that the arbitrary user has an entry in the /etc/passwd file. This is done in the ./deployment/docker-entrypoint.sh file.

#!/bin/bash
set -e

# if the running user is an Arbitrary User ID
if ! whoami &> /dev/null; then
  # make sure we have write access to /etc/passwd
  if [ -w /etc/passwd ]; then
    # add a line to /etc/passwd for the Arbitrary User ID in the 'root' group
    echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd
  fi
fi

if [ "$1" = 'supervisord' ]; then
  exec /usr/bin/supervisord
fi

exec "$@"

However, for this to work correctly, we need to set up the Docker image so that all users have write access to the /etc/passwd file. Once that’s in place, we add a second condition to catch when to execute the “supervisord” application (which in turn executes uWSGI and NGINX) as the arbitrary user id, piping all logs to /dev/stdout.
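The appended line follows the standard seven-field /etc/passwd format (name:password:uid:gid:gecos:home:shell). As a small illustration of the formatting logic the script performs — the Python helper below is my own, not part of the repo:

```python
# Illustration of the /etc/passwd line the entrypoint appends: gid 0 puts
# the arbitrary uid in the "root" group, and /sbin/nologin blocks shell logins.
def passwd_entry(uid: int, user_name: str = "default", home: str = "/opt/repo") -> str:
    return f"{user_name}:x:{uid}:0:{user_name} user:{home}:/sbin/nologin"
```

For the arbitrary uid 112233 used later in this post, this produces `default:x:112233:0:default user:/opt/repo:/sbin/nologin`.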

Dockerfile

The final step in deploying this prototype is to tell Docker how to build and configure the image with the configuration files we specified above. The best part is that once we’ve done this, we can re-use this structure/image for future iterations of multiple projects! This is done in the ./deployment/Dockerfile file.

# Use the standard Nginx image from Docker Hub
FROM nginx

ENV HOME=/opt/repo

# install python, uwsgi, and supervisord
RUN apt-get update && apt-get install -y supervisor uwsgi python python-pip procps vim && \
    /usr/bin/pip install uwsgi==2.0.17 flask==1.0.2

# Source code files
COPY ./src ${HOME}/src

# Copy the configuration file from the current directory and paste
# it inside the container to use it as Nginx's default config.
COPY ./deployment/nginx.conf /etc/nginx/nginx.conf

# setup NGINX config
RUN mkdir -p /spool/nginx /run/pid && \
    chmod -R 777 /var/log/nginx /var/cache/nginx /etc/nginx /var/run /run /run/pid /spool/nginx && \
    chgrp -R 0 /var/log/nginx /var/cache/nginx /etc/nginx /var/run /run /run/pid /spool/nginx && \
    chmod -R g+rwX /var/log/nginx /var/cache/nginx /etc/nginx /var/run /run /run/pid /spool/nginx && \
    rm /etc/nginx/conf.d/default.conf

# Copy the base uWSGI ini file to enable default dynamic uwsgi process number
COPY ./deployment/uwsgi.ini /etc/uwsgi/apps-available/uwsgi.ini
RUN ln -s /etc/uwsgi/apps-available/uwsgi.ini /etc/uwsgi/apps-enabled/uwsgi.ini

COPY ./deployment/supervisord.conf /etc/supervisor/conf.d/supervisord.conf
RUN touch /var/log/supervisor/supervisord.log

EXPOSE 8080

# setup entrypoint
COPY ./deployment/docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh

# https://github.com/moby/moby/issues/31243#issuecomment-406879017
RUN ln -s /usr/local/bin/docker-entrypoint.sh / && \
    chmod 777 /usr/local/bin/docker-entrypoint.sh && \
    chgrp -R 0 /usr/local/bin/docker-entrypoint.sh && \
    chown -R nginx:root /usr/local/bin/docker-entrypoint.sh

# https://docs.openshift.com/container-platform/3.3/creating_images/guidelines.html
RUN chgrp -R 0 /var/log /var/cache /run/pid /spool/nginx /var/run /run /tmp /etc/uwsgi /etc/nginx && \
    chmod -R g+rwX /var/log /var/cache /run/pid /spool/nginx /var/run /run /tmp /etc/uwsgi /etc/nginx && \
    chown -R nginx:root ${HOME} && \
    chmod -R 777 ${HOME} /etc/passwd

# enter
WORKDIR ${HOME}
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["supervisord"]

This post isn’t intended to walk through each line of the Dockerfile. Just know that we’re starting from the official NGINX image (which already has NGINX installed), adding a few Python packages via pip, and explicitly setting permissions on all the folders NGINX, uWSGI, and Supervisord need to touch during execution, so that the arbitrary user id in the “root” group has the permissions it needs. Finally, we’re telling the image to use the docker-entrypoint.sh file to launch supervisord by default each time the image runs.

Building Docker image

To pull all the above building blocks together, we simply need to execute a build of the Docker image. This can be accomplished on your local machine with:

$ cd {{project root directory}}
$ docker build -f ./deployment/Dockerfile -t prototype:latest .

In the above example, we need to execute the build from the root directory so the build context has access to both the ./deployment and ./src folders.

Testing Docker image as arbitrary user id

It’s one thing to successfully build your Docker image; it’s quite another to get it running within OpenShift as an arbitrary user id. The good news is that we can test this on our local machine with the -u (user) flag:

$ docker run -p 8080:8080 -u 112233 prototype:latest

In the above, we picked the arbitrary user id of “112233”, but it doesn’t matter what number is used. You should be able to change this to any numeric value, and your image should still run correctly (hence the “arbitrary” in OpenShift’s arbitrary user id).

Also, we’re mapping port 8080 on our local machine to the NGINX service within the container, meaning we should be able to open a web browser on our local machine and view our simple prototype at http://localhost:8080/.

Docker image troubleshooting

If the above doesn’t work, you’ll need to debug locally to figure out what permissions your application needs, and modify the Dockerfile accordingly. To debug an image, you can override the entrypoint command with:

$ docker run -it -p 8080:8080 -u 0 prototype:latest /bin/bash

This will drop you into an interactive bash prompt as root, where you can dig around inside the image to troubleshoot. You may also need to vary the user id passed to the -u argument. Use CTRL+D or exit to terminate the container.

To test execution of supervisord from within the bash prompt, you can execute the following and watch the logs to identify what the problem may be:

$ supervisorctl [start|stop|restart] nginx # NGINX service
$ supervisorctl [start|stop|restart] uwsgi # uWSGI service

Sometimes your build state will get “dirty” with a bunch of images you no longer need, consuming disk space for the Docker daemon. To clean this up, you can run:

$ docker system prune

Deploying to a container repository

After we’ve modified the arbitrary user id a few times, and are confident in the execution of our single image prototype, it’s time to push it to a container repository for deployment.

There are multiple methods to build and deploy images in OpenShift. We’re not going to get into deployment pipelines in this post. Rather, all we need to do is push our image to a container repository (like Docker Hub), then instruct OpenShift to pull that image and deploy it.

First we need to authenticate to our container repository, where yourhubusername is your username on the container repository and youremail@example.com is the email address you registered there. (Note that newer versions of Docker have dropped the --email flag from docker login and will simply prompt for your password.)

$ docker login --username=yourhubusername --email=youremail@example.com

Then build, tag, and push our image to that container repository, where yourhubusername is the username you authenticated with.

$ docker build -f ./deployment/Dockerfile -t prototype:latest .
$ docker tag $(docker images | grep ^prototype | awk '{print $3}') yourhubusername/prototype:latest
$ docker push yourhubusername/prototype:latest

Now the image should be in your container repository! If you’re using the public Docker Hub, you can navigate to your repositories at this URL (after authenticating) and see your new image:

https://hub.docker.com/

Deploying to OpenShift

Next, let’s tell OpenShift to make a deployment from that image. For this, we’re going to use OpenShift’s CLI tool, oc. To start, we’ll need to authenticate to our OpenShift server. Simply replace the URL with your OpenShift server, and <MY_TOKEN> with your OpenShift token.

$ oc login https://my.openshift-servername.com --token=<MY_TOKEN>

To create a deployment from the CLI, we just need to tell OpenShift where to locate our Docker image. For details on the OpenShift configuration, visit their How Deployments Work page. Essentially, we just point it at the image we pushed to the container repository:

$ oc new-app yourhubusername/prototype

Next, we’ll need to tell OpenShift to add a route so web traffic can reach our image. Below, we’re telling it to route traffic destined for “prototype.example.com” to the “prototype” service on TCP port 8080. This effectively tells OpenShift how to route traffic to the NGINX image we just created.

$ oc create route edge --service=prototype --hostname=prototype.example.com --port=8080-tcp

Now, you should be able to navigate to the hostname:port combination to view your app running on OpenShift as an arbitrary user!

Magic Iteration Time

Now that we’ve gone through all that configuration and building, we can rapidly iterate on the prototype! Because our Dockerfile simply copies the Python code from the ./src folder, updating the image just means making sure the new code is inside ./src and testing locally with our debug Flask command:

$ python ./src/wsgi.py

Once we’re happy with the new functionality, we can build and push the image with a simple ./deploy.sh command:

$ bash ./deploy.sh yourhubusername prototype latest
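The post never shows deploy.sh’s contents, but based on how it’s invoked and the manual build/tag/push commands above, a minimal sketch might look like the following. Treat it as a starting point under those assumptions, not the author’s exact script — the deploy function and its argument check are my own additions:

```shell
#!/bin/bash
# Hypothetical sketch of deploy.sh -- this simply parameterizes the manual
# build/tag/push commands shown earlier in the post.
# Usage: ./deploy.sh <hub-username> <image-name> <tag>
set -e

deploy() {
    if [ "$#" -lt 3 ]; then
        echo "usage: deploy.sh <hub-username> <image-name> <tag>" >&2
        return 1
    fi
    local hub_user="$1" image="$2" tag="$3"

    # same three commands we ran by hand, parameterized
    docker build -f ./deployment/Dockerfile -t "${image}:${tag}" .
    docker tag "${image}:${tag}" "${hub_user}/${image}:${tag}"
    docker push "${hub_user}/${image}:${tag}"
}

# The real script would end with an unconditional: deploy "$@"
```

Wrapping the commands in a function keeps the argument check in one place and makes the script easy to extend (e.g. adding an `oc rollout` step later).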

Once the command has finished, OpenShift will perform a rolling update of the service, exposing your prototype’s new functionality at your URL in as little as 5 minutes!

Source Code Repo

An example of the code and configuration referenced in this post is on my GitHub page here.
