Source: https://pixabay.com/photos/vault-business-bank-vault-bank-1144249/

Securing ML Services on the Web

HTTPS and Access Control

Ben Weber
Towards Data Science
15 min read · Apr 6, 2020


If you’re looking to host a machine learning service over the web, then it’s usually necessary to lock down the endpoint so that calls to the service are secure and only authorized users are able to access the service. In order to make sure that sensitive information is not exposed over the web, we can use secure HTTP (HTTPS) to encrypt communication between clients and the service, and use access control to limit who has access to the endpoint. If you’re building a machine learning service in 2020, you should plan on implementing both secure HTTP and access control for your endpoints.

This post will show how to build a secure endpoint implemented with Flask to host a scikit-learn model. We’ll explore the following approaches:

  • Enabling HTTPS directly in Flask
  • Using a WSGI Server (Gunicorn)
  • Using a secure load balancer (GCP)

We’ll host the service using Docker and Kubernetes in the GCP ecosystem. To restrict access to authorized users, we’ll explore the following approaches for access control to the service:

  • Token-based authentication (Flask)
  • OAuth authentication (Dash)
  • Whitelisting

Depending on how your organization deploys services, some of these options may not be available, but it’s good to get experience with a variety of different approaches for locking down services, and usually it’s a good idea to lock down endpoints using multiple approaches. This post is written from the perspective of hosting web services written in Python, and builds upon my prior post on models as web endpoints.

HTTPS for Flask

If you’re planning on hosting a machine learning model over the web, then you should consider requirements for secure transmission of the data and results early on in the project. Chrome started marking HTTP sites as non-secure in mid-2018, and there are now plenty of tools to enable model endpoints to be secured using HTTPS. Secure HTTP leverages the secure sockets layer (SSL) to ensure that traffic between clients and servers is encrypted, and uses public key infrastructure (PKI) to ensure that clients are communicating with their intended target. This post focuses on the first aspect, where traffic sent between a client and server is encrypted. To completely set up HTTPS, you’ll need to set up a DNS entry that maps to the IP address of your endpoint, so that you can create a signed certificate that identifies your endpoint as a trusted host. This step is straightforward with Google Cloud Platform (GCP) once you have the DNS entry set up, but managing a web domain and DNS lookup is outside the scope of this post.

In general, it’s a best practice to use a system other than Flask to secure an endpoint, because Flask’s built-in server is not intended for production use. Instead, it’s better to use tools such as Gunicorn or Nginx to provide a secure layer on top of a non-secured Flask application. However, some Flask applications, such as interactive web applications built with Dash on top of Flask, need to provide a secure connection end-to-end. This is where libraries such as Flask Dance are useful.

To start, we’ll need to install Python and dependent libraries. For this tutorial, we’ll install the following libraries to set up a Flask endpoint, a Dash application, and a client application:

pip3 install --user pandas 
pip3 install --user scikit-learn
pip3 install --user flask
pip3 install --user Flask-HTTPAuth
pip3 install --user requests
pip3 install --user cryptography
pip3 install --user gunicorn
pip3 install --user dash
pip3 install --user flask_dance
pip3 install --user dash-google-auth

We’ll start by building a predictive modeling endpoint in Flask that returns the propensity of a user to buy a new game. The input to the model is a feature vector that describes whether or not the user has previously purchased games in a small catalog. The code is described in more detail in my past post on models as web endpoints. The snippet below shows how to set up a Flask endpoint that first trains a scikit-learn model, sets up an endpoint at “/” to serve the model, and then launches the application directly.

The base HTTP Flask application.
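A minimal sketch of this application, saved as flask_http.py, is shown below; the dataset URL and the G1 through G10 feature names are assumptions carried over from the models-as-endpoints post:

import pandas as pd
import flask
from sklearn.linear_model import LogisticRegression

# train a propensity model at startup (dataset URL assumed from the prior post)
df = pd.read_csv("https://github.com/bgweber/Twitch/raw/master/Recommendations/games-expand.csv")
model = LogisticRegression()
model.fit(df.drop(['label'], axis=1), df['label'])

app = flask.Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def predict():
    data = {"success": False}
    # accept features as a JSON body or as query parameters
    params = flask.request.json or flask.request.args
    if params and "G1" in params:
        # build a single-row frame in the same column order used for training
        x = pd.DataFrame.from_dict(params, orient="index").transpose()
        data["response"] = str(model.predict_proba(x)[0][1])
        data["success"] = True
    return flask.jsonify(data)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)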

After running the application using python flask_http.py, we now have a model server application running on port 80.

* Serving Flask app "app" (lazy loading)
* Environment: production
* Debug mode: off
* Running on http://0.0.0.0:80/ (Press CTRL+C to quit)

We can also test the endpoint using Python, as shown in the snippet below:

A Python client for calling the model endpoint over HTTP.
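A sketch of the client, assuming the same G1 through G10 feature names as the server snippet:

import requests

# feature vector describing which games the user has previously purchased
new_row = {'G1': 0, 'G2': 0, 'G3': 0, 'G4': 0, 'G5': 0,
           'G6': 0, 'G7': 0, 'G8': 0, 'G9': 0, 'G10': 1}

result = requests.post("http://localhost/", json=new_row)
print(result.json())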

Running the client code uses the requests library to call the model endpoint over non-secure HTTP and print the results. We’d like to achieve the same result, but have the call happen over HTTPS. To set up this protocol, we can either lock down the Flask endpoint directly, provide a secure layer on top of Flask using a WSGI server, or use a load balancer to provide HTTPS functionality while still using HTTP within a virtual private cloud (VPC).

Using Flask Directly
The first approach for enabling secure HTTP with a Flask application is to set up a secure connection within Flask itself. This is not the recommended approach, because it’s not a best practice to run Flask directly in a production environment; instead, a WSGI server such as Gunicorn should be used to wrap the Flask application. But if you want to lock down a service during development, then this approach may be suitable. On the Flask side, the change from HTTP to HTTPS only requires changing one line, which is the last line in the prior Flask snippet.

app.run(host='0.0.0.0', port=443, ssl_context='adhoc')

We’ve modified the application to run on port 443, which is the default for HTTPS, instead of port 80, which is the default for HTTP. Additionally, we’ve set an SSL context, which tells Flask to use SSL to provide a secure connection. The adhoc parameter tells Flask to generate unsigned credentials on the fly, rather than passing signed or unsigned credentials to the server. When you run the updated example, you’ll see that both the port and protocol of the service have changed, as shown below.

* Serving Flask app "app" (lazy loading)
* Environment: production
* Debug mode: off
* Running on https://0.0.0.0:443/ (Press CTRL+C to quit)

The service is now using SSL to provide a secure connection between the client and the server, but using the adhoc setting means that the origin of the certificate has not been validated by a trusted authority. In order to create a properly signed certificate, we’d need to host the model at a named URL, such as https://cat-classifier.ml.com, rather than just an IP address, such as https://10.0.0.1. If you’re able to set up a DNS entry to map your IP address to a named URL, then you can use a certificate authority to create proper credentials. In the meantime, we’ll continue using the adhoc approach, which means that you’ll get the following warning when trying to view the model endpoint using Google Chrome.

Using adhoc credentials results in an unsigned certificate.

Also, if you modify the prior client example to use HTTPS in place of HTTP, you’ll get the following warning.

requests.exceptions.SSLError: HTTPSConnectionPool(host='localhost', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1076)')))

We now have the model running as a secured endpoint, but not a trusted endpoint. To create a trusted endpoint, we need a signed certificate to validate that the traffic being sent to the endpoint actually corresponds to the model endpoint and cannot be intercepted by a third party posing as our service. We won’t cover this aspect in this post, but we will show how to set up a self-signed certificate in the next section.
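As an aside, if you just want to test the secured endpoint against the adhoc certificate, the requests library can be told to skip certificate verification; this is acceptable only for local development:

import requests

# new_row is the feature dictionary from the earlier client snippet
# verify=False skips certificate validation; development use only
result = requests.post("https://localhost/", json=new_row, verify=False)
print(result.json())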

Using a WSGI Server
Instead of running a Flask application directly, it’s better to use a scalable web server, such as Gunicorn or Nginx, when deploying model serving applications to production. To modify our application to use Gunicorn, we first need to create a self-signed certificate that can be used to establish a secure connection. The commands below show how to generate the certificate and serve the Flask application over HTTPS using Gunicorn.

openssl req -x509 -newkey rsa:4096 -nodes -out cert.pem \
    -keyout key.pem -days 365
gunicorn --certfile cert.pem --keyfile key.pem \
    -b 0.0.0.0:443 flask_http:app

The result is similar to the previous section, where we now have a secured but not trusted model endpoint. The key difference between the two examples is that the Gunicorn approach can handle a much larger volume of traffic.

Using a Load Balancer
There’s also a third approach for setting up a secure Flask application: build a Flask application that serves requests over HTTP, and wrap this service within a private cluster. This approach is secure, but does not use end-to-end encryption. Once traffic has been routed to the private cluster, non-secured connections are used between machines within the cloud, which often do not have public IP addresses. One way of achieving this setup is to use Google Kubernetes Engine (GKE) to host your containerized Flask application, and to use a node port and an ingress to set up an HTTPS-secured load balancer. The result is that traffic from the client to the server is encrypted up to the ingress endpoint, while traffic from the ingress to the container uses HTTP internally. Again, this results in a secure but not trusted endpoint. However, if you do have a DNS entry for your endpoint, then it’s straightforward to create a proper certificate using Google-managed SSL certificates.

This section is specific to GCP, but the approach should be applicable to other Kubernetes environments. Here’s the general approach:

  1. Containerize your application using Docker
  2. Host the container in a Kubernetes cluster with no public IPs
  3. Expose the service within the VPC using a node port service type
  4. Enable outside connections using a service ingress with HTTPS enabled

There are a number of steps to follow in order to set up a GCP account, create credentials, and enable the Container Registry and GKE services needed for this approach. Additional details are provided in my prior post on setting up model services within GCP. The main difference I’ll cover in this post is using the node port plus ingress setup (Layer 7) versus directly exposing a service using a load balancer as a TCP endpoint (Layer 4).

The first step is setting up the endpoint as a Docker application. To achieve this, we need to author a Dockerfile that sets up the Python ecosystem, installs the required libraries, and defines the application to run. We’ll use Gunicorn to wrap the Flask application and host this endpoint over HTTP, as shown in the following Dockerfile:

FROM ubuntu:latest
MAINTAINER Ben Weber
RUN apt-get update \
&& apt-get install -y python3-pip python3-dev \
&& cd /usr/local/bin \
&& ln -s /usr/bin/python3 python
RUN pip3 install flask
RUN pip3 install pandas
RUN pip3 install gunicorn
RUN pip3 install scikit-learn
COPY app.py app.py
ENTRYPOINT ["gunicorn", "--bind", "0.0.0.0:80", "app:app"]

Next, we’ll build the container, and then test the container locally before pushing the container to a registry.

sudo docker image build -t "model_service" .
sudo docker images
sudo docker run -it -p 80:80 model_service

To push the container to Google Container Registry, we first need to perform a docker login with GCP credentials. For details on setting up these credentials, please see my prior post on model services with GKE. After performing these steps, we’ll have a container available on GCP that we can use to deploy a service within GKE.

Pushing the Image to Google Container Registry.
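The push steps look roughly like the following, where gcp_key.json and your-project-id are placeholders for your own service account key file and GCP project:

cat gcp_key.json | sudo docker login -u _json_key --password-stdin https://us.gcr.io
sudo docker tag model_service us.gcr.io/your-project-id/model_service
sudo docker push us.gcr.io/your-project-id/model_service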

After pushing the container to GCP, we’ll spin up a GKE cluster and deploy the container as a workload. Once you have a set of pods running the service, you can expose the service securely by first setting up a node port service type, as shown in the image below.

Exposing the model service within the VPC.
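If you prefer the command line to the GCP console, the node port can also be created with kubectl; the deployment name model-service here is a placeholder for whatever you named the workload:

kubectl expose deployment model-service --type=NodePort --port=80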

The next step is to create an ingress for the service that exposes the endpoint to the open web, instead of just within your VPC. To set up an ingress, browse to the Services & Ingress tab, select the node port you just created, and choose “Create Ingress”.

Creating an ingress from a node port.

From this step, we can set up a load balancer with both HTTP and HTTPS endpoints. You can set up an HTTPS-only endpoint if you want to disable non-secure traffic to your model service. If you select the HTTPS option, then you’ll need to specify a certificate to use. You can fake this step for now by selecting a Google-managed certificate and choosing any domain. The result will be an invalid certificate, but you’ll be able to test out setting up an HTTPS-enabled load balancer.

Setting up an HTTPS endpoint using GCP Load Balancing.

The result will be an HTTPS endpoint that you can now view in the “Services & Ingress” tab. The model service is now running at https://34.107.189.4, but the endpoint is not a trusted URL.

The resulting HTTPS endpoint.

We’ve now explored three options for setting up a Flask app as a secure endpoint. While the right choice will vary based on your cloud platform and organizational preferences, the second option, using a secured WSGI server to wrap the Flask application, is likely the most suitable.

Access Control for Flask

We’ve covered the first aspect of locking down a model service, which is making sure that traffic sent between clients and a server is encrypted. The next step is to use access control tools to restrict who has access to the service. Encryption makes sure that third parties cannot listen to traffic between hosts, while access control ensures that unauthorized parties do not have access to the service. If you’re working within a corporate network, then one form of access control that may already be in place is limiting access to a VPN. We won’t cover this approach in this post, because VPN setups can be quite complex and vary significantly across organizations. Instead, we’ll explore access control through tokens, OAuth, and whitelisting.

Another approach for access control that is not covered in this post is using a web service layer, such as Nginx, for user authorization prior to forwarding traffic to the Flask application. This approach works well for data science teams, because it enables a separation of concerns: a data science team sets up model serving containers, and a DevOps team manages access to endpoints. This is a good approach for mature organizations, but for data science teams that are responsible for end-to-end model deployment, it may not be feasible.

Token-Based Authentication
The easiest way to get up and running with a locked down endpoint is to have a common secret that clients can use to authenticate with the model service. In the simplest case, this means having a single shared password that clients can use to establish access to a resource. This approach, and most approaches for access control, only work if a secure connection is being used to communicate with the model service. In general, you can’t have valid access control without a secure communication protocol. The next step is to have separate access tokens or passwords for separate users, and potentially different roles with different access levels.

We’ll use the token approach to get started, because password management is a vast topic that I can’t do justice to in a short post. In fact, I’d advise against any direct password management; instead, rely on protocols such as OAuth for authenticating users, which is covered in the next section. The example we’ll walk through uses a single token for access, but can be extended to work with a collection of tokens. Ideally, you should have a tool for generating tokens, storing metadata about the user and role, and the ability to reject or expire tokens. For this simple example, we’ll use Flask-HTTPAuth, which provides both token and digest authentication.

Adding token authentication to a Flask application only requires a few steps. We need to identify which routes are secure using the @auth.login_required decorator, and implement a verify_token function to authenticate clients. In this case, we check for a known token (1234567890abcdefg). For a production system, it’s common to store tokens in a database and to map tokens to different access policies.

Securing the Flask application with a fixed token.
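A minimal sketch of this setup, using Flask-HTTPAuth’s bearer token scheme and eliding the model scoring logic from the earlier snippet:

import flask
from flask_httpauth import HTTPTokenAuth

app = flask.Flask(__name__)
auth = HTTPTokenAuth(scheme='Bearer')

@auth.verify_token
def verify_token(token):
    # a single shared token; a production system would look tokens up in a database
    return token == '1234567890abcdefg'

@app.route("/", methods=["GET", "POST"])
@auth.login_required
def predict():
    # the model scoring logic from the earlier snippet goes here
    return flask.jsonify({"success": True})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=443, ssl_context='adhoc')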

Now that we’ve locked down the endpoint, when clients attempt to access the endpoint without the token they are denied access, as shown below.

Trying to access the model service without a token.

To access the model service, we need to update the client request example to now provide the token in the request header. The code snippet below shows how to update the request to add the token to the headers parameter in the request. The result should now be the same as the first example, but we’ve now locked down the model service to clients with the token.

Calling the model endpoint using the token for access.
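A sketch of the updated client, reusing the feature dictionary and the fixed token from above (verify=False is still needed while the endpoint uses an adhoc certificate):

import requests

new_row = {'G1': 0, 'G2': 0, 'G3': 0, 'G4': 0, 'G5': 0,
           'G6': 0, 'G7': 0, 'G8': 0, 'G9': 0, 'G10': 1}

# pass the shared token in the Authorization header
result = requests.post("https://localhost/", json=new_row,
    headers={'Authorization': 'Bearer 1234567890abcdefg'}, verify=False)
print(result.json())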

The token approach is quite useful when working with third-parties, because you can allow access to a wide set of users. However, this approach is less secure than using protocols such as OAuth that can be used to restrict access to a named set of users.

OAuth Authentication
Tokens are useful for model endpoints, because you might need to serve predictions without requiring clients to log in through a web UI. But if your target application is an interactive application instead of model predictions, then tools such as Dash can be really useful. I like Dash for building web applications, because I can code web applications in Python and use security features such as OAuth to authenticate users. With OAuth, you delegate authentication to known providers, such as Google, to establish the identity of users. You still define the list of users that have access to your application, but you rely on trusted providers for the complex work of verifying identities.

The example below shows how to use Google OAuth 2.0 to lock down a Dash application. Dash is a framework built on top of Flask, and the Flask Dance library used in this example can be applied to any Flask app. To get this example to work, you’ll need to configure OAuth credentials in GCP.

A sample Dash application secured with OAuth 2.0.
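A minimal sketch of this wiring, using the dash-google-auth package installed earlier (which builds on Flask Dance); the client ID, client secret, secret key, and authorized email below are placeholders from your own OAuth configuration:

import flask
import dash
import dash_html_components as html
from dash_google_auth import GoogleOAuth

server = flask.Flask(__name__)
# OAuth client credentials created in the GCP console (placeholder values)
server.config["GOOGLE_OAUTH_CLIENT_ID"] = "your-client-id"
server.config["GOOGLE_OAUTH_CLIENT_SECRET"] = "your-client-secret"
server.secret_key = "replace-with-a-random-secret"

app = dash.Dash(__name__, server=server)
app.layout = html.Div([html.H1("Model Dashboard")])

# only these Google accounts are allowed to view the application
auth = GoogleOAuth(app, ["user@example.com"])

if __name__ == '__main__':
    app.run_server(host='0.0.0.0', port=443, ssl_context='adhoc')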

The result of this approach is that you’ll get a login page when you try to visit the model serving endpoint. If you provide credentials and are in the allowed list of users, you’ll be served the following content:

The application served to authorized users.

The main benefit of using OAuth is that third parties are responsible for establishing the identity of a user, and it’s a standard protocol. If you’re setting up a web service that is GUI based, this is a good approach to use.

Whitelisting
A third way of enforcing access control is by limiting which machines have access to the service. This approach is called whitelisting, because only machines on the list of authorized IP addresses will be authorized to use the service. This approach does not validate the identity of the client making the call, but it does lock down access to a small number of machines. This approach is useful for a VPN setup, where the connection to the VPN is secure and traffic is funneled through the VPN with a static address. This is also useful for model endpoints that need to interface with third-party services that have known IP addresses.

If you used the GKE service ingress approach to set up a load-balanced HTTPS endpoint, then you can use GCP Cloud Armor to set up a whitelisting rule. All incoming traffic that is not allowed to access the service will receive a 403 (Forbidden) response code. To set up whitelisting for the load balancer, browse to Cloud Armor in the GCP console and click on “Create Policy”. You can configure a set of blocked IPs and allowed IPs. In the example below, all traffic is blocked by default, and a second rule with a higher priority allows access from the IP 1.2.3.4. After setting up the IP rules, we can use the Targets section to apply this new policy to the model service load balancer.

Whitelisting a single IP (1.2.3.4) with GCP Cloud Armor
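The same rules can also be sketched with the gcloud CLI; here model-policy is a placeholder name, and 2147483647 is the priority of the default rule that catches all remaining traffic:

gcloud compute security-policies create model-policy
gcloud compute security-policies rules update 2147483647 \
    --security-policy model-policy --action "deny-403"
gcloud compute security-policies rules create 1000 \
    --security-policy model-policy \
    --src-ip-ranges "1.2.3.4/32" --action "allow"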

The result of this approach is that traffic from the specified IP addresses will be allowed and all other traffic will be served a Forbidden response. This approach does not directly authenticate users, but instead relies on traffic coming from known, static IP addresses. It’s similar to setting up inbound traffic rules when configuring EC2 security groups on AWS.

Conclusion

If you plan on setting up machine learning models that will be served over the web, then you should plan for security and access control from the start of the project. While it’s common to provide security measures using tools that wrap the functionality of services set up by data science teams, any team that publishes services should make sure that web services are both secure and restricted to authorized users.

Ben Weber is a distinguished data scientist at Zynga. We are hiring!

