
How to Build ML Applications on the AWS Cloud with Kubernetes and oneAPI

Learn the basics of Kubernetes and Intel AI Analytics Toolkit for building distributed ML Apps

Eduardo Alvarez
Towards Data Science
12 min read · Mar 17, 2023


Building and deploying high-performance AI applications can be a challenging task that requires a significant amount of computing resources and expertise. Fortunately, modern technologies such as Kubernetes, Docker, and the Intel AI Analytics Toolkit (AI Kit) make it easier to develop and deploy AI applications optimized for performance and scalability. Moreover, by using cloud services like Amazon Web Services (AWS), developers can further streamline the process and take advantage of the flexible and scalable infrastructure provided by the cloud.

In this article, we will explore how to use Kubernetes, Docker, and the Intel AI Analytics Toolkit to build and deploy AI applications on the AWS cloud. Specifically, we will focus on one of the first Intel Cloud Optimization Modules, which serves as a template with codified Intel accelerations covering various AI workloads. We will also introduce the AWS services that we will use in the process, including Amazon Elastic Kubernetes Service (EKS), Amazon Elastic Container Registry (ECR), Amazon Elastic Compute Cloud (EC2), and Elastic Load Balancer (ELB).

Figure 1. This architecture is designed for AI production scenarios where many discrete models must be trained with low to moderate compute requirements. — Image by Author

The sample application that we will deploy focuses on loan default prediction, a common problem in the finance industry. We will use the daal4py library to accelerate inference of an XGBoost classifier, enabling us to achieve high performance while reducing the time required to train and deploy the model.

Figure 2. daal4py is a simplified API to the Intel® oneAPI Data Analytics Library that allows for fast usage of the framework, suited for data scientists and machine learning users. It is built to provide an abstraction over the Intel® oneAPI Data Analytics Library for either direct usage or integration into one’s own framework. — Image Source

By the end of this article, readers will have a basic understanding of how to build and deploy performant AI applications on the AWS cloud using Kubernetes, Docker, and the Intel AI Analytics Toolkit. Additionally, they will have a practical example of how to leverage these technologies to accelerate the inference of a loan default prediction model.

You can find all of the source code for this tutorial in our public GitHub Repository.

Get your Development Environment Ready

Install the AWS CLI — The AWS Command Line Interface (CLI) is a tool for managing various Amazon Web Services (AWS) resources and services from the command line.

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
sudo apt install unzip
unzip awscliv2.zip
sudo ./aws/install

Configure your AWS credentials by running aws configure — see the AWS CLI documentation to learn more about setting up credentials.

Install eksctl — eksctl is a command-line tool for creating, managing, and operating Kubernetes clusters on EKS.

curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
eksctl version

Install aws-iam-authenticator — AWS IAM Authenticator is a command-line tool that enables users to authenticate with their Kubernetes clusters on EKS using their AWS IAM credentials.

curl -Lo aws-iam-authenticator https://github.com/kubernetes-sigs/aws-iam-authenticator/releases/download/v0.5.9/aws-iam-authenticator_0.5.9_linux_amd64
chmod +x ./aws-iam-authenticator
mkdir -p $HOME/bin && cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$PATH:$HOME/bin
echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
aws-iam-authenticator help

Install kubectl — Kubectl is a command-line tool for interacting with Kubernetes clusters. It allows users to deploy, inspect, and manage applications and services running on a Kubernetes cluster and perform various administrative tasks such as scaling, updating, and deleting resources.

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
curl -LO "https://dl.k8s.io/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl.sha256"
echo "$(cat kubectl.sha256) kubectl" | sha256sum --check
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

Our Loan Default Prediction Application

The application we will be deploying is based on the Loan Default Risk Prediction AI Reference Kit.


We refactored the code from this reference solution to be more modular in support of our three main APIs:

  • Data processing — This endpoint preprocesses data and stores it in a data lake or another structured format. This codebase also handles the expansion of the dataset for benchmarking purposes.
  • Model Training — This endpoint trains an XGBoost Classifier and converts it to an inference-optimized daal4py format.
  • Inference — This endpoint receives a payload with raw data and returns the loan default classification of each sample.

The directory tree below outlines the codebase’s various scripts, assets, and configuration files. The majority of the ML application code is in the app/ folder. This folder contains the loan_default and utils packages — the loan_default package contains the server-side Python modules that support our three main APIs. The server.py script contains the FastAPI endpoint configurations, payload data models, and commands to start a uvicorn server (a minimal sketch of this server follows the directory tree below).

├───app/
| ├───loan_default/
| | ├───__init__.py
| | ├───data.py
| | ├───model.py
| | └───predict.py
| ├───utils/
| | ├───__init__.py
| | ├───base_model.py
| | ├───logger.py
| | └───storage.py
| ├───logs/
| ├───server.py
| └───requirements.txt
|
├───kubernetes/
| ├───cluster.yaml
| ├───deployment.yaml
| ├───service.yaml
| └───serviceaccount.yaml
|
├─README.md
├─Dockerfile
├─SECURITY.md
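
To make the layout more concrete, below is a minimal, hypothetical sketch of how a server.py like this might wire the three endpoints together with FastAPI and uvicorn. The payload fields, helper behavior, and port are illustrative assumptions rather than the exact definitions in the repository.

# Hypothetical sketch of a FastAPI server exposing the three APIs.
# Field names, helper calls, and the port are illustrative assumptions.
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class DataPayload(BaseModel):
    bucket: str
    key: str
    size: int
    backend: str = "s3"


class TrainPayload(BaseModel):
    bucket: str
    data_key: str
    model_key: str
    model_name: str
    backend: str = "s3"


@app.post("/data")
def process_data(payload: DataPayload):
    # would call into the loan_default.data module to preprocess and store data
    return {"message": "data processed"}


@app.post("/train")
def train_model(payload: TrainPayload):
    # would call into loan_default.model to train XGBoost and convert to daal4py
    return {"message": "model trained"}


@app.post("/predict")
def predict(payload: dict):
    # would call into loan_default.predict to score the submitted samples
    return {"message": "predictions returned"}


if __name__ == "__main__":
    # port assumed for this sketch; the Kubernetes Service later in this
    # tutorial forwards traffic to targetPort 5000 on the pods
    uvicorn.run(app, host="0.0.0.0", port=5000)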

A deep dive into the codebase is beyond the scope of this tutorial. However, it is worth pointing out where we leverage daal4py to improve our inference performance. Inside model.py, you’ll find the train method, which handles model training and conversion to daal4py format using the d4p.get_gbt_model_from_xgboost() function.

def train(self):
    # define model
    params = {
        "objective": "binary:logistic",
        "eval_metric": "logloss",
        "nthread": 4,  # flags.num_cpu
        "tree_method": "hist",
        "learning_rate": 0.02,
        "max_depth": 10,
        "min_child_weight": 6,
        "n_jobs": 4,  # flags.num_cpu,
        "verbosity": 0,
        "silent": 1,
    }

    log.info("Training XGBoost model")
    self.clf = xgb.train(params, self.DMatrix, num_boost_round=500)
    self.clf = d4p.get_gbt_model_from_xgboost(self.clf)
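
To illustrate how a converted model is used at inference time, here is a small, self-contained sketch (with synthetic stand-in data) that scores a converted model through daal4py's gbt_classification_prediction algorithm. This is an illustrative example under those assumptions, not the exact code in the application's predict.py.

# Illustrative sketch of daal4py inference on a converted XGBoost model;
# synthetic data stands in for the preprocessed loan features.
import daal4py as d4p
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 11)        # stand-in for 11 preprocessed features
y = np.random.randint(0, 2, 1000)   # stand-in for the loan default label

booster = xgb.train(
    {"objective": "binary:logistic", "tree_method": "hist"},
    xgb.DMatrix(X, label=y),
    num_boost_round=50,
)
daal_model = d4p.get_gbt_model_from_xgboost(booster)

# daal4py's optimized gradient-boosted-tree classifier prediction;
# class probabilities require requesting computeClassProbabilities
# (supported in recent daal4py versions)
predict_algo = d4p.gbt_classification_prediction(
    nClasses=2,
    resultsToEvaluate="computeClassLabels|computeClassProbabilities",
)
result = predict_algo.compute(X, daal_model)

labels = result.prediction      # 0/1 default prediction per sample
probs = result.probabilities    # per-class probabilities per sample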

In the original reference kit’s performance testing, this simple conversion resulted in an ~4.44x boost in performance (Figure 3).

Figure 3. For batch inference of size 1M, XGBoost v1.4.2 offers up to a 1.34x speedup over stock XGBoost v0.81, and with Intel® oneDAL, up to a 4.44x speedup. — Image by Author

Configuring and Launching Elastic Kubernetes Service Clusters

Elastic Kubernetes Service is a fully managed service that makes it easy to deploy, manage, and scale containerized applications using Kubernetes on Amazon Web Services (AWS). It eliminates the need to install, operate, and scale Kubernetes clusters on your own infrastructure.

To launch our EKS cluster, we must first create our cluster configuration file.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: "eks-cluster-loanDefault"
  version: "1.23"
  region: "us-east-1"

managedNodeGroups:
  - name: "eks-cluster-loanDefault-mng"
    desiredCapacity: 3
    instanceType: "m6i.large"

We can configure the name and region of our cluster deployment, as well as the version of EKS that we want to run, in the “metadata” section. Most importantly, we can configure the basic requirements for our compute resources in the “managedNodeGroups” section:

  • desiredCapacity — the number of nodes to scale to when your stack is created. In this tutorial, we will set this to 3.
  • instanceType — the instance type for your nodes. This tutorial uses an m6i.large instance, a 3rd Generation Xeon (2 vCPUs and 8 GiB of memory). Once generally available, we recommend trying out the r7iz instance family to take advantage of Intel Advanced Matrix Extensions (AMX) — a dedicated accelerator for deep learning workloads built into 4th Generation Intel Xeon CPUs.

We execute eksctl create cluster -f cluster.yaml to create the CloudFormation stack and provision all relevant resources. With the current configuration, this process should take 10 to 15 minutes. You should see a log similar to Figure 4.

Figure 4. CloudFormation log for the EKS cluster provisioning workflow — Image by Author

You should run a quick test to ensure your cluster has been provisioned properly. Run eksctl get cluster to get the name of your available cluster(s), and eksctl get nodegroup --cluster <cluster name> to check on your cluster’s node group.

Setting up all of the Kubernetes Application Resources

Let’s dig into launching your Kubernetes application. This process entails creating a namespace, a deployment manifest, and a Kubernetes service. All of these files are available in the tutorial’s codebase.

Before moving on to this part of the tutorial, please complete the prerequisite steps shown in the image below: build your application’s Docker image, push it to Amazon Elastic Container Registry (ECR), and create the Kubernetes service account defined in serviceaccount.yaml, since the deployment manifest below references both the image URI and the service account.

Image by Author

A Kubernetes namespace is a virtual cluster that divides and isolates resources within a physical cluster. Let’s create a namespace called “loan-default-app”:

kubectl create namespace loan-default-app

Now, let’s configure our Kubernetes deployment manifest. A Deployment is a Kubernetes resource that lets you declaratively manage a set of replica pods for a given application, ensuring that the desired number of replicas is running and available at all times while enabling features such as scaling, rolling updates, and rollbacks. It also provides an abstraction layer over the pods, allowing you to define your application’s desired state without worrying about the underlying infrastructure.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: "eks-loan-default-app"
  namespace: "loan-default-app"
  labels:
    app: "loan-default"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: "loan-default"
  template:
    metadata:
      labels:
        app: "loan-default"
    spec:
      serviceAccountName: "loan-default-service-account"
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: "loan-default"
      containers:
        - name: "loan-app-image"
          image: <insert image uri>
          ports:
            - containerPort: 80
          imagePullPolicy: "Always"

The Kubernetes deployment manifest (deployment.yaml) above defines the following:

  • kind: Deployment — The type of Kubernetes resource
  • name: “eks-loan-default-app” — The name of our deployment
  • namespace: “loan-default-app” — The namespace that this deployment should be assigned to
  • app: “loan-default” — The name we assign our application
  • replicas: 3 — The number of desired copies of a pod that should be created and maintained at all times.
  • serviceAccountName: “loan-default-service-account” — Make sure this matches the service account you created earlier.
  • topologySpreadConstraints: — Helps define how pods should be distributed across your cluster. The current configuration will maintain an equal distribution of pods across available nodes.
  • containers: name/image — Where you provide the URI for your application container image and assign the image a name.

Run kubectl apply -f deployment.yaml to create your Kubernetes deployment.

Now let’s configure our Kubernetes service. A Kubernetes service is an abstraction layer that provides a stable IP address and DNS name for a set of pods running the same application, enabling clients to access the application without needing to know the specific IP addresses of individual pods. It also provides a way to load-balance traffic between multiple replicas of the application and can be used to define ingress rules for external access.

apiVersion: v1
kind: Service
metadata:
  name: "loan-default-service"
  namespace: "loan-default-app"

spec:
  ports:
    - port: 8080
      targetPort: 5000
  selector:
    app: "loan-default"
  type: "LoadBalancer"

The Kubernetes service manifest (service.yaml) above defines the following:

  • kind: Service — The type of Kubernetes resource.
  • name: “loan-default-service” — The name of our service.
  • namespace: “loan-default-app” — The namespace that this service should be assigned to.
  • port: 8080 — The port the service listens on.
  • targetPort: 5000 — The port on the pods that the service forwards traffic to.
  • app: “loan-default” — The name we assigned to our application.
  • type: “LoadBalancer” — The type of service we selected.

Run kubectl apply -f service.yaml to create your Kubernetes service.

This will automatically launch an Elastic Load Balancer — a cloud service that distributes incoming network traffic across multiple targets, such as EC2 instances, containers, and IP addresses, to improve application availability and fault tolerance. We can use the ELB’s public DNS to make requests to our API endpoints from anywhere in the world.

Here are a few tips before moving on:

  • Run kubectl get all -n loan-default-app to get a full overview of the Kubernetes resources you have provisioned. You should see your pods, services, and replica groups.
  • Run kubectl -n loan-default-app describe pod <pod-id> to get a detailed description of your pod.
  • If you need to diagnose a specific pod’s behavior, you can start a bash shell inside your pod by running kubectl exec -it <pod-id> -n loan-default-app -- bash — type exit and hit enter to exit the shell.

Testing our Loan Default Prediction Kubernetes Application

Now that all of our infrastructure is in place, we can set up the data component of our application and test our endpoints.

We will begin by downloading the dataset from Kaggle. The dataset used for this demo is a set of 32,581 simulated loans. It has 11 features covering customer and loan characteristics, and one label: the outcome of the loan. Once we have the .csv file in our working directory, we can create an S3 bucket and upload our Kaggle dataset.

# create S3 Bucket for data
aws s3api create-bucket --bucket loan-default --region us-east-1

# upload dataset
aws s3api put-object --bucket loan-default --key data/credit_risk_dataset.csv --body <local path to data>
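
If you prefer to do this step from Python instead of the AWS CLI, a boto3 equivalent might look like the sketch below. The bucket name, region, and key mirror the commands above, and the local file path is a placeholder.

# boto3 equivalent of the aws s3api commands above; assumes credentials are
# already configured (e.g., via aws configure).
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# create the bucket (no LocationConstraint is needed for us-east-1)
s3.create_bucket(Bucket="loan-default")

# upload the Kaggle dataset to the key expected by the /data endpoint
s3.upload_file(
    Filename="credit_risk_dataset.csv",  # local path to the downloaded CSV
    Bucket="loan-default",
    Key="data/credit_risk_dataset.csv",
)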

Making HTTP Requests to our API Endpoints

We will be using Curl to make HTTP requests to our server. Curl allows you to send HTTP requests by providing a command-line interface where you can specify the URL, request method, headers, and data. It then handles the low-level details of establishing a connection, sending the request, and receiving the response, making it easy to automate HTTP interactions.

We will start by sending a request to our data processing endpoint. This will create test/train files and save our preprocessing pipeline as a .sav file to S3. The body of the request requires the following parameters:

  • bucket: name of S3 bucket
  • key: path where your raw data is saved in S3
  • size: total samples you want to process
  • backend: options include “local” or “s3” — the codebase supports running the entire app locally for debugging purposes. When using the “s3” backend, the “local_path” and “target_path” parameters can be set to “None”.
curl -X POST <loadbalancerdns>:8080/data -H 'Content-Type: application/json' -d '{"bucket":"loan-default","backend":"s3","key":"data/credit_risk_dataset.csv","target_path":"None","local_path":"None","size":400000}'

You can navigate to your S3 bucket in the AWS console to verify that all files have been properly generated (Figure 5).

Figure 5. S3 bucket with the outputs generated by our /data endpoint — Image by Author

Now we are ready to train our XGBoost Classifier model. We will make a request to our /train endpoint, which trains our model, converts it to daal4py format, and saves it to S3. The body of the request requires the following parameters:

  • bucket: name of S3 bucket
  • data_key: folder path that contains processed data created by our data processing API
  • model_key: folder where we want to store our trained model
  • model_name: the name that we want to give our trained model
  • backend: options include “local” or “s3” — the codebase supports running the entire app locally for debugging purposes. When using the “s3” backend, the “local_model_path” and “local_data_path” parameters can be set to “None.”
curl -X POST <loadbalancerdns>:8080/train -H 'Content-Type: application/json' -d '{"bucket":"loan-default","backend":"s3","local_model_path":"None","data_key":"data","model_key":"model","model_name":"model.joblib","local_data_path":"None"}'

You can navigate to your S3 bucket in the AWS console to verify that your model file has been created (Figure 6).

Figure 6. S3 bucket with the outputs generated by our /train endpoint — Image by Author

Now that we have a trained, daal4py-optimized XGBoost Classifier, we can make inference requests to our API. The /predict endpoint will return a binary classification of True for high default likelihood and False for low default likelihood. The response also includes the probability generated by the classifier. In the codebase, we have set anything above a 50% probability to be labeled as a high default likelihood. This can be adjusted to return more discretized labels like low, medium, and high default likelihood. The body of the request requires the following parameters:

  • bucket: name of S3 bucket
  • model_name: the name of the trained model in S3
  • data_key: folder path that contains the .sav preprocessing pipeline file (should be the same as your processed data folder)
  • model_key: folder where your trained model was saved in S3
  • sample: your model inputs as a list of dictionaries
  • backend: options include “local” or “s3” — the codebase supports running the entire app locally for debugging purposes. When using the “s3” backend, the “local_model_path” and “preprocessor_path” parameters can be set to “None”.
curl -X POST <loadbalancerdns>:8080/predict -H 'Content-Type: application/json' -d '{"backend":"s3","model_name":"model.joblib","data_key":"data","bucket":"loan-default","model_key":"model","sample":[{"person_age":22,"person_income":59000,"person_home_ownership":"RENT","person_emp_length":123,"loan_intent":"PERSONAL","loan_grade":"D","loan_amnt":35000,"loan_int_rate":16.02,"loan_percent_income":0.59,"cb_person_default_on_file":"Y","cb_person_cred_hist_length":3},{"person_age":22,"person_income":59000,"person_home_ownership":"RENT","person_emp_length":123,"loan_intent":"PERSONAL","loan_grade":"D","loan_amnt":35000,"loan_int_rate":55.02,"loan_percent_income":0.59,"cb_person_default_on_file":"Y","cb_person_cred_hist_length":0}],"local_model_path":"None","preprocessor_path":"None"}'
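
curl is convenient for quick checks, but you can just as easily call the endpoint programmatically. Below is a minimal sketch using Python's requests library; the load balancer DNS is a placeholder, and the payload mirrors the curl example above.

# Minimal sketch of calling the /predict endpoint from Python.
# Replace ELB_DNS with your load balancer's public DNS name.
import requests

ELB_DNS = "<loadbalancerdns>"  # placeholder

payload = {
    "backend": "s3",
    "model_name": "model.joblib",
    "data_key": "data",
    "bucket": "loan-default",
    "model_key": "model",
    "local_model_path": "None",
    "preprocessor_path": "None",
    "sample": [
        {
            "person_age": 22,
            "person_income": 59000,
            "person_home_ownership": "RENT",
            "person_emp_length": 123,
            "loan_intent": "PERSONAL",
            "loan_grade": "D",
            "loan_amnt": 35000,
            "loan_int_rate": 16.02,
            "loan_percent_income": 0.59,
            "cb_person_default_on_file": "Y",
            "cb_person_cred_hist_length": 3,
        }
    ],
}

response = requests.post(f"http://{ELB_DNS}:8080/predict", json=payload, timeout=60)
print(response.status_code)
# classification (True above the 50% probability threshold) and probability per sample
print(response.json())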

You can expect a response from the server fairly quickly (Figure 7).

Figure 7. Payload and response from the /predict endpoint — Image by Author

You can find all of the source code for this tutorial in our public GitHub Repository. Feel free to leave a comment or message me on LinkedIn if you have any questions.

Summary and Discussion

In this tutorial, we have demonstrated how to build a Kubernetes application on the AWS cloud based on a high-availability solution architecture. We have highlighted the use of Intel Xeon processors and AI Kit components to improve performance while enabling scale with Kubernetes.

We encourage readers to watch for upcoming workshops and future Intel Cloud Optimization Modules (ICOMs), as leveraging the Intel optimizations in these modules can qualify their applications for an “Accelerated by Intel” badge.

Our goal with ICOMs is to help developers enhance the performance and scalability of their applications with Intel software and hardware. With the increasing demand for high-performance cloud applications, it is crucial for developers to stay informed and utilize the latest technologies and tools available to them.

Don’t forget to follow my profile for more articles like this!


I’m an AI Solutions Engineer at Intel. Can AI enrich the way every creature experiences life on earth? Let’s find out! I talk AI/ML, MLOps, and Technology.