
Serverless comes to machine learning with container image support in AWS Lambda.

Today AWS Lambda released a new feature that could make life significantly easier for machine learning practitioners.

Improve Machine Learning Deployment with the New re:Invent 2020 AWS Lambda Feature

Photo by frank mckenna on Unsplash

AWS Lambda was released back in 2014 and quickly became a game-changing technology. By adopting Lambda, many developers found a new, much easier way to build microservices. It also brought additional advantages such as event-based programming, cloud-native deployment, and the now well-known infrastructure-as-code paradigm.

A paradigm-shifting technology like AWS Lambda had to define its own standards to support the modern application development lifecycle. To keep development simple, Lambda settled on the easiest form of code packaging: the zip file format.

Since then, packaging and deploying a function to AWS Lambda has been as simple as building a zip file with all the dependencies packed within and uploading it to the service.
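For reference, such a deployment amounts to a couple of commands; the function name, handler, and role below are placeholders:

# Bundle the code and its dependencies, then create the function from the archive
zip -r function.zip .
aws lambda create-function \
     --function-name <function name> \
     --runtime python3.6 \
     --handler app.handler \
     --zip-file fileb://function.zip \
     --role <function execution role>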

In the meantime, container images gained traction among developers as a preferred way of packaging applications. They became a viable solution for a wide variety of use cases, and many developers grew comfortable with tools that AWS Lambda did not support, such as the OCI image format, adopted by the container DevOps community as the standard for packaging Docker images. Containers are widely used to bundle all of an application's requirements into a single image. In the serverless world, this could be approximated by bundling every project dependency into the zip file, but the process was error-prone and could hit the dreaded 50 MB Lambda package upload size limit.

Container Image Support in AWS Lambda is a game-changing update for serverless machine learning practitioners

At re:Invent 2020, AWS announced an update to AWS Lambda that many developers and data scientists had long awaited, because it changes the way we can build functions. It also comes with bonus features that make this release very welcome in the serverless world: starting from today, it is possible to package a Lambda function as an OCI container image.

The best part is that a custom Dockerfile can either extend a Lambda base image, provided by AWS for every supported runtime and published on Docker Hub, or start from a fresh Alpine or Debian image, customizing Linux dependencies, packages, and everything else we usually do with a dockerized app container.

This has serious implications for anyone willing to use AWS Lambda to serve machine learning models: for example, a custom PyTorch classification model that relies on common ML libraries such as pandas, scikit-learn, and PyTorch itself.
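As a sketch of what such a function might look like, here is a minimal, hypothetical handler; the model file name and the input format are assumptions for illustration, not part of any real service:

# app.py - minimal, hypothetical inference handler (names are placeholders)
import json
import torch

# Load the model once, at container start, so warm invocations reuse it
model = torch.jit.load("model.pt")
model.eval()

def handler(event, context):
    # Expect a JSON body with a "features" array
    features = torch.tensor(json.loads(event["body"])["features"])
    with torch.no_grad():
        prediction = model(features.unsqueeze(0)).argmax(dim=1).item()
    return {"statusCode": 200, "body": json.dumps({"class": prediction})}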

Moreover, the newly announced container image support for AWS Lambda comes with an image size limit of 10 GB. This means a lot to us: all the libraries required by a machine learning stack, and even the weights of the model itself, can now be packaged and published to a Docker registry!
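For instance, once a working directory for the function code is defined (as in the examples below), baking the model weights into the image is a single additional Dockerfile instruction; the file name here is a placeholder:

# Hypothetical: bundle the serialized model weights into the image
COPY model.pt ${FUNCTION_DIR}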

To ease things for developers and data scientists, we can also run pip install in the very same environment as the production Lambda, thanks to a multi-stage Docker build that lets us specify all the packages we need. To offer our code a common, standard entry point, the image must use the just-released AWS Lambda Runtime Interface Client to launch our code handler:

# Partial Dockerfile
[...]
FROM node:${RUNTIME_VERSION}-alpine${DISTRO_VERSION}
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
ENTRYPOINT [ "/usr/local/bin/npx", "aws-lambda-ric" ]
CMD [ "app.handler" ]

where build-image is the image built in a previous build stage with all the dependencies required by our machine learning model, such as pandas, scikit-learn, or PyTorch, installed into a shared FUNCTION_DIR.
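In the Python case described in this article, that earlier build stage boils down to installing the dependencies into FUNCTION_DIR with pip's --target flag, roughly like this excerpt (the requirements file name is an assumption):

# Hypothetical build-stage excerpt: install the ML dependencies into FUNCTION_DIR
RUN pip install --target ${FUNCTION_DIR} -r requirements.txt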

Then our image is built using the standard docker command and pushed to an existing container registry on Amazon ECR:

docker build -t <image_name> .
docker tag <image_name>:latest <account-id>.dkr.ecr.<region>.amazonaws.com/<image_name>:latest
docker push <account-id>.dkr.ecr.<region>.amazonaws.com/<image_name>:latest
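Note that docker push requires the Docker client to be authenticated against the ECR registry first, which can be done with the AWS CLI:

# Authenticate the local Docker client against the Amazon ECR registry
aws ecr get-login-password --region <region> | \
     docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com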

Then we can create a new Lambda function by simply setting its code pointer to the Amazon ECR image and its package type to Image.

aws lambda create-function \
     --function-name <function name> \
     --package-type Image \
     --code ImageUri=<ECR Image URI> \
     --role <function execution role>
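For subsequent deployments of the same function, pushing a new image and updating the code pointer is enough; a sketch with the corresponding CLI command:

# Point an existing image-based function at a newly pushed image
aws lambda update-function-code \
     --function-name <function name> \
     --image-uri <ECR Image URI>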

A practical example using container image support in AWS Lambda

In a previous article, I described how we managed to migrate our Neosperience Image Memorability scoring service to AWS Lambda using a shared Elastic File System volume to store our ML model, thus reducing our costs by an order of magnitude while paying just a few seconds more in computation time (amply compensated by AWS Lambda's parallel execution).

One of the most difficult aspects we had to struggle with during development was the need to package all the dependencies of our Python code:

# code dependency list
absl-py, astroid, astunparse, boto3, botocore, cachetools, certifi,
chardet, cycler, decorator, docutils, gast, grpcio, h5py, idna, imageio,
importlib-metadata, isort, jmespath, Keras, Keras-Preprocessing,
kiwisolver, lazy-object-proxy, mccabe, networkx, numpy, opt-einsum,
pandas, Pillow, protobuf, pyasn1, pyasn1-modules, pyparsing,
python-dateutil, pytz, PyWavelets, PyYAML, requests, requests-oauthlib,
rsa, scikit-image, scipy, six, tensorflow, tensorflow-estimator,
termcolor, tifffile, toml, typed-ast, urllib3, utils, vis, watchtower,
Werkzeug, wrapt, zipp

The final environment contained many libraries, with an overall size of more than a hundred megabytes even after a manual tree-shaking phase. It required us to package the libraries separately from our AWS Lambda code and deploy them as a Lambda Layer, paying attention to stay below the maximum unpacked function size of 250 MB.
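In practice, every dependency change meant rebuilding that zip and republishing the layer, roughly like this sketch (the layer and file names are placeholders):

# Zip the packaged libraries and publish them as a Lambda Layer
zip -r dependencies-layer.zip python/
aws lambda publish-layer-version \
     --layer-name ml-dependencies \
     --zip-file fileb://dependencies-layer.zip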

Leveraging the capability to create a function from an OCI image, with all the dependencies baked in, makes this a breeze:

# Define global args
ARG FUNCTION_DIR="/home/app/"

# Stage 1 - bundle base image + runtime
FROM python:3.6.12-alpine AS python-alpine
RUN apk add --no-cache \
    libstdc++

# Stage 2 - build function and dependencies
FROM python-alpine AS build-image
# Install aws-lambda-cpp build dependencies
RUN apk add --no-cache \
    build-base \
    libtool \
    autoconf \
    automake \
    libexecinfo-dev \
    make \
    cmake \
    libcurl
# Install AWS CLI
RUN pip install awscli
# Authenticate with AWS CLI
ARG AWS_ACCESS_KEY_ID
ARG AWS_SECRET_ACCESS_KEY
ARG AWS_SESSION_TOKEN
ENV AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
ENV AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
ENV AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN}
# Include global args in this stage of the build
ARG FUNCTION_DIR
ARG RUNTIME_VERSION
# Create function directory
RUN mkdir -p ${FUNCTION_DIR}
# Copy handler function
COPY app/* ${FUNCTION_DIR}
# Install the runtime interface client into the function directory
RUN pip install --target ${FUNCTION_DIR} awslambdaric

# Stage 3 - final runtime image
# Grab a fresh copy of the Python image
FROM python-alpine
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies and install project-specific dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
RUN pip install -r requirements.txt
ENTRYPOINT ["/usr/local/bin/python", "-m", "awslambdaric"]
CMD [ "app.handler" ]

Once pushed to Amazon ECR, the image can be used to create a new function with our code on board, packaged together with its machine learning dependencies.
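Before creating the function, the image can also be tested locally. Since our custom image does not bundle it, one option (an assumption here, not part of the original workflow) is to mount the AWS Lambda Runtime Interface Emulator from the host and invoke the handler over HTTP:

# Run the container behind the Runtime Interface Emulator, mounted from the host
docker run -p 9000:8080 \
     -v ~/.aws-lambda-rie:/aws-lambda \
     --entrypoint /aws-lambda/aws-lambda-rie \
     <image_name> /usr/local/bin/python -m awslambdaric app.handler

# Invoke the function locally with an empty test event
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'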

Conclusion

The newly released Container Image Support in AWS Lambda enables developers and data scientists to leverage the flexibility and familiarity of tools they already have, and empowers them to use well-established development lifecycles. The maximum image size of 10 GB allows building images for virtually any inference workflow. It is one more step towards full serverless machine learning support in AWS Lambda.

