Serverless Functions and Using AWS Lambda with S3 Buckets

Learn how to build a Lambda function triggered by an S3 PUT event to sort files in an S3 bucket.

--

In my previous articles you may have seen me going on and on about deploying code on server instances in the cloud, building services to manage those instances, building a reverse proxy on top of those services and so on. No doubt some of you have wished you could just write code, deploy it somewhere and not bother with the complexities of setting up and managing server instances. Depending on your use case, there might be a solution: Serverless Functions.

Serverless functions allow code to be deployed without you allocating any infrastructure for it to be hosted on. AWS Lambda is a FaaS (Function as a Service) platform that lets you build serverless functions, and it supports most major programming languages, including Go, Java, Ruby, Python2 and Python3. For this tutorial, we will be using Python3. Note that even though they are called “serverless”, the functions actually run inside runtime environments on cloud server instances; “serverless” simply means you don’t have to provision or manage those servers.

Serverless functions are stateless, i.e. one execution of a function does not maintain any state that subsequent executions can recognise or use. In other words, one execution of a serverless function does not in any way communicate with another execution. Since serverless functions are time- and resource-limited, they are only suitable for short-lived tasks, and they provide very little flexibility in the allocation of memory, CPU, storage etc. One further implication of adopting a particular FaaS platform is vendor lock-in: your serverless functions will usually have to use the same vendor for most of the other cloud services they interact with.

A little bit about microservices…

Serverless functions can be used to build a microservices architecture, where your software is built up of smaller, independent microservices that each provide a specific functionality. Microservices make it easier for developers to build, test and manage software in an agile way. Because microservices are completely separate pieces of your software, different microservices can be coded, tested and deployed in parallel. It is also much easier to pinpoint and fix errors in a microservices architecture, as you only have to work with the microservice that is malfunctioning.

Netflix, for instance, became one of the earliest adopters of a microservices-based architecture when it began moving its software onto the AWS cloud in 2009. It currently maintains an API gateway that receives billions of requests daily and is built up of separate microservices for processes like user sign-up, downloading movies etc. By switching to a microservices architecture, Netflix was able to speed up the development and testing of its software and roll back easily when errors were encountered.

Lambda with other AWS services

Serverless functions on AWS Lambda, or simply Lambda functions, can do some really cool things when used in combination with other AWS services, like using Amazon Alexa to turn EC2 instances on and off, or lighting up a bulb whenever something is pushed onto your CodeCommit (or even GitHub) repository.

There are 3 ways you can use Lambda in combination with other AWS services:

  1. Use other AWS services as triggers to invoke Lambda functions*. A Lambda function can have multiple triggers drawn from a wide range of AWS services, like an S3 bucket PUT or DELETE event, a call to an API Gateway endpoint etc.
  2. Interact with AWS services from inside the Lambda function, like adding or deleting data in a DynamoDB table or getting the state of an EC2 instance. You can do this from your Lambda function’s code using one of the SDKs (Software Development Kits) that AWS builds for Java, Python**, Ruby and more. A long list of AWS services can be controlled or communicated with using these SDKs.
  3. Use AWS services as destinations for invocation records of your Lambda function whenever it is invoked. As of today, only 4 AWS services can be used this way: SNS, SQS, EventBridge or Lambda itself.

* When a Lambda function is invoked by another AWS service, Lambda passes specific information from that service to the function using an event object. This includes information like which item in which DynamoDB table triggered the Lambda function.

** Boto3 is a Python library (or SDK) built by AWS that allows you to interact with AWS services such as EC2, ECS, S3, DynamoDB etc. In this tutorial we will be using Boto3 to manage files inside an AWS S3 bucket. Full documentation for Boto3 can be found in the official Boto3 documentation.
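As a taste of what working with Boto3 looks like, here is a minimal sketch that lists every S3 bucket in your account (it assumes AWS credentials are already configured on your machine):

import boto3

s3 = boto3.resource('s3')

# Print the name of every S3 bucket in the account
for bucket in s3.buckets.all():
    print(bucket.name)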

Using Lambda with AWS S3 Buckets

Pre-requisites for this tutorial: An AWS free-tier account.

An S3 bucket is simply a storage space in the AWS cloud for any kind of data (e.g. videos, code, AWS templates etc.). Every directory and file inside an S3 bucket can be uniquely identified using a key, which is simply its path relative to the root directory (which is the bucket itself). For example, “car.jpg” or “images/car.jpg”.

Besides being a powerful resource for developing microservices-based software, Lambda functions make highly effective DevOps tools. Let’s look at an example of using Lambda functions with S3 buckets in the first two ways mentioned above to solve a simple DevOps problem :)

Say you are receiving XML data from three different gas meters straight into an AWS S3 bucket. You want to sort the XML files into three separate folders based on which gas meter the data comes from. The only way to know the data source is to look inside the XML files, which look like this:

<data>
    <data-source>gas_meter3</data-source>
    <data-content>bla bla bla</data-content>
</data>

How would you automate this process? This is where AWS Lambda can prove handy. Let’s look at how to do this.

1 - Creating an S3 bucket

Let’s start by building an empty S3 bucket. All you have to do is go to the S3 page from your AWS console and click on the “Create bucket” button. Make sure you leave the “Block all public access” checkbox ticked and click on “Create bucket”.
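If you prefer to script the setup instead of clicking through the console, the same steps can be sketched with Boto3; “example-bucket” below is a placeholder for your bucket’s name:

import boto3

s3 = boto3.client('s3')

# Create the bucket; outside us-east-1 you must also pass
# CreateBucketConfiguration={'LocationConstraint': '<your-region>'}
s3.create_bucket(Bucket='example-bucket')

# The console's "Block all public access" checkbox, as an API call
s3.put_public_access_block(
    Bucket='example-bucket',
    PublicAccessBlockConfiguration={
        'BlockPublicAcls': True,
        'IgnorePublicAcls': True,
        'BlockPublicPolicy': True,
        'RestrictPublicBuckets': True,
    },
)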

Now, add a directory called “unsorted”, where all the XML files will initially be stored. Create a .xml file named “testdata.xml” with the following content and upload it to the “unsorted” directory:

<data>
    <data-source>gas_meter3</data-source>
    <data-content>bla bla bla</data-content>
</data>
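If you’d rather upload the file programmatically than through the console, a minimal Boto3 sketch (again with “example-bucket” as a placeholder) looks like this:

import boto3

s3 = boto3.resource('s3')

# Uploading to the 'unsorted/' prefix implicitly creates the directory
s3.Bucket('example-bucket').upload_file('testdata.xml', 'unsorted/testdata.xml')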

2 - Creating a Lambda function

From the Services tab on the AWS console, click on “Lambda”. From the left pane on the Lambda page, select “Functions” and then click on “Create function”.

Select “Author from scratch” and give the function a suitable name. Since I’ll be using Python3, I chose “Python3.8” as the runtime language. There are other versions of Python2 and Python3 available as well. Select a runtime language and click on the “Create function” button. From the list of Lambda functions on the “Functions” page, select the function you just created and you will be taken to the function’s page.

Lambda automatically creates an IAM role for you to use with the Lambda function. The IAM role can be found under the “Permissions” tab on the function’s page. You need to ensure that the function’s IAM role has permission to access and/or manage the AWS services you connect to from inside your function.

Make sure you add “S3” permissions to the IAM role’s list of permissions, accessible via the IAM console.
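The quickest option is AWS’s managed AmazonS3FullAccess policy, but a narrower inline policy scoped to just what this function does (get, put and delete objects in your bucket) is safer. Here is a sketch of such a policy, with “example-bucket” as a placeholder:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::example-bucket/*"
        }
    ]
}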

3 - Adding a trigger for our Lambda function

We want the Lambda function to be invoked every time an XML file is uploaded to the “unsorted” folder. To do this, we will use an S3 bucket PUT event as a trigger for our function.

Under the “Designer” section on our Lambda function’s page, click on the “Add trigger” button.

Select the “S3” trigger and the bucket you just created. Select “PUT” event type. Set the prefix and suffix as “unsorted/” and “.xml” respectively. Finally, click on “Add”.
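For the curious, this is roughly what the console configures under the hood, sketched here with Boto3; the bucket name and function ARN below are placeholders. Note that the console also grants S3 permission to invoke your function, which this sketch omits:

import boto3

s3 = boto3.client('s3')

# Tell the bucket to invoke the function on PUTs under 'unsorted/' ending in '.xml'
s3.put_bucket_notification_configuration(
    Bucket='example-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-west-2:123456789012:function:sort-xml-files',
            'Events': ['s3:ObjectCreated:Put'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': 'unsorted/'},
                {'Name': 'suffix', 'Value': '.xml'},
            ]}},
        }],
    },
)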

4 - Adding code to our Lambda function

There are 3 ways you can add code to your Lambda function:

  1. Through the code editor available on the console.
  2. By uploading a .zip file containing all your code and dependencies.
  3. By uploading code from an S3 bucket.

We will use the first method for this tutorial. On your function page, go down to the “Function code” section to find the code editor.

Copy and paste the following code into the code editor:

import boto3
import uuid
from urllib.parse import unquote_plus
import xml.etree.ElementTree as ET

def lambda_handler(event, context):
    s3 = boto3.resource('s3', region_name='')  # Replace with your region name
    # Loop through every file uploaded
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        bucket = s3.Bucket(bucket_name)
        key = unquote_plus(record['s3']['object']['key'])
        # Temporarily download the XML file for processing
        tmpkey = key.replace('/', '')
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), tmpkey)
        bucket.download_file(key, download_path)
        # Upload the file into a folder named after its data source;
        # key[9:] strips the leading 'unsorted/' prefix from the key
        machine_id = get_machine_id_from_file(download_path)
        bucket.upload_file(download_path, machine_id + '/' + key[9:])
        # Delete the original file
        s3.Object(bucket_name, key).delete()

def get_machine_id_from_file(path):
    tree = ET.parse(path)
    root = tree.getroot()
    return root[0].text

Don’t forget to replace the region name.

Make sure the handler value is “<filename>.lambda_handler”. The handler value specifies which function contains the main code that Lambda executes.

Whenever Lambda runs your function, it passes a context object and an event object to it. The context object can be used to get information about the function itself and its invocation, e.g. the function name, memory limit, log group name etc. The context object can be very useful for logging, monitoring and data analytics.

As mentioned earlier, the event object is used by Lambda to provide specific information to the Lambda function from the AWS service that invoked it. The information, which originally comes in JSON format, is converted to an object before being passed into the function; in the case of Python, this object is typically a dictionary. In the code above, you can see that the event object has been used to get the name of the S3 bucket and the key of the object inside the S3 bucket that triggered our function.
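For instance, a handler that only inspects its inputs might look like the following sketch; the attribute names and dictionary keys below are standard for the Python runtime and an S3 trigger:

def lambda_handler(event, context):
    # Information about the function and this particular invocation
    print(context.function_name)       # the function's name
    print(context.memory_limit_in_mb)  # configured memory limit
    print(context.aws_request_id)      # unique id for this invocation
    print(context.log_group_name)      # CloudWatch log group

    # Information passed along by the S3 trigger
    for record in event['Records']:
        print(record['s3']['bucket']['name'])  # bucket that fired the event
        print(record['s3']['object']['key'])   # key of the uploaded object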

The code above is simple to understand. It does the following:

  1. Get the bucket name and object key from the event object.
  2. Download the XML file that caused the Lambda function to be invoked.
  3. Process the XML file to find the machine_id from the <data-source> element.
  4. Upload the file back to the S3 bucket, but inside a folder named the value of machine_id.
  5. Delete the original file.

Now press the “Deploy” button and our function should be ready to run.

5 - Testing our Lambda function

AWS has made it pretty easy to test Lambda functions via the Lambda console. No matter what trigger your Lambda function uses, you can simulate the invocation of your Lambda function using the Test feature on the Lambda console. All this takes is defining what event object will be passed into the function. To help you do this, Lambda provides JSON templates specific to each type of trigger. For example, the template for an S3 PUT event looks like this:

{
    "Records": [
        {
            "eventVersion": "2.0",
            "eventSource": "aws:s3",
            "awsRegion": "us-west-2",
            "eventTime": "1970-01-01T00:00:00.000Z",
            "eventName": "ObjectCreated:Put",
            "userIdentity": {
                "principalId": "EXAMPLE"
            },
            "requestParameters": {
                "sourceIPAddress": "126.0.0.1"
            },
            "responseElements": {
                "x-amz-request-id": "EXAMPLE123456789",
                "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH"
            },
            "s3": {
                "s3SchemaVersion": "1.0",
                "configurationId": "testConfigRule",
                "bucket": {
                    "name": "example-bucket",
                    "ownerIdentity": {
                        "principalId": "EXAMPLE"
                    },
                    "arn": "arn:aws:s3:::example-bucket"
                },
                "object": {
                    "key": "test/key",
                    "size": 1024,
                    "eTag": "0123456789abcdef0123456789abcdef",
                    "sequencer": "0A1B2C3D4E5F678901"
                }
            }
        }
    ]
}

To test the Lambda function you just created, you need to configure a test event for your function. To do this, click on the “Select a test event” dropdown right above the Lambda code editor and click on “Configure test event”.

From the pop-up menu, make sure the “Create new test event” radio button is selected and select the “Amazon S3 Put” event template. You should be provided with JSON data similar to that in the code snippet above. All we are concerned with is the data that is used in our Lambda function: the bucket name and the object key. Edit those two values to match the S3 bucket and the XML file you created earlier. Finally, give the test event a name and click on “Create”.

Now that you have a test event for your Lambda function, all you have to do is click on the “Test” button on top of the code editor. The console will tell you if the function code was executed without any errors. To check that everything worked, go to your S3 bucket and see if the XML file has been moved into a newly created “gas_meter3/” directory.

As you may have noticed, one downside of testing via the console is that the Lambda function actually communicates with other AWS services. This may cause unintentional changes to your AWS resources or even loss of valuable work. The solution is to build and run your Lambda functions locally on your machine. Testing Lambda functions locally is not as straightforward; you will need tools like the SAM CLI or LocalStack to do this.
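As a sketch of what a fully local test can look like, the snippet below uses the moto library (an assumption: installed via pip install moto) to fake S3 in memory, so nothing touches your real AWS account. It assumes the handler code is saved as lambda_function.py with a real region filled in:

import boto3
from moto import mock_aws  # on older moto versions, use mock_s3 instead

import lambda_function  # assumes the handler code is saved as lambda_function.py

@mock_aws
def test_sorts_file_into_machine_folder():
    # Create a fake bucket and drop a test file into 'unsorted/'
    s3 = boto3.resource('s3', region_name='us-east-1')
    s3.create_bucket(Bucket='example-bucket')
    xml = b'<data><data-source>gas_meter3</data-source><data-content>bla</data-content></data>'
    s3.Object('example-bucket', 'unsorted/testdata.xml').put(Body=xml)

    # Hand-craft the minimal event the real S3 trigger would send
    event = {'Records': [{'s3': {'bucket': {'name': 'example-bucket'},
                                 'object': {'key': 'unsorted/testdata.xml'}}}]}
    lambda_function.lambda_handler(event, None)

    # The file should now be sorted under 'gas_meter3/'
    keys = [obj.key for obj in s3.Bucket('example-bucket').objects.all()]
    assert 'gas_meter3/testdata.xml' in keys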

6 - All done!

Now your Lambda function should sort any XML files uploaded to the “unsorted” folder on your S3 bucket into separate folders, provided that the XML data is in the format specified in this tutorial.

I hope that this article gave you a taste of what is achievable using AWS Lambda and some insight into serverless functions in general. Now it’s your turn to get creative with AWS Lambda.

Thank you for reading!
