AWS Lambda is quite convenient. It is really easy to deploy a function to the cloud without having to worry about the setup, and you can get it running in mere minutes. But it has one major limitation: deployment size. You cannot deploy anything larger than 250 MB (the unzipped package, including layers).
Your own code is unlikely to be that large, but you may need libraries that are not included in the base runtime, and those can easily add up to hundreds of megabytes or several gigabytes. That is usually the case when you are training a model, or running inference with an existing model, and need specific ML packages for it.
In such cases AWS EFS (Elastic File System) comes to the rescue. EFS is a file system that you can mount from different instances, including a Lambda function. One major caveat is that all of these resources must be in the same VPC (Virtual Private Cloud) and use the same security group.
![High Level View of the Architecture [Image created by the author]](https://towardsdatascience.com/wp-content/uploads/2021/08/1o4UFooJF4mPHC3RfFLwlqA.png)
This setup involves the following steps:
- Create a VPC and a security group; use them in all the following steps
- Create an EFS
- Create an EC2 instance
- Mount EFS in the EC2 instance
- Install required packages
- Create AWS Lambda function
- Mount the same EFS file system in the Lambda function
Once the packages are installed you can spin down the EC2 instance; it will no longer be needed.
The overall architecture of this system is not complicated, but the complexity of the AWS interface and frequent changes in deployments make it harder than it should be to set things up. Required libraries and CLI commands change, and guides on the AWS blog become outdated. I will try to provide detailed, step-by-step instructions that are up to date as of writing this article. I will be using the AWS web console (https://console.aws.amazon.com); if you are not familiar with it, you can simply search for the relevant service there.
Creating the VPC
Unless you know what you are doing, I would advise using the VPC wizard. A VPC is isolated by default; it does not have internet access. Adding internet access to a VPC involves a long list of detailed steps: creating route tables, public and private subnets, NAT and internet gateways, and so on (https://stackoverflow.com/a/55267891). The VPC wizard can do all of that for you. Simply go to the VPC section of the AWS console and click the Launch VPC Wizard button.

The "VPC with Public and Private Subnets" option gives your Lambda functions internet access, but it can incur higher costs due to the NAT gateway. If you do not need internet access, you can go with "VPC with a Single Public Subnet". But think long and hard before doing that; it would be difficult to change after this step.

All you need to do here is give your VPC a name and select an Elastic IP; you can leave the rest as is and create the VPC.
Create a Security Group
Now that we have a VPC we need to create a security group. For the purposes of this guide I will just create one allowing all incoming and outgoing traffic. You can of course create one more suited to your security needs, but make sure it still allows the access you need (SSH to the EC2 instance, and NFS traffic between the instances and the EFS).

Make sure you have selected the VPC we created. You can name the security group whatever you like.

I have simply allowed all incoming and outgoing traffic for the purposes of this guide. Ideally you would only grant access to your own IP address, and only on the ports that you need (e.g. SSH).
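If you prefer scripting over the console, the same security group can be created with boto3. This is just a rough sketch; the VPC ID is a placeholder for the one the wizard created, and the group name and description are arbitrary.

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholder VPC ID from the previous step; group name and description are arbitrary.
sg = ec2.create_security_group(
    GroupName="lambda-efs-guide",
    Description="Wide-open security group for the EFS + Lambda walkthrough",
    VpcId="vpc-0123456789abcdef0",
)

# Allow all inbound traffic from anywhere; outbound is open by default for new groups.
# For real workloads, restrict this to your own IP and the ports you actually need.
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "-1",  # all protocols, all ports
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)
```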
Now that we have our VPC and security group, we can create our EFS, EC2 instance and Lambda function.
Create an EFS
You can have an EFS created automatically during EC2 instance creation, but you would still have to modify it in order to access it from a Lambda function. So it is easier to create the EFS beforehand with all the required details.
To create an EFS, go to the EFS page in the AWS console. You will see a "Create file system" button on the EFS main page.

Once the file system is created, we need to create an access point. When you go into the newly created file system, there is an "Access Points" tab where you can click on the "Create an access point" button.


You can enter any number for the user and group ID values. Setting these values, together with the root directory path and permissions, lets EFS create a default root directory for mounting.
I have set the permissions to 777, which allows anyone write access. This is simply for convenience. You can create a separate, read-only access point later on, when you no longer need to write to the file system. That would be more secure.
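If you would rather do this step programmatically, a rough boto3 equivalent is sketched below. The subnet and security group IDs are placeholders from the earlier steps, and the uid/gid values are arbitrary, just like in the console.

```python
import time
import boto3

efs = boto3.client("efs")

# Placeholder IDs; the subnet and security group come from the VPC created earlier.
fs = efs.create_file_system(CreationToken="lambda-packages-fs")
fs_id = fs["FileSystemId"]

# The file system must reach the "available" state before a mount target can be added.
while efs.describe_file_systems(FileSystemId=fs_id)["FileSystems"][0]["LifeCycleState"] != "available":
    time.sleep(5)

# A mount target makes the file system reachable from a subnet in the VPC.
efs.create_mount_target(
    FileSystemId=fs_id,
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroups=["sg-0123456789abcdef0"],
)

# The access point is what the Lambda function will mount. Any numeric uid/gid works;
# 777 permissions keep the walkthrough simple but are not what you want long term.
ap = efs.create_access_point(
    FileSystemId=fs_id,
    PosixUser={"Uid": 1000, "Gid": 1000},
    RootDirectory={
        "Path": "/fs1",
        "CreationInfo": {"OwnerUid": 1000, "OwnerGid": 1000, "Permissions": "777"},
    },
)
print(ap["AccessPointArn"])  # needed when attaching the file system to the Lambda function
```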
Create an EC2 Instance
Now that we have an EFS file system, we need an EC2 instance within the same VPC in order to mount and access the storage. While AWS can automatically create an EFS file system when creating an instance, it does not create an access point, without which the Lambda function cannot mount it.
Go to the EC2 dashboard in your console and click the Launch instances button. You need to select a Linux or macOS instance, as you cannot mount an EFS on a Windows device. I will proceed with the default selection, a free-tier-eligible Amazon Linux 2 AMI on a t2.micro instance.

Make sure you select a Subnet (required to mount the EFS) and set "Auto-assign Public IP" to enabled.
Further below you will see the file system settings.

Clicking on the "Add file system" button will allow you to select the EFS we created. Here you can also set the folder where the EFS will be mounted; we will use that path later when installing packages.
We do not have to create a new security group, so you can deselect that option. But then make sure you go into the "Configure Security Group" step and select the security group we created earlier.

You can now launch the instance. Make sure to download the key file; we will need it to SSH into the EC2 instance.
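For readers who script their infrastructure, below is a hedged boto3 sketch of the same launch step. All IDs and the key pair name are placeholders, and the user data script only approximates what the console's "Add file system" option sets up for you.

```python
import boto3

ec2 = boto3.client("ec2")

# Placeholders: AMI ID (an Amazon Linux 2 image in your region), key pair name,
# subnet, security group and file system IDs from the previous steps.
# The user data script installs the EFS mount helper and mounts the file system,
# roughly mirroring the console's "Add file system" option.
user_data = """#!/bin/bash
yum install -y amazon-efs-utils
mkdir -p /mnt/efs/fs1
mount -t efs fs-0123456789abcdef0:/ /mnt/efs/fs1
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",
    UserData=user_data,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "SubnetId": "subnet-0123456789abcdef0",
        "Groups": ["sg-0123456789abcdef0"],
        "AssociatePublicIpAddress": True,  # needed to SSH in from outside the VPC
    }],
)
```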
Installing Packages and Libraries
You can now SSH into the EC2 instance and install the libraries you need. For instance, if you need Python packages, simply use pip to install them into the mounted EFS folder:
sudo pip3 install --target /mnt/efs/fs1 pandas
"/mnt/efs/fs1" is the folder we set for the file system mount when we created the EC2 instance.
You can install all the packages you want and upload files of your own, including large model files. All of these files can be accessed directly from the Lambda function through its mount folder.
You can use scp to upload files, but you cannot upload directly to the mounted drive as it requires root privileges. You will first have to change the permissions on the folder:
sudo chmod 777 /mnt/efs/fs1
The same concepts apply to other languages. If you are developing a Node.js app, you can simply install libraries with npm into the mounted EFS and access them from a Lambda function later on.
After you are done installing packages and uploading files, you can stop or terminate the EC2 instance. If you need to make additional changes you can restart the instance or even create a new one following the same steps.
Lambda Function
You can now go to the Lambda dashboard in your AWS console and create your Lambda function: name it and expand the advanced settings section.

Here make sure you have selected the VPC and the security group we created earlier. You can then create the Lambda function. This may take a few minutes.
Once the Lambda function is created and you see the confirmation message, go into the Lambda function's Configuration tab, File systems section.

Clicking on "Add file system" button will allow you to select the file system and access point we created earlier.

Note that the mount path is slightly different here. Copy this path; we will use it in the Environment variables section.

For Python we have the PYTHONPATH environment variable. Under the Environment variables tab, simply click on Edit and add this variable, using the mount path from the previous step (i.e. /mnt/fs1). For Node.js apps you would set the NODE_PATH variable, and so on.
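If you prefer to configure the function with boto3 instead of the console, attaching the file system and setting the environment variable can be done in a single call, roughly like this (the function name, access point ARN, subnet and security group IDs are placeholders):

```python
import boto3

lam = boto3.client("lambda")

# Placeholders: function name, access point ARN (printed in the EFS step),
# and the subnet / security group IDs of the VPC created earlier.
lam.update_function_configuration(
    FunctionName="my-efs-function",
    VpcConfig={
        "SubnetIds": ["subnet-0123456789abcdef0"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
    FileSystemConfigs=[{
        "Arn": "arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-0123456789abcdef0",
        "LocalMountPath": "/mnt/fs1",  # Lambda mount paths must start with /mnt/
    }],
    Environment={"Variables": {"PYTHONPATH": "/mnt/fs1"}},
)
```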
Now you are all set. The packages you installed can be loaded from your Lambda function directly, without having to do anything else. You can also access other files you added to the EFS from within the same Lambda function.
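As a quick sanity check, a minimal handler along these lines should work once PYTHONPATH points at the mount path. The model path below is just a hypothetical file you might have uploaded to the EFS.

```python
import json

# Because PYTHONPATH includes /mnt/fs1, packages installed on the EFS import as usual.
import pandas as pd

MODEL_PATH = "/mnt/fs1/models/model.pkl"  # hypothetical file uploaded to the EFS

def lambda_handler(event, context):
    # Both the libraries and any data or model files come straight from the EFS mount.
    df = pd.DataFrame({"value": [1, 2, 3]})
    return {
        "statusCode": 200,
        "body": json.dumps({"rows": len(df), "model_file": MODEL_PATH}),
    }
```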