Serverless: Tweaking the Lambdas

Tweaking lambdas to ensure high scalability with non-scalable resources

Anuradha Wickramarachchi
Towards Data Science

--

Image by Ryan McGuire from Pixabay

Serverless applications are easy to deploy, and their scalability is limited only by the upper bound you specify (at least, that is what the docs say). However, unless you get the design right, the entire system can fail at once. This is because most of the external resources we plug into an AWS deployment are not as scalable as Lambda functions. Furthermore, your design should be driven by the fact that you are writing programs for a serverless environment.

In this article, I will share my experience based on the difficulties I faced while working on a medium-to-large-scale project with high concurrency. I will be using Node.js as the programming environment.

The Basics

There are several facts that you should keep in mind.

  • You should never assume that the same instance will be used to handle consecutive requests, though it is a possibility we might exploit to gain performance.
  • There is no permanent storage for data serialization. However, you have access to a small /tmp path for serialization within a single invocation, so make sure you use unique file names to avoid clashes and finish within the same invocation (see the sketch after this list).
  • No assumptions should be made about the order in which requests are served.
  • Usage is charged based on execution time and allocated memory, so make sure your code runs fast without much thrashing.
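For example, here is a minimal sketch of using /tmp safely, assuming a Node.js runtime recent enough to provide crypto.randomUUID (the file content is only a placeholder):

const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

exports.main = async (event) => {
  // a unique name so concurrent invocations in the same container never clash
  const tmpFile = path.join('/tmp', `payload-${crypto.randomUUID()}.json`);
  fs.writeFileSync(tmpFile, JSON.stringify(event));
  // ... process the file within this invocation ...
  fs.unlinkSync(tmpFile); // clean up before the invocation ends
  return { statusCode: 200 };
};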

Handling Bottleneck Resources

In almost all scenarios, a web application needs to connect to a database and fetch or write data. However, databases support a limited number of concurrent connections, and that limit can easily be exceeded by the level of concurrency a Lambda function can reach. Moreover, you could have multiple independent functions that each hold database connections: having 10 functions would need a minimum of 10 connections if they are invoked at roughly the same time, and at peak this can easily be multiplied by a factor large enough for the database to hit its limit.

Lowering the Scalability

In most cases, you will never need 1,000 concurrent invocations. You can change this setting either from the AWS console or in serverless.yml. (Have a look at my serverless boilerplate article to learn more about organizing your serverless project.)

functions:
  hello:
    handler: handler.hello
    events:
      - http:
          path: /hello
          method: get
    provisionedConcurrency: 5
    reservedConcurrency: 5

Reserved concurrency guarantees that the function can always scale out to that many concurrent instances (and no more), while provisioned concurrency determines how many instances are kept initialized up front, so cold-start times are not added to the service latency. In an example scenario, one might want to set a little more concurrency for the home page while keeping very little concurrency on user-management services, since they rarely face high demand.
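As an illustration, such a split might look roughly like this in serverless.yml (the function names and numbers are purely hypothetical):

functions:
  homePage:
    handler: handler.home
    reservedConcurrency: 50    # high-traffic landing page
    provisionedConcurrency: 10
  userAdmin:
    handler: handler.admin
    reservedConcurrency: 2     # rarely used management endpoint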

Caching Data and Avoiding DB Connectivity

In certain scenarios, the data does not need to be perfectly fresh. For example, a blog website might not want to fetch data from the database on every request; its users would be happy to read slightly stale data, which they will never notice. Have a look at the following function.

exports.main = (event, context, callback) => {
  // connect to the database and send the data via callback(null, data);
};

This will always communicate with the database. However, one might do this as follows.

// cache data in a module-level variable and close the DB connection
exports.main = (event, context, callback) => {
  // if event.queryStringParameters match a cached entry, send the cached data
  // else: fetch the data, cache it, and send it over
};

With this modification, redundant data will very likely not be fetched from the database again and again. One might even store a timestamp to check whether the cached data is too stale. A further trick is to keep functions warm by calling them with a cron job, although this should not be done if the load distribution is non-uniform over time.
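As a rough sketch, caching query results with a timestamp check could look like this, assuming a hypothetical fetchPostsFromDatabase helper and a one-minute staleness budget:

// module-level cache survives between invocations on a warm container
let cache = { data: null, fetchedAt: 0 };
const TTL_MS = 60 * 1000; // serve data that is at most one minute old

exports.main = async (event) => {
  const now = Date.now();
  if (cache.data === null || now - cache.fetchedAt > TTL_MS) {
    // hypothetical helper that opens a connection and runs the query
    cache.data = await fetchPostsFromDatabase();
    cache.fetchedAt = now;
  }
  return { statusCode: 200, body: JSON.stringify(cache.data) };
};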

Keeping the Functions Warm

This is a well-known suggestion from AWS itself for keeping functions warm. We can do it simply by sending data through the callback without waiting for the event loop to finish.

exports.main = (event, context, callback) => {
  // return the response without waiting for the event loop to drain
  context.callbackWaitsForEmptyEventLoop = false;
};

Setting callbackWaitsForEmptyEventLoop to false on the context object lets the function serve the request and return its response without closing the database connection; the connection stays alive in the frozen container and can be reused by the next invocation. This is similar to caching data in a global variable, except here we cache the entire connection. Make sure to check whether the connection is still active, since it can be closed by the database service itself.

With this method, there is a chance of overrunning the database connection limit, but it can be a handy trick, since establishing a database connection can itself add considerable latency.
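For example, here is a minimal sketch of connection reuse, assuming a MySQL database accessed through the mysql2 library (the table name and environment variables are placeholders):

const mysql = require('mysql2/promise');

let cachedConnection = null; // survives across invocations in a warm container

const getConnection = async () => {
  if (cachedConnection === null) {
    cachedConnection = await mysql.createConnection({
      host: process.env.DB_HOST,
      user: process.env.DB_USER,
      password: process.env.DB_PASSWORD,
      database: process.env.DB_NAME,
    });
  }
  return cachedConnection;
};

exports.main = async (event, context) => {
  // do not keep the invocation open just because the connection is still live
  context.callbackWaitsForEmptyEventLoop = false;
  const connection = await getConnection();
  const [rows] = await connection.query('SELECT * FROM posts LIMIT 10');
  return { statusCode: 200, body: JSON.stringify(rows) };
};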

Handling CPU Intensive Tasks

CPU-intensive tasks are mostly things like matrix computations, but watermarking images is another such task. Let's have a look at the example below.

const Jimp = require('jimp');

const create_watermarks = async (key) => {
  // get the images from S3
  const s3Obj = await s3.getObject(params).promise();
  // for each image from S3: read the watermark image, then the image itself
  const watermarkImage = await Jimp.read(__dirname + '/wm.png');
  const img = await Jimp.read(s3Obj.Body);
  // add the watermark
};

In this function, we read the watermark image once for every input image. Moving that read out of the loop, as shown below, can save a lot of time and speed everything up.

// read the watermark image only once
const watermarkImage = await Jimp.read(__dirname + '/wm.png');
// for each image from S3
const img = await Jimp.read(s3Obj.Body);
const watermarkCopy = watermarkImage.clone();
// add the watermark using the copy

This can be further optimized by caching the already read watermark image in a global variable.

const Jimp = require('jimp');

let watermarkCache = null; // cached across invocations in a warm container

const create_watermarks = async (key) => {
  if (watermarkCache === null) {
    watermarkCache = await Jimp.read(__dirname + '/wm.png');
  }
  // clone the cached watermark and add it to each image
};

Managing Concurrency inside a Function

Node.js has an event-loop architecture in which all callbacks are scheduled and run on a single thread; this is why we do not observe the usual race conditions in Node.js code. The event loop keeps polling asynchronous tasks for results, so having too many concurrent tasks in flight makes each pass over the loop longer. Moreover, if the concurrent tasks spawn child processes, the system might be overwhelmed by unwanted thrashing.

add_watermark = async (s3Key) => { /* load the image from S3, add the watermark, return the new image */ };

Now, if you apply this function to 100 potential image keys to be loaded from S3 and watermarked, as follows, what would happen?

s3keys_list = [/* ...100 keys... */];
await Promise.all(s3keys_list.map(add_watermark));

The event loop will fire 100 concurrent S3 object requests. This could push you past the RAM limit and might even incur throttling or extra request charges at S3 or whatever the other source is. How do we fix this? A naive solution is to process the keys in fixed-size batches, as follows.

s3keys_list = [/* ...100 keys... */];
let tmp_list = [];
for (let x = 0; x < s3keys_list.length; x++) {
  tmp_list.push(s3keys_list[x]);
  if (tmp_list.length === 5) {
    await Promise.all(tmp_list.map(add_watermark));
    tmp_list = [];
  }
}
// process any remaining items in tmp_list the same way
With this method, you are guaranteed a fixed maximum concurrency (here, five), and you will not end up with a large battalion of cloned watermarks exceeding the RAM.
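A slightly cleaner variant is to wrap the batching in a small helper so any list can be processed with a chosen concurrency (processInBatches is only an illustrative name):

// process items in fixed-size batches so only `batchSize` promises are in flight
const processInBatches = async (items, batchSize, worker) => {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
};

// e.g. watermark 100 S3 objects, five at a time:
// await processInBatches(s3keys_list, 5, add_watermark);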

On top of the tricks discussed above, many other optimizations can be applied based on knowledge of the domain and the application. Furthermore, we can simply watch the logs and use them to fine-tune the concurrency requirements. Sometimes it is even worth increasing RAM a bit at the cost of concurrency, for example on site-maintenance endpoints that are hardly ever used by many people.

I hope you enjoyed reading this article. These were a few tricks we used while developing a platform with many users, and they helped us cope with a one-time spike in demand during a broad promotional campaign where a lot of users registered.
