
Structuring Your Cloud Instances’ Startup Scripts

Separating first launch from reboot

After the initial exploration phase, most of your machine-learning tasks will typically be packaged into images and deployed to on-premise or cloud servers. This facilitates rapid iteration on the infrastructure supporting the operationalisation of the MLOps pipeline, involving the entire development team: data scientists together with data, software, and cloud engineers.

Startup scripts are used to execute automated configuration or other tasks upon the start of a cloud server instance. This is known as user data in AWS EC2, startup scripts in Google Compute Engine, and custom script extensions in Azure Virtual Machines. The contents of the startup scripts could come in the form of installations, metadata settings, environment variables, etc. The main purpose is to ensure that each instance is always configured and ready to serve the applications within, or adjacent services, whenever it is started.
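As a quick illustration, user data is typically attached when the instance is launched. A minimal sketch using the AWS CLI is shown below; the AMI ID, instance type, and log path are placeholders, not values from any real project, and the launch command itself is commented out since it needs valid credentials.

```shell
#!/bin/bash
# Write a minimal user data script to a local file.
cat > userdata.sh <<'EOF'
#!/bin/bash
echo "configured at boot" >> /var/log/startup-demo.log
EOF

# Hypothetical launch command attaching the user data (needs a real AMI ID and credentials):
# aws ec2 run-instances --image-id ami-xxxxxxxxxxxx --instance-type t3.micro \
#     --user-data file://userdata.sh

head -n 1 userdata.sh   # sanity check: the shebang line
```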

As with all scripts we write, we should always aim for them to be neat, structured and centralised so that they can be reused as templates. This will make it easier to manage multiple applications across different instances in your project. In the following sections, I will show how you can do that.

While the sections below are specific to AWS EC2’s user data, the same concepts can easily be adapted to other providers.

1) First Launch vs Reboot Startup Scripts

It is quite intuitive to use startup scripts on an instance’s first launch, but why on reboot? If we are using on-demand instances that are not meant for production environments (e.g., dev, staging, SIT, UAT), it makes little financial sense to keep them running on weekends or after office hours when developers are not at work. Hence, they are scheduled to be shut down and restarted when they are required. There are also occasions when a reboot is required for patching.

During those periods of shutdown, there could be updates to the metadata that the application needs. Hence, after a reboot, these should be refreshed to reflect the latest information.

Hence, user data can serve to configure instances both when they are first launched and on reboot. More often than not, the two types of start-up do not require the same configuration, but the dilemma is that we can only attach one user data file to each instance. So how can we differentiate them within the same user data file?

Multi-part format

If we only require our user data to be executed on the instance’s first launch, the script can just contain the shell commands. However, for it to also be executed on every instance reboot, a cloud-config directive is required. This comes in a separate format, hence AWS uses the MIME (Multipurpose Internet Mail Extensions) multi-part format to contain both.

Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

# your script here

--//--

You can see from the above the MIME definition, followed by the cloud configuration, where [scripts-user, always] indicates that the user data is to be executed both on the instance’s first launch and on every subsequent reboot. The next part holds the shell commands.

Differentiate between first launch & reboot

Technically, AWS does not have a user data configuration that separates your scripts based on first launch and reboot. Luckily, we can use some simple scripting to do that elegantly, as you can see from the pseudocode below.

Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash

# --------------- define functions --------------- #

function install_docker() {
  # some installations
}

function create_dotenv() {
  # create .env file
}

function setup_docker_compose() {
  # setup docker-compose.yml
}

function launch_docker_compose() {
  # launch your container
}

# --------------- execute script --------------- #

if [ ! -e "STARTED" ]; then
  # on first launch
  install_docker
  create_dotenv
  setup_docker_compose
  launch_docker_compose
  touch "STARTED";
else
  # on restart
  create_dotenv
  setup_docker_compose
fi

--//--

First, we need to structure our script into functions so that they can be called later either during the first launch or reboot. You can see that I have defined four functions for install_docker, create_dotenv, setup_docker_compose and launch_docker_compose. Appropriate arguments should be set to make it as reusable as possible.

Second, we have a simple if-else statement such that when the file STARTED is not present, the script executes all four functions and, at the end, creates the STARTED file. On reboot of that instance, since the STARTED file is present, it only runs the two configuration functions and skips the others.
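The marker-file pattern itself is easy to try in isolation. The sketch below uses /tmp and echo stand-ins in place of the real STARTED file and functions:

```shell
#!/bin/bash
# Demo of the first-launch vs reboot marker pattern.
# /tmp/STARTED_demo stands in for the STARTED file; echo replaces the real functions.
MARKER=/tmp/STARTED_demo
rm -f "$MARKER"                 # simulate a fresh instance

run_startup() {
  if [ ! -e "$MARKER" ]; then
    echo "first launch: full setup"
    touch "$MARKER"
  else
    echo "reboot: refresh config only"
  fi
}

run_startup   # first call takes the first-launch branch
run_startup   # second call takes the reboot branch
```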

That’s pretty straightforward, right? Below is a working example using an Ubuntu virtual machine to illustrate this further.

Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash

# --------------- define functions --------------- #

function install_docker() {
  # https://docs.docker.com/engine/install/ubuntu/
  sudo apt-get update;
  sudo apt-get install -y ca-certificates gnupg lsb-release;

  sudo mkdir -p /etc/apt/keyrings
  curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --yes --dearmor -o /etc/apt/keyrings/docker.gpg
  echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  sudo apt-get update
  sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-compose-plugin
}

function create_dotenv() {
  ENV=$(curl -s http://169.254.169.254/latest/meta-data/tags/instance/Env)
  cd $1
  rm -f .env

  # Calculate memory limit & reservation for docker container
  # 90% limit, 70% reserved
  total_memory=$(free -m | awk '/^Mem:/{print $2}')
  MEM_LIMIT=$(echo "$total_memory * 0.9" | bc)
  MEM_LIMIT=$(printf "%.0f" "$MEM_LIMIT")
  MEM_RES=$(echo "$total_memory * 0.7" | bc)
  MEM_RES=$(printf "%.0f" "$MEM_RES")
  echo "Memory limit: ${MEM_LIMIT} MB, reservation: ${MEM_RES} MB"

  echo MEM_LIMIT=${MEM_LIMIT}M >> .env
  echo MEM_RES=${MEM_RES}M >> .env  
  echo ENV=$ENV >> .env
  echo -e "[INFO] dotenv created ==========\n"
}

function setup_docker_compose() {
  # pull docker-compose file
  CI_TOKEN="get from secrets-manager"
  curl --header "PRIVATE-TOKEN: $CI_TOKEN" "https://gitlab.com/api/v4/projects/${1}/repository/files/docker-compose.yml/raw?ref=main" -o ${2}docker-compose.yml

  # pull image
  AWS_ACCOUNT=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .accountId)
  AWS_REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
  aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT.dkr.ecr.$AWS_REGION.amazonaws.com
  echo -e "[INFO] docker-compose downloaded & docker logged in ==========\n"
}

function launch_docker_compose() {
  docker compose pull
  docker compose up -d
  echo -e "[INFO] docker image pulled and up ==========\n"
}

# --------------- execute script --------------- #

PROJECTID=12345678
HOMEDIR=/home/ubuntu/

if [ ! -e "STARTED" ]; then
  # on first launch
  install_docker
  create_dotenv $HOMEDIR
  setup_docker_compose $PROJECTID $HOMEDIR
  launch_docker_compose
  touch "STARTED";
else
  # on restart
  create_dotenv $HOMEDIR
  setup_docker_compose $PROJECTID $HOMEDIR
fi

--//--

A short description of each function is provided. Note the use of arguments to make each function reusable.

  • install_docker(): update the package index, and install base libraries as well as Docker and Docker Compose.
  • create_dotenv(): grab the environment metadata, e.g., dev, staging, prod, from the instance metadata tag, and put it in a .env file.
  • setup_docker_compose(): get the latest docker-compose.yml file from the source code repository, set the image tag using the environment within the file, and then log in to the container registry.
  • launch_docker_compose(): deploy the image as a container.

2) Centralise Startup Scripts & Overcome Character Limits

User data has a size limit of 16 KB (before base64 encoding). That is a healthy length to play with in most use cases. However, should you exceed this amount, you can easily store the scripts in blob storage like an S3 bucket and, within the user data, pull the scripts in and execute them. This is also the preferable way to update all user data scripts via a central store.
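A quick way to check whether a script still fits is to compare its byte count against the limit. The sketch below uses a tiny stand-in file and assumes the 16 KB (16384-byte) limit:

```shell
#!/bin/bash
# Check a user data file against EC2's 16 KB limit before attaching it.
printf '#!/bin/bash\necho hello\n' > /tmp/userdata_demo.sh   # stand-in script

LIMIT=16384
SIZE=$(wc -c < /tmp/userdata_demo.sh)

if [ "$SIZE" -le "$LIMIT" ]; then
  echo "OK: $SIZE bytes (limit $LIMIT)"
else
  echo "Too large: $SIZE bytes; move the functions to S3"
fi
```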

Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="userdata.txt"

#!/bin/bash

# --------------- define functions --------------- #

function install_aws_cli() {
  sudo apt-get update;
  sudo apt-get install -y curl unzip;

  sudo curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip";
  sudo unzip awscliv2.zip;
  sudo ./aws/install;
  rm -f awscliv2.zip; rm -rf aws;
}

function download_scripts() {
  # download template functions from S3
  aws s3 cp s3://<s3.bucket.name>/userdata_template.sh userdata_template.sh
  source userdata_template.sh
}

# --------------- execute script --------------- #

PROJECTID=12345678
HOMEDIR=/home/ubuntu/

cd $HOMEDIR
if [ ! -e "STARTED" ]; then
  # on first launch
  install_aws_cli
  download_scripts
  install_docker
  create_dotenv $HOMEDIR
  setup_docker_compose $PROJECTID $HOMEDIR
  launch_docker_compose
  touch "STARTED";
else
  # on restart
  download_scripts
  create_dotenv $HOMEDIR
  setup_docker_compose $PROJECTID $HOMEDIR
fi

--//--

We can store all four functions above in a file called userdata_template.sh and place it in an S3 bucket of your choice.

To access the S3 bucket, we need to ensure that 1) the instance has the relevant permissions to read from this bucket within the instance profile, and 2) the instance has the AWS CLI installed so that the appropriate commands can be used to pull the startup script from S3.
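For the first point, the role behind the instance profile needs read access to the bucket. A minimal sketch is below; the bucket name, role name, and policy name are all placeholders, and the attach command is commented out since it needs valid credentials:

```shell
#!/bin/bash
# Minimal read-only policy for the scripts bucket (bucket name is a placeholder).
cat > s3-read-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-scripts-bucket/*"
    }
  ]
}
EOF

# Hypothetical command to attach it to the instance profile's role:
# aws iam put-role-policy --role-name my-instance-role \
#     --policy-name userdata-s3-read --policy-document file://s3-read-policy.json

grep -c 's3:GetObject' s3-read-policy.json   # sanity check: the action is present
```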

With that, we can easily download the script, source it to gain access to the earlier functions, and execute them accordingly.
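The download-and-source pattern can be sketched with a local file standing in for the S3 object (the function body is an echo stand-in, not the real implementation):

```shell
#!/bin/bash
# Stand-in for the userdata_template.sh that would be pulled from S3.
cat > /tmp/userdata_template.sh <<'EOF'
function create_dotenv() {
  echo "creating .env in $1"
}
EOF

# Sourcing makes the template's functions available in the current shell.
source /tmp/userdata_template.sh
create_dotenv /home/ubuntu/
```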

3) Debugging

If your user data is not performing the task as expected, you can view the log file within the instance to see if any error messages are captured. This can be found at /var/log/cloud-init-output.log.

# print the last 100 lines of log file
tail -n 100 /var/log/cloud-init-output.log

If you need to check on the user data script itself, you may do that with the following two methods.

# print the user data script
curl -s http://169.254.169.254/latest/user-data

# the script itself is stored in this directory
cd /var/lib/cloud/instance/scripts

Summary

So there you have it! I hope you have learnt some elegant tips to structure your user data across your virtual machines’ first launch and reboot, and that you find them useful and intuitive.
