How to Automate Your Mongo Database Backups on Kubernetes and S3

Schedule and automate database backups from your Kubernetes cluster to AWS S3

Redouane Achouri
Towards Data Science


Photo by Art Wall - Kittenprint on Unsplash

If you are running a self-hosted database such as MongoDB, chances are you don’t have the benefit of automated backups offered by managed services.

This article shows how to use Kubernetes CronJobs to schedule automated backup jobs for MongoDB on a Kubernetes cluster, and store these backups in an AWS S3 bucket.

Backup jobs are scheduled periodically to fetch data from the DB, assume an IAM role to get credentials, then upload to S3 – image by author

We’ll focus on a standalone MongoDB setup, but the same principles apply to replica sets and to other databases.

Since many organisations are using Terraform to manage infrastructure as code, we’ll write our CronJob in Terraform format directly.

Technology Stack

  • MongoDB
  • Kubernetes
  • AWS S3
  • Terraform
  • Docker

Content

  • Create a Terraform module (optional)
  • Create a Docker image with the required tooling
  • Define variables and data
  • Create an S3 bucket for storing the backups
  • Create an IAM role and a Kubernetes Service Account
  • Store MongoDB’s password as a Kubernetes Secret
  • Create the Kubernetes CronJob
  • Deploying the infrastructure
  • Explanations

Let’s dive in!

Requirements:
This project uses Terraform with the AWS provider and the Kubernetes provider.
Feel free to check out the official tutorial Provision an EKS Cluster to get a Kubernetes cluster up and running.
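
If you are starting from scratch, a minimal provider requirements block could look like the following sketch (the file name and provider versions are up to you; wiring the Kubernetes provider to your cluster is not shown):

# versions.tf (sketch)
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
  }
}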

Create a Terraform module (optional)

To keep things structured and clean within an infrastructure-as-code repository, I like to partition logical tasks into sub-modules.

Note that this is optional, and you can include the templates created below anywhere in your infrastructure source code.

Let’s create a directory database-backup within a modules directory:
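
For example, from the root of your infrastructure repository:

mkdir -p modules/database-backup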

Create a Docker image with the required tooling

To perform the backup, we need to dump data from the database and upload it to S3.

We’ll use the convenient mongodump to dump the data, and the AWS CLI to upload the data dumps to S3.

In the Terraform module created above, add a directory mongodb, and within this new directory create a Dockerfile with the following content:

# Dockerfile
# Based on Amazon Linux 2 (will be running on AWS EKS)
FROM amazonlinux:2
RUN yum install -y unzip
# Install AWS CLI
RUN curl -sOL https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip \
  && unzip awscli-exe-linux-x86_64.zip \
  && ./aws/install
# Install MongoDB CLI tools
RUN curl -sOL https://fastdl.mongodb.org/tools/db/mongodb-database-tools-amazon2-x86_64-100.6.0.rpm \
  && yum install -y mongodb-database-tools-amazon2-x86_64-100.6.0.rpm

Build and push this image to your image repository

Let’s build this image and make sure to target the right platform — this is especially important if you are developing on a different machine (say, a MacBook with an M1 chip) than the one where your cluster is running.

export REPOSITORY_URL=<your repository URL, e.g. on AWS ECR>
export TAG=<COMMIT_HASH> # Should be commit hash, but can be an arbitrary string
docker build --tag="$REPOSITORY_URL:$TAG" --platform=linux/amd64 .

To push the image, make sure you are logged into your repository (run docker login), then run:

docker push "$REPOSITORY_URL:$TAG"
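
If your repository is on AWS ECR, for instance, the login step might look like this (region and account ID are placeholders):

aws ecr get-login-password --region <region> \
  | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com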

Define variables and fetch data

We need to define some Terraform configuration variables and fetch data that will be used in the rest of this project. Feel free to adjust these to your current setup.

In a file named variables.tf, add the following:

# variables.tf
variable "kubernetes_namespace" {
  type = string
}

variable "kubernetes_cluster_name" {
  description = "Kubernetes cluster where the backup job and permissions service account should be deployed"
  type        = string
}

variable "container_image_repository" {
  description = "URL of the Docker image used in the CronJob container"
  type        = string
}

variable "container_image_tag" {
  description = "Tag of the Docker image used in the CronJob container"
  type        = string
}

variable "database_host" {
  description = "MongoDB host URL"
  type        = string
}

variable "database_user" {
  description = "MongoDB user"
  type        = string
}

variable "database_password" {
  description = "MongoDB password"
  type        = string
  sensitive   = true
}

In a file named data.tf, add the following:

# data.tf
data "aws_caller_identity" "current" {}

data "aws_eks_cluster" "kubernetes_cluster" {
  name = var.kubernetes_cluster_name
}

Create an S3 bucket for storing the backups

We need a reliable location to store our backups, and AWS S3 offers strong durability guarantees, in addition to being affordable and convenient to use.

Create a Terraform file s3-bucket.tf with the following content:

# s3-bucket.tf
resource "aws_s3_bucket" "database_backup_storage" {
  lifecycle {
    # Prevent destroying the backups storage in case of accidental tear down
    prevent_destroy = true
  }

  bucket = "database-backup-storage"
}

Optionally, we can add a lifecycle policy to automatically remove backups older than, say, 7 days. In the same file, add the following:

# s3-bucket.tf
...

resource "aws_s3_bucket_lifecycle_configuration" "database_backup_storage_lifecycle" {
  bucket = aws_s3_bucket.database_backup_storage.bucket

  rule {
    id     = "delete-old-backups-7d"
    status = "Enabled"

    filter {}

    expiration {
      days = 7
    }
  }
}

Create an IAM role and a Kubernetes Service Account

Our backup jobs need permission to upload backups to S3. More specifically, we need to create:

  • an IAM role with a policy allowing s3:PutObject operations on the backups S3 bucket
  • a Kubernetes Service Account to provide a web identity token that allows backup jobs to assume the IAM role to upload to S3

To learn more, here is the documentation on IAM roles for Service Accounts on AWS EKS.

In a file named access-control.tf, add the following:

# access-control.tf
locals {
  service_account_name = "database-backup"
  oidc_provider = replace(
    data.aws_eks_cluster.kubernetes_cluster.identity[0].oidc[0].issuer,
    "/^https:///",
    ""
  )
}

resource "aws_iam_role" "role" {
  name = "database-backup-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Federated = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:oidc-provider/${local.oidc_provider}"
        },
        Action = "sts:AssumeRoleWithWebIdentity",
        Condition = {
          StringEquals = {
            "${local.oidc_provider}:aud" = "sts.amazonaws.com",
            "${local.oidc_provider}:sub" = "system:serviceaccount:${var.kubernetes_namespace}:${local.service_account_name}"
          }
        }
      }
    ]
  })

  inline_policy {
    name = "AllowS3PutObject"
    policy = jsonencode({
      Version = "2012-10-17"
      Statement = [
        {
          Action = [
            "s3:PutObject",
          ]
          Effect   = "Allow"
          Resource = "${aws_s3_bucket.database_backup_storage.arn}/*"
        }
      ]
    })
  }
}

resource "kubernetes_service_account" "iam" {
  metadata {
    name      = local.service_account_name
    namespace = var.kubernetes_namespace
    annotations = {
      "eks.amazonaws.com/role-arn"               = aws_iam_role.role.arn
      "eks.amazonaws.com/sts-regional-endpoints" = "true"
    }
  }
}
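
Store MongoDB’s password as a Kubernetes Secret

Instead of passing the database password to the container as a plain environment variable value, you can store it as a Kubernetes Secret and reference it from the CronJob (see the commented-out env block in the next section). Here is a minimal sketch using the Terraform Kubernetes provider; the file name, the secret name mongodb, and the key mongodb-password are assumptions chosen to match the reference in the CronJob below:

# mongodb-secret.tf (sketch)
resource "kubernetes_secret" "mongodb" {
  metadata {
    name      = "mongodb"
    namespace = var.kubernetes_namespace
  }

  data = {
    "mongodb-password" = var.database_password
  }
}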

Create the Kubernetes CronJob

We can use Terraform to define our CronJob in HCL format (see kubernetes_cron_job). Note that the same configuration can also be applied as a Kubernetes manifest (YAML/JSON).

In a file named backup-cronjob.tf, add the following:

# backup-cronjob.tf
resource "kubernetes_cron_job" "database_backup_cronjob" {
  metadata {
    name      = "database-backup-mongodb-daily"
    namespace = var.kubernetes_namespace
  }
  spec {
    schedule                      = "0 5 * * *" // At 05:00
    concurrency_policy            = "Replace"
    suspend                       = false
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 3
    job_template {
      metadata {}
      spec {
        template {
          metadata {}
          spec {
            restart_policy       = "Never"
            service_account_name = kubernetes_service_account.iam.metadata[0].name
            container {
              name    = "database-backup"
              image   = "${var.container_image_repository}:${var.container_image_tag}"
              command = ["/bin/sh", "-c"]
              args = [
                "mongodump --host=\"$MONGODB_HOST\" --username=\"$MONGODB_USER\" --password=\"$MONGODB_PASSWORD\" --gzip --archive | aws s3 cp - s3://$S3_BUCKET/$S3_BUCKET_PREFIX/$(date +\"%Y%m%d-%H%M%S-%Z\").archive.gz"
              ]
              env {
                name  = "MONGODB_HOST"
                value = var.database_host
              }
              env {
                name  = "MONGODB_USER"
                value = var.database_user
              }
              env {
                name  = "MONGODB_PASSWORD"
                value = var.database_password
              }
              # Note that you can also set the DB password as a Kubernetes Secret then get it as:
              # env {
              #   name = "MONGODB_PASSWORD"
              #   value_from {
              #     secret_key_ref {
              #       name = "mongodb"
              #       key  = "mongodb-password"
              #     }
              #   }
              # }
              env {
                name  = "S3_BUCKET"
                value = aws_s3_bucket.database_backup_storage.bucket
              }
              env {
                name  = "S3_BUCKET_PREFIX"
                value = "mongodb"
              }
              resources {
                limits = {
                  cpu    = "1000m"
                  memory = "1000Mi"
                }
                requests = {
                  cpu    = "100m"
                  memory = "256Mi"
                }
              }
            }
          }
        }
      }
    }
  }
}

Deploying the infrastructure

Now that our backup module is ready, we can deploy it alongside the rest of our infrastructure.

You can place the following snippet in your Terraform code, e.g. in main.tf:

module "mongodb_backup" {
source = "${path.root}/modules/database-backup"
kubernetes_namespace = "<your namespace>"
kubernetes_cluster_name = "<your cluster name>"
container_image_repository = "<Value of REPOSITORY_URL>"
container_image_tag = "<Value of TAG>"
database_host = <MongoDB host>
database_user = <MongoDB user>
database_password = <MongoDB password>
tags = {
Name = "database-backup-mongodb"
}
}

Once your Terraform infrastructure is applied (terraform apply), you should see that you have a CronJob up and running:
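
For example, you can check with kubectl:

kubectl get cronjobs -n <your namespace>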

Explanations

How it works

A Kubernetes CronJob schedules jobs to run as pods capable of performing a given task. In the case of this database backup implementation, each scheduled job will perform the following steps:

  1. Connect to standalone host at $MONGODB_HOST
  2. Use mongodump to dump the data as compressed (gzip) archive and print result to standard output. Three things here: (1) we compress to reduce the size of the upload payload, (2) we archive to have the data dump in a single file (it’s optional, but can prevent issues when dumping to case-insensitive filesystems — see warning here), (3) and we print the dump to the standard output to be able to pipe it to the AWS CLI.
  3. Pipe compressed archive to AWS CLI
  4. Copy from standard input (pipe) to S3 bucket via:
aws s3 cp - s3://$S3_BUCKET

How to restore the data

  1. Download the latest data dump from S3 to your local machine or to the target Mongo host (i.e. where you want to recover the data):
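
For example, with the AWS CLI (the object key is a placeholder; list the bucket first to find the most recent archive):

aws s3 ls s3://database-backup-storage/mongodb/
aws s3 cp s3://database-backup-storage/mongodb/<backup-name>.archive.gz ./backup.archive.gz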

  2. Use the mongorestore utility:

mongorestore --gzip --archive=/path/to/backup [--host $HOST --username $USERNAME --password $PASSWORD]

Important

  • The schedule should be adjusted to your needs; the setup above runs at 05:00 in the timezone of the cluster’s kube-controller-manager (typically UTC on managed clusters).
  • The MongoDB backup is configured to work with a single, standalone instance. Adjust the configuration for a replica set (see usage of oplog); a sketch follows this list.
  • The MongoDB server you restore to must run the same version as the one the dumps were taken from.
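
As a rough sketch only (hosts, replica set name, and auth options are placeholders; verify the flags against your version of the MongoDB database tools), a point-in-time dump of a replica set could look like:

mongodump --uri="mongodb://$MONGODB_USER:$MONGODB_PASSWORD@member-0,member-1,member-2/?replicaSet=rs0" \
  --oplog --gzip --archive \
  | aws s3 cp - s3://$S3_BUCKET/$S3_BUCKET_PREFIX/$(date +"%Y%m%d-%H%M%S-%Z").archive.gz

On restore, pass --oplogReplay to mongorestore to replay the captured oplog entries.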

Final notes

When running a self-hosted database, it takes additional steps to guard against data loss from disasters, accidental deletions, hardware failures, and so on.

Thankfully the most common databases provide tooling to generate and restore backups, and cloud storage solutions like AWS S3 are reliable, affordable, and convenient to use.

Leveraging a job scheduler like Kubernetes CronJob, we can create an automated solution that takes care of backing up data while we focus on building amazing applications 🎉!

If you find this tutorial useful and you’d like to support the making of quality articles, consider buying me a coffee!

You can click on the “Follow” button to get my latest articles and posts!
