
LLM applications built on third-party hosted LLMs such as OpenAI's do not require MLOps overhead. Such containerized LLM-powered apps or microservices can be deployed with standard DevOps practices. In this article, let's explore how to deploy an LLM app to a cloud provider such as AWS, fully automated with infrastructure and application pipelines. LlamaIndex offers RAGs, a ready-made chatbot for the community; let's use RAGs as the sample app to deploy.
## IaC Self-Service
IaC, short for Infrastructure as Code, automates infrastructure provisioning, ensuring that configurations are consistent and repeatable. There are many tools to accomplish IaC. We will focus on HashiCorp’s Terraform in this article, mainly because Terraform is cloud-agnostic.
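To make this concrete, here is a minimal, hypothetical Terraform snippet: the desired state (an S3 bucket) is declared in code, and Terraform converges the AWS account to that state on every `terraform apply`, which is what makes provisioning consistent and repeatable.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1" # illustrative region
}

# Declarative desired state: Terraform creates the bucket if it's missing,
# and leaves it untouched if it already matches this definition.
resource "aws_s3_bucket" "demo" {
  bucket = "my-rags-demo-bucket" # hypothetical name; S3 bucket names are globally unique
}
```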
The primary purpose of IaC self-service is to empower developers with more access, control, and ownership over their pipelines to boost productivity.
For those interested, I wrote a five-part series about a year ago detailing all aspects of the DevOps self-service model.
## High-Level Deployment Diagram
There are many options for deploying a containerized application to AWS. ECS Fargate stands out for a few good reasons:
- Serverless computing for containers, no server management required
- Increased elasticity and scalability
- Simplified deployments
We first flesh out the high-level deployment diagram for our RAGs app.

To deploy RAGs into AWS, we need pipelines.
## Overview of Pipelines
Let’s first explore the self-service pipeline architecture based on a 3–2–1 rule (a term coined by me):
- 3 types of source code: terraform code, app source code, and GitHub Actions workflow code.
- 2 types of pipelines: infrastructure pipeline and application pipeline.
- 1 pipeline integration glue: GitHub secrets creation automation.

Let’s break it down.
## Infrastructure Pipeline
We use Terraform core features such as `terraform init`, `terraform plan`, and `terraform apply` in our infrastructure pipeline; see the diagram below. Despite the license change for Terraform in August 2023, Terraform's core features remain open source.
We add a few steps before `terraform init`:
- Harden Runner for workflow security
- Infracost for cloud cost management
- TFLint for linting
- Checkov for static IaC code analysis.
For more details on these tools, check out my article on pipeline security and guardrails.
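For orientation, here is a hedged sketch of what these pre-`terraform init` guardrail steps can look like in a GitHub Actions job; the action versions are illustrative, so pin the ones you have vetted.

```yaml
steps:
  # Harden Runner: restrict and audit the runner's outbound traffic
  - uses: step-security/harden-runner@v2
    with:
      egress-policy: audit

  # Infracost: estimate the cost impact of the Terraform changes
  - uses: infracost/actions/setup@v3
    with:
      api-key: ${{ secrets.INFRACOST_API_KEY }}

  # TFLint: lint the Terraform code
  - uses: terraform-linters/setup-tflint@v4
  - run: tflint --recursive

  # Checkov: static analysis of the IaC code
  - uses: bridgecrewio/checkov-action@v12
    with:
      directory: .
```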

Do we need to write our IaC code in Terraform from scratch? No. We turn to the well-known open-source collection of reusable Terraform modules, `terraform-aws-modules`.
### terraform-aws-modules

[terraform-aws-modules](https://github.com/terraform-aws-modules) is a diverse collection of pre-built, reusable, open-source Terraform modules designed explicitly for managing resources on AWS. Led by Anton Babenko, `terraform-aws-modules` has 57 modules so far! These modules aim to simplify and automate infrastructure provisioning on AWS, standardize best practices, and let you write less infrastructure code and achieve faster deployments.
For GCP, there is `terraform-google-modules`, and for Azure, there is `Azure-Verified-Modules`.
You can write your own reusable module if you so choose, but these open-source reusable modules are community-supported and tested. They are great to jump-start your infrastructure pipeline development.
For our RAGs app, assuming we will deploy it into a new AWS account, we are going to choose the following modules from `terraform-aws-modules` at the bare minimum. I say "bare minimum" because you can add many other resources to this stack depending on your project needs, such as authentication/authorization. But for this POC demo app, we will stick with the minimum requirements, as the article aims to demonstrate the self-service model and showcase the open-source IaC reusable modules. Mastering both ingredients lets you pick and choose reusable modules to provision additional resources based on your project requirements. A sketch of calling one of these modules follows the list.

- `terraform-aws-vpc`: the networking module provisioning the new VPC, public/private subnets, internet gateway, NAT gateway, route tables, etc.
- `terraform-aws-s3-bucket`: the S3 bucket for our ALB logs.
- `terraform-aws-alb`: the application load balancer (ALB) for our ECS cluster.
- `terraform-aws-ecs`: the Elastic Container Service (ECS) Fargate instance to which we will deploy RAGs.
- `terraform-aws-ecr`: the Elastic Container Registry (ECR) housing the Docker image for our app.
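As a taste of how little code a reusable module requires, here is a hedged sketch of calling `terraform-aws-vpc`; the name, CIDR ranges, and availability zones are placeholder values for our dev environment, not values from the repo.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "rags-dev" # hypothetical VPC name
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  # A single NAT gateway keeps dev costs down; use one per AZ in prod
  enable_nat_gateway = true
  single_nat_gateway = true
}
```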
### Implementation Prerequisites
- Configure OpenID Connect (OIDC) in AWS: We will use GitHub Actions workflows to kick off Terraform modules for infrastructure provisioning. OIDC allows our GitHub Actions workflows to access AWS without storing the AWS credentials on the GitHub side. GitHub has detailed instructions on how to configure OIDC in AWS. Keep in mind this step only needs to be done once per AWS account.
- Terraform remote state management: The state of infrastructure is a crucial part of Terraform's operations, as it maps real-world resources to our configuration, tracks metadata, and improves performance for large infrastructures. Terraform remote state allows users to store the state of their infrastructure in a remote data store for centralization, security, consistency, and other benefits. Again, this step only needs to be done once per AWS account. I have developed a Terraform reusable module to handle remote state management through an S3 bucket and DynamoDB for state locking. The source code is located in my GitHub repo. To kick it off, you can use a GitHub Actions workflow like my sample workflow. For those unfamiliar with GitHub Actions, see more in the "Application Pipeline (CI/CD)" section below. Both one-time setups are sketched right after this list.
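Here is a hedged sketch of both one-time setups expressed in Terraform; the GitHub org/repo, role name, and bucket/table names are placeholders, and the OIDC thumbprint shown is GitHub's published value, so verify it against the current GitHub documentation.

```hcl
# One-time OIDC setup: lets GitHub Actions assume an IAM role
# without storing AWS credentials on the GitHub side
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

data "aws_iam_policy_document" "github_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only workflows from this repo may assume the role (placeholder org/repo)
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:your-org/rags:*"]
    }
  }
}

resource "aws_iam_role" "terraform_provisioning" {
  name               = "terraform-provisioning-role" # its ARN is stored in GitHub later
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}

# Remote state in the app's Terraform config: S3 for storage,
# DynamoDB for state locking (placeholder names)
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "rags/dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
```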
### Step 1: Create GitHub environments
GitHub environments play an essential role in our pipelines as they can store secrets/variables at three levels: environment, repository, and organization. These secrets/variables can be passed through the pipelines during infrastructure provisioning or application CI/CD to aid pipeline operations.
For our RAGs app, let's create a GitHub environment named `dev` and create two environment variables: `ROLE_TO_ASSUME` for the application pipeline and `TERRAFORM_ROLE_TO_ASSUME` for the infrastructure pipeline, with their values pointing to their respective IAM roles' ARNs, assuming you have already created the IAM roles by following the instructions in the prerequisites section above. We use two different roles here so they can have different permissions assigned. Please note that you need admin rights to see the "Settings" tab in your repo.

Under the same "Settings" tab, we create a few secrets at the repository level, which means they can be applied to different environments for the same app.

- `NPM_TOKEN`: You need this token to call Terraform reusable module(s), as the application doesn't pass such credentials when calling them. A token with `repo` scope is required for the calling app to connect to the repo where the Terraform reusable modules reside. This is especially important if your repo is private.
- `PIPELINE_TOKEN`: You need this token for Terraform to call the GitHub provider to auto-create GitHub secrets/variables such as `ECS_CLUSTER`, `ECS_SERVICE`, etc., based on the resources Terraform provisioned. Such automation of secrets/variables integrates the infrastructure pipeline with the application pipeline, making the transition between infrastructure provisioning and your application's CI/CD seamless (see the sketch after this list). This token needs the `repo` and `read:public_key` scopes.
- `OPENAI_KEY`: This is where you store your OpenAI API key. Stored as a secret here, it doesn't leak into your source code. We will explore how to retrieve this secret and pass it into the CI pipeline in the "Application Pipeline" section.
- `INFRACOST_API_KEY`: The API key for Infracost, an infrastructure cost management tool that helps us automate cloud cost management.
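To illustrate the `PIPELINE_TOKEN` usage, here is a hedged sketch of how Terraform can push its outputs into GitHub via the `integrations/github` provider; the repo name and output wiring are assumptions based on the setup described above.

```hcl
provider "github" {
  token = var.pipeline_token # fed from the PIPELINE_TOKEN secret
}

# Auto-create a GitHub environment variable from a provisioned resource,
# bridging the infrastructure pipeline and the application pipeline
resource "github_actions_environment_variable" "ecs_cluster" {
  repository    = "rags" # hypothetical repo name
  environment   = "dev"
  variable_name = "ECS_CLUSTER"
  value         = module.ecs.cluster_name
}
```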
### Step 2: Add infrastructure pipeline code
Finally, let’s add our infrastructure pipeline code to our repo. See the files/folder below related to the infrastructure pipeline. The sample code can be found in my repo. For a detailed dive into why and how our Terraform code is structured this way, refer to my article on Terraform project structure.

The `main.tf` file is the main wrapper for the Terraform reusable modules. Depending on your stack, you could call one or multiple reusable modules in this file. For our RAGs app, we call five reusable modules to provision our infrastructure, as mentioned in the `terraform-aws-modules` section above.
For each reusable module in `terraform-aws-modules`, refer to that module's example code for usage patterns. Depending on your use case, you can pick either the simple or the complete example and use it as the base for that reusable module in your `main.tf`.
You then parameterize the sample code, externalizing certain variables to the `terraform.tfvars` file under the `.env` folder for the specific environment. For example, the CPU/memory values for your prod environment will most likely differ from those used in your dev environment, so CPU/memory are good candidates for parameterization.
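As a hedged illustration, with variable names of my own choosing rather than from the repo, the knobs are declared once in the Terraform code:

```hcl
# variables.tf – knobs expected to differ per environment
variable "task_cpu" {
  description = "CPU units for the ECS task"
  type        = number
}

variable "task_memory" {
  description = "Memory (MiB) for the ECS task"
  type        = number
}
```

and each environment supplies its own values:

```hcl
# .env/dev/terraform.tfvars – dev-sized values (hypothetical)
task_cpu    = 512
task_memory = 1024
```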
Let’s look at a few key points in the sample Terraform code for ECS provisioning below.
- Line 174 is where we call the reusable module `terraform-aws-ecs`.
- Line 177 onwards is where we pass variables such as `cluster_name` to the reusable module (sketched below).
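A hedged sketch of what that module call can look like; the input values shown are illustrative, and the real file in the repo is the source of truth.

```hcl
module "ecs" {
  source  = "terraform-aws-modules/ecs/aws"
  version = "~> 5.0"

  # Passed in from terraform.tfvars, as described above
  cluster_name = var.cluster_name

  # Run on Fargate so there are no container instances to manage
  fargate_capacity_providers = {
    FARGATE = {
      default_capacity_provider_strategy = {
        weight = 100
      }
    }
  }
}
```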

Depending on your use case, if you have many applications sharing similar AWS stacks, you could move most of the logic in `main.tf` into a centralized reusable-modules repo, adding another abstraction layer on top of the original `terraform-aws-modules`. This approach allows further reuse of your IaC code, leaving the caller repos with minimal IaC code for parameterization. My article on the Terraform project structure details the implementation of such a central repo holding reusable modules within an organization. Feel free to check it out.
### Step 3: Add GitHub Actions workflow for infrastructure pipeline
I have created a reusable GitHub Actions workflow for Terraform provisioning, capturing steps such as workflow security, cloud cost management, IaC linting, scanning, and eventually `terraform init`, `plan`, and `apply`. It serves as a sample workflow, and you are welcome to revise it according to your needs.
From our RAGs repo, we add `terraform-aws.yml` under the `.github/workflows` directory. The key logic in this workflow is highlighted in red below:

- `permissions`: it's important to specify `id-token: write`, as this is needed for the GitHub Actions workflow to authenticate to AWS using OIDC.
- `uses`: this line calls the reusable workflow, saving us from duplicating the same logic from one workflow to another and from one repo to another (see the caller sketch after this list).
- `secrets: inherit`: this line carries the secrets/variables configured in our repo at the environment/repository/organization level into the reusable workflow in a different repo.
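Putting those three pieces together, here is a hedged sketch of the caller workflow; the reusable-workflow path and inputs are illustrative.

```yaml
name: Terraform AWS provisioning

on:
  workflow_dispatch: # manual trigger from the Actions tab

permissions:
  id-token: write # required for OIDC authentication to AWS
  contents: read

jobs:
  terraform:
    # Call the reusable workflow instead of duplicating its logic here
    uses: your-org/reusable-workflows/.github/workflows/terraform.yml@main
    with:
      environment: dev
    # Pass our environment/repository/organization secrets through
    secrets: inherit
```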
### Step 4: Kick off the infrastructure pipeline
Now all our ducks are in a row. Let's kick off our infrastructure pipeline by triggering the "Terraform AWS provisioning" workflow in the RAGs repo.

This workflow provisions the AWS resources in our dev environment. Once it completes, pay attention to the output of the `Terraform Apply` step; see the screenshot below. We will use the `alb_dns` value to launch our RAGs app later, so note it down.

Let's log into AWS and peek at the VPC resource map; see the screenshot below. The networking (VPC, subnets, route tables, etc.) has been successfully provisioned. Also, verify that our new ECS cluster, ECR, and ALB are ready.

## Application Pipeline (CI/CD)
Now that our infrastructure in AWS is ready, let's move on to building and deploying our app to the brand-new ECS cluster.
The diagram below lists the main steps in a CI (Continuous Integration) pipeline.

And our sample CD (Continuous Deployment) pipeline looks like this:

### Step 1: Containerize the app if it's not yet containerized
We first need to add a `Dockerfile` to our RAGs repo to build the code into a Docker image and push it to the newly provisioned ECR in AWS. See the sample `Dockerfile` snippet for our RAGs app below.
```dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

# Copy the rest of the application source
COPY . .

# Streamlit's default port
EXPOSE 8501

CMD ["streamlit", "run", "1_🏠_Home.py"]
```
### Step 2: Add GitHub Actions workflow for CI/CD
For development purposes, we can combine CI/CD into a single workflow. In reality, especially for higher environments, it's recommended to separate CI and CD into two different workflows to ensure image immutability. More is discussed in my article on GitHub Actions workflow orchestration.
I have created two reusable GitHub Actions workflows to handle the CI and CD:
- [python-build-image.yml](https://github.com/wenqiglantz/reusable-workflows-modules/blob/main/.github/workflows/python-build-image.yml): builds the Docker image for any Python app.
- [deploy-to-ecs.yml](https://github.com/wenqiglantz/reusable-workflows-modules/blob/main/.github/workflows/deploy-to-ecs.yml): deploys the Docker image to AWS ECS Fargate.
### API Key Management
Highlighted below is the step in the image-building workflow where the workflow retrieves the OpenAI API key from the GitHub repository secret we defined in Step 1 of the infrastructure pipeline above, then writes the key to a `secrets.toml` file under the `.streamlit` directory. Delegating secrets handling to the CI pipeline this way lets us stay worry-free when managing secrets such as API keys. You should never push your API keys into the source code of your repo.
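A hedged sketch of such a CI step is below; the `openai_key` entry name is an assumption about what the app reads via `st.secrets`, so match it to your app's code.

```yaml
- name: Write OpenAI key to Streamlit secrets
  run: |
    mkdir -p .streamlit
    # The secret comes from GitHub; it never appears in the source code
    echo 'openai_key = "${{ secrets.OPENAI_KEY }}"' > .streamlit/secrets.toml
```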

### Step 3: Kick off CI/CD pipeline
It’s recommended to have your CI/CD pipeline auto-triggered by code push or PR creation/merge, which is one of the main benefits of using GitHub Actions, as it’s seamlessly integrated with your GitHub repo. A sample trigger in your workflow can be defined like this:
```yaml
on:
  push:
    branches:
      - main
  pull_request:
```
Now it's time to kick off the application pipeline, either by a manual trigger or a code push/PR, to build our Docker image and get it deployed to ECS. See below a successful CI/CD workflow execution.

### Step 4: Launch our RAGs app
It's time to launch our RAGs app, now successfully deployed on AWS ECS Fargate. Remember the output of the `Terraform Apply` step from the infrastructure pipeline above? Enter that URL to launch the RAGs app. It works!

## Destroy and Cleanup
To conclude, we now need to destroy our AWS resources and clean up. There is an alternate flow for `terraform destroy` built into our Terraform GitHub Actions workflow file. The destroy workflow can only be triggered by creating a `destroy` branch in your app's repo and then triggering your Terraform workflow with the `destroy` branch selected. The reason for a separate `destroy` branch is to add an extra layer of protection, so users don't trigger the destroy flow by accidentally fat-fingering a destroy option in a dropdown (if we had one) in the manual trigger. See the screenshot below for destroying your AWS resources by triggering the Terraform AWS provisioning workflow.
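Inside the workflow, such a gate can be as simple as branching on the ref name; a minimal sketch, assuming both steps live in the same job:

```yaml
- name: Terraform Apply
  # Normal provisioning path: any branch except 'destroy'
  if: github.ref_name != 'destroy'
  run: terraform apply -auto-approve

- name: Terraform Destroy
  # Tear-down path: only when the workflow runs from the 'destroy' branch
  if: github.ref_name == 'destroy'
  run: terraform destroy -auto-approve
```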

## Key End-to-end Implementation Points
Let’s capture some key end-to-end implementation points of our pipelines, from the GitHub repository, GitHub Actions workflows, and Terraform code. The diagram below highlights the end-to-end flow, from code, to configuration, to the integration points.

## Summary
In this article, we explored using the DevOps self-service model to deploy LlamaIndex's RAGs chatbot to AWS ECS Fargate. For LLM-powered apps that do not require MLOps, the DevOps self-service model works perfectly well for deploying them to cloud providers like AWS.
With the pipeline framework offered by this self-service model, coupled with the large number of open-source IaC reusable modules such as `terraform-aws-modules`, you can mix and match reusable modules according to your project requirements, in turn gaining more access, control, and ownership over your pipelines to boost productivity.
I hope you find this article helpful. The complete source code for this framework is located in my GitHub repos; see the references below.
Happy coding!
References:
- terraform-aws-modules GitHub repo
- Configuring OpenID Connect in Amazon Web Services
- The Path to DevOps Self-Service: A Five-Part Series
- DevOps Self-Service Centric Pipeline Integration
- DevOps Self-Service Centric Pipeline Security and Guardrails
- DevOps Self-Service Centric GitHub Actions Workflow Orchestration
- DevOps Self-Service Centric Terraform Project Structure
- DevOps Self-Service Pipeline Architecture and Its 3–2–1 Rule
- Infracost + Terraform + GitHub Actions = Automate Cloud Cost Management