
Building a QA Research Chatbot with Amazon Bedrock and LangChain

Overview and Implementation with Python

Image by Chen from Pixabay

Table of Contents

Introduction
Objective
Chatbot Architecture
Tech Stack
Procedure
Step 1 – Load the PDF Documents
Step 2 – Build the Vector Store
Step 3 – Load the LLM
Step 4 – Create the Retrieval Chain
Step 5 – Build the User Interface
Step 6 – Run the Chatbot Application
Step 7 – Containerize the Application
Future Steps
Conclusion


Introduction

Not too long ago, I attempted to build a simple custom chatbot that would be run entirely on my CPU.

The results were appalling, with the application crashing frequently. That said, this is not a shocking outcome. As it turns out, running a 13B-parameter model on a $600 computer is the programming equivalent of making a toddler trek up a mountain.

This time, I made a more serious attempt at building a research chatbot, with an end-to-end project that uses AWS to host and provide access to the models the application needs.

The following article details my efforts in leveraging Retrieval-Augmented Generation (RAG) to build a high-performing research chatbot that answers questions with information from research papers.


Objective

The aim of this project is to build a QA chatbot using the RAG framework. It will answer questions using the content of PDF documents available in the arXiv repository.

Before delving into the project, let’s consider the architecture, the tech stack, and the procedure for building the chatbot.


Chatbot Architecture

Chatbot Workflow (Created by Author)

The diagram above illustrates the workflow for the LLM application.

When a user submits a query on the user interface, the query will be transformed into an embedding by an embedding model. The vector database will then retrieve the most similar embeddings and send them, along with the embedded query, to the LLM. The LLM will use the provided context to generate an accurate response, which will be shown to the user on the interface.


Tech Stack

Building the RAG application with the components shown in the architecture will require several tools. The noteworthy tools are the following:

  1. Amazon Bedrock

Amazon Bedrock is a serverless service that gives users access to foundation models via API. Since it uses a pay-as-you-go system and charges by the number of tokens used, it is convenient and cost-effective for developers.

Bedrock will be used to access both the embedding model and the LLM. In terms of configuration, using Bedrock will require creating an IAM user with access to the service. Furthermore, access to the models of interest must be granted in advance.
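As a quick sanity check after the IAM setup, a short boto3 snippet can confirm which models are visible to the account. This is only a minimal sketch; the region name is an assumption and should match the region where model access was granted.

import boto3

# Bedrock control-plane client (region name is an assumption)
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Print the IDs of the foundation models available to this account
for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["modelId"])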

2. FAISS

FAISS is a popular library in the Data Science space and will be used to create the vector database for this project. It enables quick and efficient retrieval of relevant documents based on a similarity metric. It is free too, which always helps.

3. LangChain

The LangChain framework will facilitate the creation and usage of the RAG components (e.g., vector store, LLM).

4. Chainlit

The Chainlit library will be used to develop the user interface of the chatbot. It enables users to build an aesthetic front end with minimal code and offers features suited for chatbot applications.

Note: The technical portion of the article will include code snippets of Chainlit operations, but will not cover the syntax or functionality of these operations.

5. Docker

For portability and ease of deployment, the application will be containerized using Docker.


Procedure

Developing the LLM application will require the following steps. Each step will be explored individually.

  1. Load the PDF Documents
  2. Build the Vector Store
  3. Load the LLM
  4. Create the Retrieval Chain
  5. Build the User Interface
  6. Run the Chatbot Application
  7. Containerize the Application

Step 1 – Load the PDF Documents

Load and Process PDF Documents (Created by Author)

arXiv is a repository containing a plethora of free, open-access articles and papers on topics ranging from economics to engineering. The backend data of the application will comprise a few documents on LLMs from the repository.

Once the selected documents are stored in a directory, they will be loaded and transformed into text chunks using LangChain’s PyPDFDirectoryLoader and RecursiveCharacterTextSplitter, respectively.
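A minimal sketch of this step is shown below. The directory name and chunking parameters are assumptions rather than values prescribed by the project, and the import paths may differ slightly depending on your LangChain version.

from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load every PDF stored in the local directory (directory name is an assumption)
loader = PyPDFDirectoryLoader("data/")
documents = loader.load()

# Split the documents into overlapping text chunks (chunk sizes are assumptions)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_documents(documents)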


Step 2 – Build the Vector Store

Build the Vector Store (Created by Author)

The text chunks created in Step 1 are embedded using Amazon’s Titan Text Embeddings model, which is accessed through boto3, the AWS SDK for Python. The Titan model is identified by the model_id provided in the Bedrock documentation.

The embedded chunks are stored in a FAISS vector store, which is saved locally as "faiss_index".
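Sketched with LangChain’s Bedrock integration, this step could look like the following. The region is an assumption, and the model_id should be verified against the Bedrock documentation for your account.

import boto3
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import FAISS

# Runtime client used to invoke models hosted on Amazon Bedrock
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Titan Text Embeddings model, identified by its Bedrock model_id
embeddings = BedrockEmbeddings(client=bedrock_runtime, model_id="amazon.titan-embed-text-v1")

# Embed the chunks from Step 1 and save the FAISS index locally
vector_store = FAISS.from_documents(chunks, embeddings)
vector_store.save_local("faiss_index")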


Step 3 – Load the LLM

LLM (Created by Author)

The LLM for the application will be Meta’s 13B Llama 2 model. Much like the embedding model, the LLM is accessed with Amazon Bedrock.

One noteworthy parameter is temperature, which affects the randomness of the model’s output. Since the application is designed to be used for research, randomness will be minimized by setting temperature to 0.
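A sketch of loading the model through LangChain’s Bedrock wrapper is shown below, reusing the bedrock_runtime client from Step 2. The model_id corresponds to the Llama 2 13B Chat listing in the Bedrock documentation, and max_gen_len is an assumption.

from langchain_community.llms import Bedrock

# Llama 2 13B Chat on Amazon Bedrock, with temperature set to 0
# to minimize randomness (max_gen_len is an assumption)
llm = Bedrock(
    client=bedrock_runtime,
    model_id="meta.llama2-13b-chat-v1",
    model_kwargs={"temperature": 0, "max_gen_len": 512},
)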


Step 4 – Create the Retrieval Chain

Retrieval Process (Created by Author)

In LangChain, a "chain" is a wrapper that facilitates a series of events in a specific order. In this RAG application, the chain will receive the user query and retrieve the most similar chunks from the vector store. The chain will then send the embedded query and the retrieved chunks to the loaded LLM, which will generate a response using the provided context.

The chain also incorporates ConversationBufferMemory, which allows the chatbot to retain memory of previous queries. This enables the user to ask follow-up questions.

Another noteworthy parameter is the retriever’s k, which specifies the number of embeddings that should be retrieved from the vector store. For this use case, we set k to 3, meaning that the LLM will use the 3 most similar chunks as context to answer each query.
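Putting the pieces from the previous steps together, a sketch of the chain could look like this. Setting output_key on the memory is needed so that the memory and the returned source documents can coexist; the exact configuration may vary with the LangChain version.

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Conversation memory so the chatbot can handle follow-up questions
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer",
)

# Retrieval chain: fetch the k=3 most similar chunks and pass them to the LLM
chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
    memory=memory,
    return_source_documents=True,
)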


Step 5 – Build the User Interface

So far, the backend components of the application have been developed, so it is time to work on the front end. Chainlit makes it easy to build user interfaces for LangChain applications, as existing code only needs to be modified with a few additional Chainlit commands.

Chainlit is used for creating the function that establishes the chain.

It is also used for creating the function that uses the chain to generate responses and send them to the user.

The Chainlit decorators are a necessary inclusion. The on_chat_start decorator defines the operations that should be run when the chat session is started (i.e., setting up the chain), while the on_message decorator defines the operations that should be run when the user submits a query (i.e., sending the response).

In addition, the code incorporates the use of async and await commands so that the tasks are handled asynchronously.

Finally, since the LLM application is designed for research, the generated response will include the sources of the embeddings retrieved from the vector store after the similarity search. This makes the generated responses citable, and as a result, more credible in the eyes of the user.
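A minimal sketch of the Chainlit layer is shown below. The build_chain helper is a hypothetical stand-in for the chain construction covered in Steps 1 through 4, and the welcome message is an assumption.

import chainlit as cl

@cl.on_chat_start
async def start():
    # Build the retrieval chain when a chat session begins
    chain = build_chain()  # hypothetical helper wrapping Steps 1-4
    cl.user_session.set("chain", chain)
    await cl.Message(content="Hi! Ask me anything about the loaded papers.").send()

@cl.on_message
async def main(message: cl.Message):
    chain = cl.user_session.get("chain")
    # Run the chain asynchronously on the user's query
    res = await chain.acall(
        {"question": message.content},
        callbacks=[cl.AsyncLangchainCallbackHandler()],
    )
    answer = res["answer"]
    # Append the document name and page number of each retrieved chunk
    sources = "\n".join(
        f"{doc.metadata.get('source')} (page {doc.metadata.get('page')})"
        for doc in res["source_documents"]
    )
    await cl.Message(content=f"{answer}\n\nSources:\n{sources}").send()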


Step 6 – Run the Chatbot Application

With all components in the chatbot workflow created, the application can be run and tested. With Chainlit, a session can be started with a simple one-liner:

chainlit run <app.py>
Chatbot UI (Created by Author)

The chatbot is now up and running! It shows the message provided in the code upon the start of the session.

Let’s test it with a simple query:

Query 1 (Created by Author)

When a query is submitted, the response is both concise and comprehensible. Furthermore, it includes the sources of the 3 vector embeddings that were used to generate the response, including the name of the document and the page number.

To ensure that the chatbot is retaining memory of previous queries, we can submit a follow-up query.

Query 2 (Created by Author)

Here, we ask for "another example" without specifying what the example should be about. Since the bot retains memory of the conversation, it knows that the query refers to pretrained LLMs.

Overall, the application performs at a satisfactory level. One aspect that can’t be demonstrated in an article is the significantly lower computational demand of running the chatbot. Since AWS hosts the embedding model and the LLM, there is no risk of crashes from excessive local CPU utilization.


Step 7 – Containerize the Application

Although the chatbot is up and running, there is still one step remaining. The LLM application still needs to be containerized with Docker for easier portability and version control.

The first step for containerization is to develop the Dockerfile.

In this Dockerfile, we use a Python image as the base, define build arguments for the AWS access key ID and secret access key, install the dependencies listed in requirements.txt, copy the current directory into the container, and run the Chainlit application.
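Under those assumptions, a sketch of the Dockerfile could look like this; the Python version, port, and application filename are assumptions.

# Lightweight Python base image (version is an assumption)
FROM python:3.11-slim

# AWS credentials passed in as build arguments and exposed to boto3
ARG AWS_ACCESS_KEY_ID
ARG AWS_SECRET_ACCESS_KEY
ENV AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
ENV AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}

WORKDIR /app

# Install the dependencies, then copy the application code
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Run the Chainlit application on port 8000 (filename is an assumption)
EXPOSE 8000
CMD ["chainlit", "run", "app.py", "--host", "0.0.0.0", "--port", "8000"]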

It is pretty easy from here on out. Building the Docker image takes a one-liner:

docker build --build-arg AWS_ACCESS_KEY_ID=<your_access_key_id> --build-arg AWS_SECRET_ACCESS_KEY=<your_secret_access_key> -t chainlit_app .

The command above builds an image named chainlit_app. It includes the AWS access key ID and the AWS secret access key as build arguments since they are needed to access the models in Amazon Bedrock via API.

Finally, the application can be run in a Docker container:

docker run -d --name chainlit_app -p 8000:8000 chainlit_app 
Chatbot in Dockerized Application (Created by Author)

The application is now running on port 8000! Since the application is being run locally, the chatbot will be hosted at http://localhost:8000.

Let’s see if the RAG components (including the Amazon Bedrock models) are still operational by submitting a query.

Query 1 in the Dockerized Application (Created by Author)

It works just as expected!


Future Steps

The current chatbot is able to respond to queries with decent performance and at a low cost. However, the application still runs locally and uses default parameters. Thus, there are still measures that can be taken to further enhance its performance and usability.

  1. Perform Rigorous Testing

The LLM application appears to perform effectively, with the responses being concise and accurate. However, the tool still needs to undergo rigorous testing before it can be deemed usable.

The testing would primarily serve to ensure that response accuracy is maximized while hallucinations are minimized.

2. Implement Advanced RAG Techniques

If the chatbot is unable to answer specific types of questions or just consistently performs poorly, it would be worth considering the use of advanced RAG techniques to improve certain aspects of the workflow, such as the retrieval of content from the vector database.

3. Polish the Front End

Currently, the tool uses the default front end provided by Chainlit. To make the tool more aesthetic and intuitive, the UI design can be further customized.

In addition, the citation feature of the chatbot (i.e., identifying the source of the response) can be improved by providing a hyperlink so that the user can immediately go to the page that contains the information they need.

4. Deploy to the Cloud

If there is a need to offer this application to a larger user base, the next step would be to deploy it on a remote server using cloud services such as Amazon EC2 or Amazon ECS. High scalability, availability, and performance are attainable with many cloud platforms, but since the tool already leverages Amazon Bedrock, the natural progression would be to harness other resources under the AWS umbrella.


Conclusion

Image by Gerd Altmann from Pixabay

Working on this project, I was blown away by how far the data science space has advanced. NLP applications that harness generative AI would have been difficult to build just 5 years ago, as they would have required considerable time, money, and manpower.

In 2024, such tools can be built by a single person with a little code and minimal expense (the whole project has cost under $1 so far). It makes you wonder what will be possible in the upcoming years.

For those more interested in the codebase for the project, please visit the GitHub repository:

anair123/Building-a-Research-Chatbot-with-AWS-and-Llama-2 (github.com)

Thank you so much for reading!

