‘Talk’ to Your SQL Database Using LangChain and Azure OpenAI

Explore the power of natural language processing using LLMs for your database queries

Satwiki De
Towards Data Science


LangChain is an open-source framework for developing applications that process natural language using LLMs (Large Language Models).

The Agent component of LangChain is a wrapper around an LLM that decides the best steps or actions to take to solve a problem. The Agent typically has access to a set of functions called Tools (or a Toolkit), and it decides which Tool to use based on the user input. Each agent can perform various NLP tasks, such as parsing, calculations, translation, etc.

An Agent Executor is a runnable interface around the Agent and its set of Tools. The agent executor is responsible for calling the agent, getting back an action and action input, calling the tool that the action references with the corresponding input, getting the output of the tool, and then passing all of that information back to the Agent to get its next action. This is usually an iterative process that continues until the Agent reaches the Final Answer or output.
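To make this loop concrete, here is a minimal conceptual sketch in Python. The plan() method and the Action fields are hypothetical names for illustration, not LangChain’s actual internals:

# A conceptual sketch of the Agent Executor loop, not LangChain's real code.
# 'agent.plan' and the Action attributes are hypothetical names.
def run_agent_executor(agent, tools, user_input):
    intermediate_steps = []
    while True:
        action = agent.plan(user_input, intermediate_steps)  # hypothetical API
        if action.is_final:                  # Agent produced the Final Answer
            return action.output
        observation = tools[action.tool_name](action.tool_input)
        intermediate_steps.append((action, observation))  # fed back to the Agent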

In this article, I will show you how we can use a LangChain Agent and the Azure OpenAI gpt-35-turbo model to query your SQL database using natural language (without writing any SQL at all!) and get useful data insights. We will use the SQL Database Toolkit and Agent, which can convert user input into an appropriate SQL query and run it against the database to get an answer.

This is an exploratory article. It aims to provide an overview of the currently available tools and to identify any challenges along the way.

Image by author (created using Bing Image Creator)

Scope of Requirement

For this exploration, we will only read data from the DB and avoid any insert, update or delete operations. This is to preserve the data integrity in the DB. We will focus on how we can answer questions using the data available in the DB.

However, the SQL Agent does not guarantee that it will never perform DML operations on your database in response to a particular question. One way to ensure that no accidental DML operation happens is to create a database user with read-only access and use it in the following code.
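For illustration, here is a minimal sketch of setting up such a read-only user on Azure SQL. The user name, password and connection string are placeholders, and the T-SQL statements are executed through SQLAlchemy only for convenience; an admin can equally run them directly in a query editor:

# One-time admin task: create a contained database user with read-only access.
# 'readonly_user', the password and the connection string are placeholders.
from sqlalchemy import create_engine, text

admin_engine = create_engine("mssql+pyodbc:///?odbc_connect=<admin-odbc-string>")
with admin_engine.begin() as conn:
    conn.execute(text("CREATE USER readonly_user WITH PASSWORD = '<StrongPassword>'"))
    conn.execute(text("ALTER ROLE db_datareader ADD MEMBER readonly_user"))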

Let’s take an e-retail company’s order and inventory system database as an example. The inventory keeps track of products across multiple categories, e.g. kitchen, gardening, stationery, bath etc. The order system records purchase history, including order status, delivery date etc., for each product.

The following are some of the questions end users of this application might ask:

  1. Quantity of kitchen products sold last month.
  2. How many orders have not been shipped yet?
  3. How many orders were delivered late last month?
  4. What are the top 3 products sold last month?

Set up

  1. Python>=3.8 and an IDE for our exploration. I use VS Code.
  2. I use Azure OpenAI gpt-35-turbo as the LLM here. This model is part of the GPT-3.5 family, which can understand and generate natural language and code. To follow along, you need an Azure subscription with the OpenAI service enabled. Know more here.
  3. I use an Azure SQL database here. However, you can use an on-premises SQL DB too.

Database

I created a database named retailshopdb with the following tables and relationships:

  1. Category
  2. Product
  3. Orders
Image by author

Along with the ‘Id’ columns being the primary keys for each table, the tables have foreign key relationships with each other, e.g. CategoryId is a foreign key in the Product table, and ProductId is a foreign key in the Orders table. These relationships are crucial for the LangChain agent to construct a SQL query that matches the end user’s question.
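For reference, a sketch of this schema in SQLAlchemy Core is below. Only the Id columns and the foreign keys are confirmed above; the remaining column names and types are assumptions based on the descriptions in this article (ProductOrderedQuantity reappears in the output analysis later):

# A sketch of the schema using SQLAlchemy Core. Column names other than the
# keys are assumptions inferred from the article's descriptions.
from sqlalchemy import (Column, Date, ForeignKey, Integer, MetaData, Numeric,
                        String, Table)

metadata = MetaData()

category = Table(
    "Category", metadata,
    Column("Id", Integer, primary_key=True),
    Column("CategoryName", String(100)),
    Column("Description", String(500)),
)

product = Table(
    "Product", metadata,
    Column("Id", Integer, primary_key=True),
    Column("ProductName", String(100)),
    Column("Description", String(500)),
    Column("ProductPrice", Numeric(10, 2)),
    Column("CategoryId", Integer, ForeignKey("Category.Id")),
)

orders = Table(
    "Orders", metadata,
    Column("Id", Integer, primary_key=True),
    Column("ProductId", Integer, ForeignKey("Product.Id")),
    Column("ProductOrderedQuantity", Integer),
    Column("OrderStatus", String(20)),
    Column("ExpectedDeliveryDate", Date),
    Column("ActualDeliveryDate", Date),
)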

Azure OpenAI

If you have an Azure OpenAI resource created in your subscription, navigate to Azure OpenAI Studio and create a deployment for the gpt-35-turbo model.

Image by author

Code & Output Analysis

Let’s start with some base code to access the LLM from a Python notebook in VS Code.

  1. Initialize the notebook by installing required libraries and setting required environment variables.
%pip install langchain openai sqlalchemy pyodbc python-dotenv

import os
from dotenv import load_dotenv

load_dotenv()  # loads variables from a local .env file, if present

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-07-01-preview"
os.environ["OPENAI_API_BASE"] = ""  # Your Azure OpenAI resource endpoint
os.environ["OPENAI_API_KEY"] = ""  # Your Azure OpenAI resource key
os.environ["OPENAI_CHAT_MODEL"] = "gpt-35-turbo-16k"  # Use name of deployment

os.environ["SQL_SERVER"] = ""  # Your Azure SQL server name
os.environ["SQL_DB"] = "retailshopdb"
os.environ["SQL_USERNAME"] = ""  # SQL server username
os.environ["SQL_PWD"] = "{<password>}"  # SQL server password

2. Connect to the DB.

import urllib.parse

from sqlalchemy import create_engine

driver = '{ODBC Driver 17 for SQL Server}'

# URL-encode the ODBC attributes so special characters (e.g. in the
# password) don't break the SQLAlchemy connection URL.
odbc_params = urllib.parse.quote_plus(
    'Driver=' + driver +
    ';Server=tcp:' + os.getenv("SQL_SERVER") + '.database.windows.net,1433' +
    ';Database=' + os.getenv("SQL_DB") +
    ';Uid=' + os.getenv("SQL_USERNAME") +
    ';Pwd=' + os.getenv("SQL_PWD") +
    ';Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;'
)
odbc_str = 'mssql+pyodbc:///?odbc_connect=' + odbc_params

db_engine = create_engine(odbc_str)
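As an optional sanity check, you can list the table names visible through the engine to confirm the connection works:

# Optional connectivity check: list the tables visible through the engine.
from sqlalchemy import inspect

inspector = inspect(db_engine)
print(inspector.get_table_names())  # expect: ['Category', 'Orders', 'Product']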

3. Initialize a LangChain chat_model instance, which provides an interface for invoking an LLM provider using the chat API. The reason for selecting a chat model is that the gpt-35-turbo model is optimized for chat, hence we use the AzureChatOpenAI class here to initialize the instance.

from langchain.chat_models import AzureChatOpenAI

llm = AzureChatOpenAI(
    model=os.getenv("OPENAI_CHAT_MODEL"),
    deployment_name=os.getenv("OPENAI_CHAT_MODEL"),
    temperature=0,
)

Note that temperature is set to 0. Temperature is a parameter that controls the “creativity” or randomness of the generated text. A lower temperature (0 being the lowest) makes the output more “focused” or deterministic. Since we’re dealing with a database here, it’s important that the LLM generates factual responses.

4. Create a Prompt Template.

The prompt is the input that we send to the LLM to generate an output. A prompt can also be designed to contain instructions, context, and examples (one-shot or few-shot), which can be crucial for generating accurate output, as well as for setting the tone and formatting of your output data.

Using a Prompt Template is a good way to structure these properties, including the end user’s input, before they are provided to the LLM. We use LangChain’s ChatPromptTemplate module here, which is based on ChatML (Chat Markup Language).

This is a base prompt template to start with. Over time, we will update this template as needed:

from langchain.prompts.chat import ChatPromptTemplate

final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system",
         """
You are a helpful AI assistant expert in querying SQL Database to find answers to user's question about Categories, Products and Orders.
         """),
        ("user", "{question}\n ai: "),
    ]
)
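To inspect exactly what will be sent to the model, you can render the template with a sample question:

# Render the template with a sample question to see the final prompt text.
print(final_prompt.format(question="How many orders were delivered late last month?"))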

Now initialize create_sql_agent, which is designed to interact with SQL databases, as below. The agent is equipped with a toolkit that connects to your SQL database and reads both the metadata and the content of the tables.

from langchain.agents import AgentType, create_sql_agent
from langchain.sql_database import SQLDatabase
from langchain.agents.agent_toolkits.sql.toolkit import SQLDatabaseToolkit

db = SQLDatabase(db_engine)

sql_toolkit = SQLDatabaseToolkit(db=db, llm=llm)
sql_toolkit.get_tools()

sqldb_agent = create_sql_agent(
    llm=llm,
    toolkit=sql_toolkit,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

Note that we use ZERO_SHOT_REACT_DESCRIPTION here as the value of the agent_type parameter, which specifies a ReAct-style agent without memory: each question is handled independently, with no recollection of earlier interactions.

All set to run our testing —

sqldb_agent.run(final_prompt.format(
    question="Quantity of kitchen products sold last month?"
))

Notice in the following cell output how the LangChain Agent Executor iterates through Thought, Action and Observation steps until it reaches a Final Answer.

Image by author

Output: 10

This is a correct answer.

Let’s have some fun, shall we? Replace ‘Quantity’ with ‘how many’ in the same question, which should produce the same answer.

sqldb_agent.run(final_prompt.format(
    question="How many kitchen products were sold last month?"
))

But this is what we get —

Image by author

Output: ‘2 kitchen products were sold in the current month.’

This output is not correct! The Agent makes a mistake when creating the SQL query. Instead of using SUM(ProductOrderedQuantity) to compute the output, it does a COUNT(*) on the JOIN result, which gives the wrong answer.
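Expressed against the SQLAlchemy Core tables sketched in the Database section (so the column names remain assumptions), the intended aggregation looks roughly like this, with the date filter omitted for brevity:

# The aggregation the Agent should have produced (date filter omitted),
# built on the Core tables sketched earlier.
from sqlalchemy import func, select

intended = (
    select(func.sum(orders.c.ProductOrderedQuantity))
    .select_from(
        orders.join(product, orders.c.ProductId == product.c.Id)
              .join(category, product.c.CategoryId == category.c.Id)
    )
    .where(category.c.CategoryName == "Kitchen")
)
# The Agent instead generated the equivalent of select(func.count())
# over the same join, counting order rows rather than summing quantities.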

Why does changing the prompt input slightly produce a different output?

OpenAI models are non-deterministic, meaning that identical inputs can yield different outputs. Setting temperature to 0 will make the outputs mostly deterministic, but a small amount of variability may remain due to GPU floating point math.
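A quick way to gauge this variability is to run the identical question a few times and compare the answers:

# Run the identical question a few times; more than one distinct answer
# indicates residual non-determinism even at temperature 0.
question = "How many kitchen products were sold last month?"
answers = {sqldb_agent.run(final_prompt.format(question=question)) for _ in range(3)}
print(answers)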

Running another test with a different input:

sqldb_agent.run(final_prompt.format(
    question="How many orders have not been shipped yet?"
))

Image by author

Output: ‘There are 15 orders that have not been shipped yet.’

This is again an incorrect result. The agent counts ‘Completed’ orders as well, when our question implies only the orders that have not yet been shipped.

Let’s see what modifications we can make to produce accurate outputs.

Play with Prompt Engineering

The LangChain agent can read table metadata from the SQL database using its Toolkit, and to some extent it can interpret the column names as well. But there are still some gaps in its reasoning, which we can try to mitigate using prompt engineering techniques.

We started with a base prompt template containing a single line of instruction. Let’s include some additional information to give the LLM more context about our use case, so it can form better SQL queries. Here is the information I added to the system message, at a high level:

  1. Information about Table columns
  2. ‘Meaning’ of Order Status values
  3. Most specific information at the end

from langchain.prompts.chat import ChatPromptTemplate

final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system",
         """
You are a helpful AI assistant expert in identifying the relevant topic from user's question about Categories, Products and Orders and then querying SQL Database to find answer.
Use following context to create the SQL query. Context:
Product table contains information about products including product name, description, product price and product category.
Category table contains information about categories including category name and description. Each Product is mapped to a Category.
Orders table contains information about orders placed by customers including
quantity or number of products ordered,
expected delivery date and actual delivery date of the Order in the location and the status of the order.
Order status = 'Processing' means the order is being processed by seller and not yet shipped,
Order status = 'Shipped' means the order is shipped by the seller and is on the way to the customer,
Order status = 'Completed' means the order is delivered to the customer, and
Order status = 'Cancelled' means the order is cancelled by the customer.

If the question is about number of products in an order, then look for column names with 'quantity' in the tables and use 'sum' function to find the total number of products.
         """),
        ("user", "{question}\n ai: "),
    ]
)

Now run the first input again:

sqldb_agent.run(final_prompt.format(
    question="How many Kitchen products were sold in current month?"
))

Image by author

Output: 10

The reasoning has improved! By providing additional context to the LLM, we were able to get an accurate output.

Now test all user inputs —

After-thoughts . . .

All this looks good for play, but what if we want to actually build a solution and release it for end users? This is a great idea for use cases like a chatbot over your own database; however, like any typical software development effort, we need to think about and decide on some crucial design aspects before building LLM-based systems.

Scalability

In this example, I have used 3 tables with ~30 rows in total. The average latency to produce an output that involves joining all 3 tables is ~5 seconds. As of today, I didn’t find any information in the official docs on the maximum size of database we can use with this Agent. However, there are a few parameters we can think about to determine our requirements:

  1. What is the latency required by your application? If you’re building a chatbot, your expected latency has to stay within a certain limit.
  2. What is your Database size? Also take into account the size of individual tables which you want to use for query.

Note that you don’t need to pass the entire database to the Agent Toolkit. There is an option to select specific tables to work with the Toolkit. A good option is to identify subsets of tables for separate use cases and create multiple Agents pointing to different subsets of tables; see the sketch after this list.

3. The rate and quota limits of your Azure OpenAI resource. If you’re using another LLM provider, then look for its limitations and restrictions too.
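For instance, SQLDatabase accepts an include_tables parameter, so scoping an Agent to a subset of tables might look like the following sketch, reusing the llm and db_engine objects from earlier cells (the chosen subset is illustrative):

# Scope the Toolkit to a subset of tables via include_tables; the Agent
# will only see the listed tables. The subset here is illustrative.
orders_db = SQLDatabase(db_engine, include_tables=["Product", "Orders"])
orders_toolkit = SQLDatabaseToolkit(db=orders_db, llm=llm)

orders_agent = create_sql_agent(
    llm=llm,
    toolkit=orders_toolkit,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)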

Reliability

How can we ensure that we get accurate responses consistently? How do we make sure the system does not hallucinate or produce completely unexpected content?

There is ongoing research on improving the reliability and robustness of LLMs. Using use-case-specific prompts, we can help improve the reasoning for our use case, which is sometimes termed ‘in-context learning’.

Remember that we are not training LLMs here. From the perspective of building products on top of pre-trained LLMs, we can only tweak our code, model parameters and prompts, but we can tweak them in a targeted way.

Developing iteratively, and evaluating along the way, can take us in the right direction toward an overall working system.
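One lightweight way to evaluate along the way is a small regression set of questions with known answers. A minimal sketch follows; the expected values are assumptions for our toy database:

# A minimal evaluation sketch: run questions with known answers and flag
# mismatches. The expected answers below are assumptions for illustration.
test_cases = [
    ("Quantity of kitchen products sold last month?", "10"),
    ("How many orders have not been shipped yet?", "5"),
]

for question, expected in test_cases:
    answer = str(sqldb_agent.run(final_prompt.format(question=question)))
    status = "PASS" if expected in answer else "CHECK"
    print(f"[{status}] {question} -> {answer}")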

Logging and Monitoring

Like any other typical software development, enabling logging and continuous monitoring of LLM-based applications is good practice. Monitoring can capture not only system-related metrics like performance, latency and request-response rates, but also the inputs and outputs of our system, which can help us determine how consistent it is. Some useful information we can gather from monitoring to improve our system (a simple logging wrapper is sketched after the list below):

  1. Frequent, similar questions from end user, and outputs generated for these questions
  2. Hallucination rate of LLM
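A simple starting point is to wrap the Agent call and log the question, answer and latency. A minimal sketch using Python’s standard logging module:

# A minimal logging wrapper around the Agent call; records question,
# answer and latency to a local file for later analysis.
import logging
import time

logging.basicConfig(filename="sql_agent.log", level=logging.INFO)

def ask(question: str) -> str:
    start = time.perf_counter()
    answer = sqldb_agent.run(final_prompt.format(question=question))
    latency = time.perf_counter() - start
    logging.info("question=%r answer=%r latency=%.2fs", question, answer, latency)
    return answer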

Conclusion

The Software Engineering world is changing rapidly with LLMs’ immense generative capability, and many solutions are coming up in this space. We have an opportunity to adopt this technology and harness its power to create products, while also keeping a check on the reliability of LLM-backed systems. It is always a good idea to start small, build a proof-of-concept app and see if it fits your requirements.

References:

https://community.openai.com/t/run-same-query-many-times-different-results/140588

https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api

https://mlops.community/concepts-for-reliability-of-llms-in-production/

Please follow me if you want to read more such content about new and exciting technology. Please leave your feedback in the comment section.

