Async for LangChain and LLMs

In this article, I will cover how to use asynchronous calls to LLMs for long workflows using LangChain. We will go through an example with the full code and compare sequential execution with asynchronous calls.
Here is an overview of the content, in case you'd like to jump to the section that interests you:
- Basics: What is LangChain
- How to run a Synchronous chain with LangChain
- How to run a single Asynchronous chain with LangChain
- Real-world tips for long workflows with Async Chains
So let’s start!
Basics: What is LangChain
"LangChain is a framework for developing applications powered by language models." That is the official definition of LangChain. This framework was created recently and is already used as an industry standard for building tools powered by LLMs.
It is open-source and well-maintained, with new features being released in a very fast time frame.
The official documentation can be found [here](https://python.langchain.com) and the GitHub repository [here](https://github.com/hwchase17/langchain).
One downside of this library is that, since the features are so new, we cannot use ChatGPT effectively to help us build new code. So this means we have to work in the "ancient" way of reading documentation, forums, and tutorials.
The documentation for LangChain is really good; however, there are not a lot of examples for some specific things.
I ran into this problem with Async for long chains.
Here are the main resources I used to learn more about the framework:
- DeepLearning.AI course: LangChain: Chat with Your Data;
- The official documentation;
- YouTube channel.
(P.S. They are all free.)
How to run a Synchronous chain with LangChain
So let me set up the problem I had: I have a data frame with a lot of rows, and for each of those rows I need to run multiple prompts (chains) through an LLM and return the results to my data frame.
When you have many rows, say 10K, running 3 prompts for each, with each response (if the server is not overloaded) taking about 3–5 seconds, you end up waiting days for the workflow to complete: 10,000 rows × 3 prompts × 3–5 seconds is 90,000–150,000 seconds of pure waiting, i.e. one to two days.
Below I am going to show the main steps and code to build a synchronous chain and time it on a subset of the data.
For this example, I am going to use the Wine Reviews dataset from Kaggle (check the license on its page). The goal here is to extract some information from the written reviews.
I want to extract a summary of the review, the main sentiment, and the top 5 characteristics of each wine.
For that, I created two chains: one for the summary and sentiment, and another that takes the summary as input to extract the characteristics.
Here is the code to run it:
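In outline, it looks something like this (a minimal sketch assuming the classic LLMChain API, an OPENAI_API_KEY in your environment, and the Wine Reviews CSV with its description column; the prompt texts here are illustrative, not the exact ones I used):

```python
import time

import pandas as pd
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# Assumption: the Wine Reviews CSV from Kaggle, which has a "description" column
df = pd.read_csv("winemag-data-130k-v2.csv")
reviews = df["description"][:10]

llm = ChatOpenAI(temperature=0.0)  # requires OPENAI_API_KEY in the environment

# Chain 1: summary + sentiment from the raw review text
summary_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["review"],
        template="Summarize this wine review and state its overall sentiment:\n{review}",
    ),
)

# Chain 2: top 5 characteristics, taking the summary as input
characteristics_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["summary"],
        template="List the top 5 characteristics of the wine in this summary:\n{summary}",
    ),
)

# Sequential execution: one API call at a time, per row
start = time.time()
summaries = [summary_chain.run(review=review) for review in reviews]
print(f"Summary Chain (Sequential) executed in {time.time() - start:.2f} seconds.")

start = time.time()
characteristics = [characteristics_chain.run(summary=summary) for summary in summaries]
print(f"Characteristics Chain (Sequential) executed in {time.time() - start:.2f} seconds.")
```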
Run time (10 examples):
Summary Chain (Sequential) executed in 22.59 seconds.
Characteristics Chain (Sequential) executed in 22.85 seconds.
If you want to understand more about the components I am using, I really recommend watching the DeepLearning.AI course.
The main takeaways from this code are the building blocks of a chain, how to run it sequentially, and how long that loop took to finish. It is important to remember that it was about 45 seconds for 10 examples, and the full dataset contains 130K rows; at that rate, the full run would take roughly 585,000 seconds, or almost a week. So the Async implementation is the New Hope to run this in a reasonable time.
So with the problem set up and the baseline established, let’s see how we can optimize this code to run much faster.
How to run a single Asynchronous chain with LangChain
So for this, we are going to use a feature called asynchronous calls. To explain it, I will first briefly describe what the code is doing and where the time is being spent.
In our example, we go through each row of the data frame, extract some information from it, add that to our prompt, and call the GPT API to get a response. Once the response arrives, we just parse it and add it back to the data frame.

The main bottleneck here is the call to the GPT API, because our computer has to wait idly for the response (about 3 seconds). The rest of the steps are fast and could be optimized further, but that is not the focus of this article.
So instead of waiting idly for each response, what if we sent all the calls to the API at the same time? That way we would only have to wait roughly as long as the slowest single response, and then process them all. This is called making asynchronous calls to the API.

This way, the pre-processing and post-processing still happen sequentially, but the calls to the API do not have to wait for the previous response to come back before the next one is sent.
So here is the code for the Async chains:
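In outline (a sketch reusing the summary_chain and characteristics_chain defined above; the helper names here are illustrative, not the originals):

```python
import asyncio
import time

async def run_chain(chain, **kwargs):
    # arun() is LangChain's async counterpart of run(): it awaits the API call
    # instead of blocking on it
    return await chain.arun(**kwargs)

async def generate_concurrently(chain, inputs, input_key):
    # Build one coroutine per row; asyncio.gather() keeps all the API calls
    # in flight at the same time and returns the results in input order
    tasks = [run_chain(chain, **{input_key: value}) for value in inputs]
    return await asyncio.gather(*tasks)

start = time.time()
summaries = asyncio.run(generate_concurrently(summary_chain, reviews, "review"))
print(f"Summary Chain (Async) executed in {time.time() - start:.2f} seconds.")

start = time.time()
characteristics = asyncio.run(
    generate_concurrently(characteristics_chain, summaries, "summary")
)
print(f"Characteristics Chain (Async) executed in {time.time() - start:.2f} seconds.")
```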
In this code, we use Python's async and await syntax. LangChain also gives us an async way to run the chain, via the arun() method. So in the beginning we process each row sequentially (this can be optimized) and create multiple "tasks" that await the responses from the API concurrently; then we process the responses into the final desired format, again sequentially (this can also be optimized).
Run time (10 examples):
Summary Chain (Async) executed in 3.35 seconds.
Characteristics Chain (Async) executed in 2.49 seconds.
Compared to the sequential version:
Summary Chain (Sequential) executed in 22.59 seconds.
Characteristics Chain (Sequential) executed in 22.85 seconds.
We can see a roughly 7–9x improvement in run time, so for big workloads I highly recommend using this method. My code is also full of for loops that could be optimized further to improve performance.
The full code for this tutorial can be found in this GitHub repo.
Real-world tips for long workflows with Async Chains
When I had to run this, I ran into some limitations and a few roadblocks that I want to share with you.
Notebooks are not async friendly
When running async calls in Jupyter notebooks, you may encounter some issues; asking ChatGPT can probably help you work through them. The code I built is meant to run big workloads in a .py file, so it may need some changes to run in a notebook.
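The usual culprit is that Jupyter already runs its own event loop. A minimal sketch of the common workaround, assuming the nest_asyncio package is installed:

```python
# Jupyter already runs an event loop, so asyncio.run() inside a cell fails with
# "RuntimeError: asyncio.run() cannot be called from a running event loop".
import nest_asyncio

nest_asyncio.apply()  # patch the running loop so nested asyncio.run() works

# Alternatively, notebook cells support top-level await, so you can write:
# summaries = await generate_concurrently(summary_chain, reviews, "review")
```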
Too many output keys
The first one was that my chain had multiple output keys, and at the time arun() only accepted chains with a single output key. So to fix this, I had to break my chain into two separate ones.
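With the two sketch chains from above, the shape of the fix looks like this (an illustrative helper, not the original code): keep each chain single-output and, where one feeds the other, await them in sequence inside a single task.

```python
async def summarize_then_characterize(review: str) -> tuple[str, str]:
    # Each chain has exactly one output key, so arun() accepts it;
    # the two calls are sequential per row but still concurrent across rows:
    # results = await asyncio.gather(*(summarize_then_characterize(r) for r in reviews))
    summary = await summary_chain.arun(review=review)
    characteristics = await characteristics_chain.arun(summary=summary)
    return summary, characteristics
```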
Not all chains can be async
My prompt logic used a vector database for examples and comparisons, which required the examples to be compared and added to the database sequentially. This made async unfeasible for that link in the full chain.
Lack of content
For this specific topic, the best content I could find was the official documentation on async support, and I built from there toward my use case. So if you run it and find new things out, share them with the world!
Conclusion
LangChain is a very powerful tool for creating LLM-based applications. I highly recommend learning this framework and taking the courses cited above.
On the specific topic of running chains under high workloads, we saw the potential improvement that async calls bring. My recommendation is to take the time to understand what the code is doing, keep a boilerplate class (such as the one provided in my code), and run it asynchronously!
For small workloads or applications that require only a single API call, async is not necessary; but if you have a boilerplate class, just add a sync function so you can easily use one or the other, as sketched below.
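Something like this, for example (ChainRunner is a hypothetical name, not the class from my repo):

```python
import asyncio

class ChainRunner:
    """Boilerplate wrapper: one chain, both sync and async entry points."""

    def __init__(self, chain, input_key):
        self.chain = chain          # any single-output LLMChain
        self.input_key = input_key  # name of the chain's input variable

    def run_sync(self, inputs):
        # Small workloads / single calls: plain sequential execution
        return [self.chain.run(**{self.input_key: value}) for value in inputs]

    async def run_async(self, inputs):
        # Large workloads: fire all the API calls concurrently
        tasks = [self.chain.arun(**{self.input_key: value}) for value in inputs]
        return await asyncio.gather(*tasks)
```

Then `ChainRunner(summary_chain, "review").run_sync(...)` covers a handful of rows, while `asyncio.run(runner.run_async(...))` covers the big batches.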
Thanks for reading.
The full code can be found here.
If you like the content and want to support me, you can buy me a coffee.