LLM Output Parsing: Function Calling vs. LangChain

How to consistently parse outputs from LLMs using OpenAI function calling and LangChain output parsers: evaluating the methods’ advantages and disadvantages

Gabriel Cassimiro
Towards Data Science


Creating tools with LLMs requires multiple components, such as vector databases, chains, agents, document splitters, and many other new building blocks.

However, one of the most crucial components is LLM output parsing. If you cannot get structured responses from your LLM, you will have a hard time working with its generations. This becomes even more evident when we want a single call to the LLM to output more than one piece of information.

Let’s illustrate the problem with a hypothetical scenario:

We want the LLM to output, from a single call, the ingredients and the steps to make a certain recipe. But we want both of these items separately, so they can be used in two different parts of our system.

import openai

recipe = 'Fish and chips'
query = f"""What is the recipe for {recipe}?
Return the ingredients list and steps separately."""

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": query}],
)

response_message = response["choices"][0]["message"]
print(response_message['content'])

This returns the following:

Ingredients for fish and chips:
- 1 pound white fish fillets (such as cod or haddock)
- 1 cup all-purpose flour
- 1 teaspoon baking powder
- 1 teaspoon salt
- 1/2 teaspoon black pepper
- 1 cup cold beer
- Vegetable oil, for frying
- 4 large russet potatoes
- Salt, to taste

Steps to make fish and chips:

1. Preheat the oven to 200°C (400°F).
2. Peel the potatoes and cut them into thick, uniform strips. Rinse the potato strips in cold water to remove excess starch. Pat them dry using a clean kitchen towel.
3. In a large pot or deep fryer, heat vegetable oil to 175°C (350°F). Ensure there is enough oil to completely submerge the potatoes and fish.
4. In a mixing bowl, combine the flour, baking powder, salt, and black pepper. Whisk in the cold beer gradually until a smooth batter forms. Set the batter aside.
5. Take the dried potato strips and fry them in batches for about 5-6 minutes or until golden brown. Remove the fries using a slotted spoon and place them on a paper towel-lined dish to drain excess oil. Keep them warm in the preheated oven.
6. Dip each fish fillet into the prepared batter, ensuring it is well coated. Let any excess batter drip off before carefully placing the fillet into the hot oil.
7. Fry the fish fillets for 4-5 minutes on each side or until they turn golden brown and become crispy. Remove them from the oil using a slotted spoon and place them on a paper towel-lined dish to drain excess oil.
8. Season the fish and chips with salt while they are still hot.
9. Serve the fish and chips hot with tartar sauce, malt vinegar, or ketchup as desired.

Enjoy your homemade fish and chips!

This is one huge string, and parsing it would be hard because the LLM can return slightly different structures, breaking whatever code you write. You could argue that asking in the prompt to always return “Ingredients:” and “Steps:” could solve this, and you would not be wrong. It could work; however, you would still need to process the string manually and stay prepared for eventual variations and hallucinations.
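To make that concrete, here is a minimal sketch of what that manual approach might look like, assuming the model always returns the literal headers “Ingredients:” and “Steps:” (which is exactly the assumption that tends to break):

# Naive manual parsing: assumes the model always returns these exact headers.
# Any variation ("Ingredients for fish and chips:", a missing colon, reordered
# sections) breaks the split logic below.
raw_output = response_message['content']

try:
    ingredients_part, steps_part = raw_output.split("Steps:", 1)
    ingredients = ingredients_part.replace("Ingredients:", "").strip()
    steps = steps_part.strip()
except ValueError:
    # The header was missing or renamed, so the split failed.
    ingredients, steps = None, None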

Solution

There are a couple of ways we could solve this problem. One was mentioned above, but there are other, better-tested options. In this article, I will show two of them:

  1. OpenAI function calling;
  2. LangChain output parsers.

OpenAI Function Calling

This is the method I have been using, and it has given me the most consistent results. We use the function calling capability of the OpenAI API so that the model returns its response as structured JSON.

The goal of this functionality is to give the LLM the ability to call an external function by returning the function’s inputs as JSON. The models were fine-tuned to understand when a given function should be used. A classic example is a function for the current weather: if you ask GPT for the current weather, it cannot tell you, but you can describe a function that does this and pass it to GPT, so it knows the function can be called with certain inputs.
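As an illustration of that weather example, a function description could look something like the sketch below; the get_current_weather name and its parameters are hypothetical, used only to show the shape of the declaration:

# Hypothetical function description for the weather example above.
# GPT never executes this itself; it only returns JSON arguments for it
# when it decides the function should be called.
weather_function = {
    "name": "get_current_weather",
    "description": "Get the current weather in a given city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The city to get the weather for, e.g. London"
            }
        },
        "required": ["city"],
    },
}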

If you want to dive deeper into this functionality, here is the announcement from OpenAI and here is a great article.

So let’s look at what this would look like in code, given the problem at hand. Let’s break it down:

functions = [
    {
        "name": "return_recipe",
        "description": "Return the recipe asked",
        "parameters": {
            "type": "object",
            "properties": {
                "ingredients": {
                    "type": "string",
                    "description": "The ingredients list."
                },
                "steps": {
                    "type": "string",
                    "description": "The recipe steps."
                },
            },
            "required": ["ingredients", "steps"],
        },
    }
]

The first thing we need to do is declare the functions that will be available to the LLM. We have to give each one a name and a description so that the model understands when it should use the function. Here we tell it that this function is used to return the recipe asked for.

Then we go into the parameters. First, we say that the schema is of type object and that the properties it can use are ingredients and steps. Both of these also have a description and a type to guide the LLM on the output. Finally, inside parameters we specify which of those properties are required to call the function (this means we could have optional fields that the LLM decides whether to fill in).
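To illustrate that last point, a variation of the schema could add an optional property (the servings field below is a hypothetical example) that is left out of required, so the model only fills it in when it sees fit:

# Variation of the schema with an optional property.
# "servings" is not listed in "required", so the model may omit it.
functions_with_optional = [
    {
        "name": "return_recipe",
        "description": "Return the recipe asked",
        "parameters": {
            "type": "object",
            "properties": {
                "ingredients": {"type": "string", "description": "The ingredients list."},
                "steps": {"type": "string", "description": "The recipe steps."},
                "servings": {"type": "string", "description": "How many people the recipe serves."},
            },
            "required": ["ingredients", "steps"],
        },
    }
]

For the rest of the article we stick with the original functions definition above.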

Let’s use that now in a call to the LLM:

import openai

recipe = 'Fish and chips'
query = f"What is the recipe for {recipe}? Return the ingredients list and steps separately."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": query}],
    functions=functions,
    function_call={'name': 'return_recipe'}
)
response_message = response["choices"][0]["message"]

print(response_message)
print(response_message['function_call']['arguments'])

Here we start by creating our query to the API, formatting a base prompt with what could be a variable input (recipe). Then we declare our API call using “gpt-3.5-turbo-0613”, pass our query in the messages argument, and now also pass our functions.

There are two arguments related to our functions. In the first one, functions, we pass the list of objects in the format shown above describing the functions the model has access to. In the second one, function_call, we specify how the model should use those functions. There are three options:

  1. “auto” -> the model decides between a plain text response or calling a function;
  2. “none” -> the model does not call a function and returns a plain text response;
  3. {“name”: “my_function_name”} -> specifying a function name forces the model to use it.
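Concretely, those three options map to the following values of the function_call argument (using our return_recipe function for the forced case):

# The three possible values of the "function_call" argument:
function_call = "auto"                        # model decides whether to call a function
# function_call = "none"                      # model never calls a function, replies with text
# function_call = {"name": "return_recipe"}   # forces the model to call return_recipe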

You can find the official documentation here.

In our case, since we are using it for output parsing, we used the latter:

function_call={'name':'return_recipe'}

So now we can look at our responses. The response we get (after filtering with [“choices”][0][“message”]) is:

{
"role": "assistant",
"content": null,
"function_call": {
"name": "return_recipe",
"arguments": "{\n \"ingredients\": \"For the fish:\\n- 1 lb white fish fillets\\n- 1 cup all-purpose flour\\n- 1 tsp baking powder\\n- 1 tsp salt\\n- 1/2 tsp black pepper\\n- 1 cup cold water\\n- Vegetable oil, for frying\\nFor the chips:\\n- 4 large potatoes\\n- Vegetable oil, for frying\\n- Salt, to taste\",\n \"steps\": \"1. Start by preparing the fish. In a shallow dish, combine the flour, baking powder, salt, and black pepper.\\n2. Gradually whisk in the cold water until the batter is smooth.\\n3. Heat vegetable oil in a large frying pan or deep fryer.\\n4. Dip the fish fillets into the batter, coating them evenly.\\n5. Gently place the coated fillets into the hot oil and fry for 4-5 minutes on each side, or until golden brown and crispy.\\n6. Remove the fried fish from the oil and place them on a paper towel-lined plate to drain any excess oil.\\n7. For the chips, peel the potatoes and cut them into thick chips.\\n8. Heat vegetable oil in a deep fryer or large pan.\\n9. Fry the chips in batches until golden and crisp.\\n10. Remove the chips from the oil and place them on a paper towel-lined plate to drain any excess oil.\\n11. Season the chips with salt.\\n12. Serve the fish and chips together, and enjoy!\"\n}"
}
}

If we parse the “function_call” arguments further, we can see our intended structured response:

{
"ingredients": "For the fish:\n- 1 lb white fish fillets\n- 1 cup all-purpose flour\n- 1 tsp baking powder\n- 1 tsp salt\n- 1/2 tsp black pepper\n- 1 cup cold water\n- Vegetable oil, for frying\nFor the chips:\n- 4 large potatoes\n- Vegetable oil, for frying\n- Salt, to taste",
"steps": "1. Start by preparing the fish. In a shallow dish, combine the flour, baking powder, salt, and black pepper.\n2. Gradually whisk in the cold water until the batter is smooth.\n3. Heat vegetable oil in a large frying pan or deep fryer.\n4. Dip the fish fillets into the batter, coating them evenly.\n5. Gently place the coated fillets into the hot oil and fry for 4-5 minutes on each side, or until golden brown and crispy.\n6. Remove the fried fish from the oil and place them on a paper towel-lined plate to drain any excess oil.\n7. For the chips, peel the potatoes and cut them into thick chips.\n8. Heat vegetable oil in a deep fryer or large pan.\n9. Fry the chips in batches until golden and crisp.\n10. Remove the chips from the oil and place them on a paper towel-lined plate to drain any excess oil.\n11. Season the chips with salt.\n12. Serve the fish and chips together, and enjoy!"
}
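Since arguments comes back as a JSON string rather than a Python dictionary, one small extra step is to deserialize it with the standard json module before using it:

import json

# "arguments" is a JSON string, so we deserialize it into a dict
# with the same keys every time.
recipe_output = json.loads(response_message["function_call"]["arguments"])

ingredients = recipe_output["ingredients"]
steps = recipe_output["steps"]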

Conclusion for function calling

It is possible to use the function calling feature straight from the OpenAI API. This gives us a dictionary-style response with the same keys every time the LLM is called.

Using it is pretty straightforward: you just have to declare the functions object, specifying a name, description, and properties focused on your task, while making clear (in the description) that this should be the model’s response. Also, when calling the API we can force the model to use our function, making it even more consistent.

The main downside of this method is that it is not supported by all LLM models and APIs. So if we wanted to use the Google PaLM API, we would have to use another method.

LangChain Output Parsers

One alternative we have that is model-agnostic is using LangChain.

First, what is LangChain?

LangChain is a framework for developing applications powered by language models.

That is the official definition of LangChain. The framework was created recently and has already become something of an industry standard for building tools powered by LLMs.

It has a module that is great for our use case, called “Output Parsers”. In it, multiple objects can be created to request and parse different output formats from LLM calls. It achieves this by first declaring what the format is and passing it in the prompt to the LLM, then using the same object to parse the response.

Let’s break down the code:

from langchain.prompts import ChatPromptTemplate
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from langchain.llms import GooglePalm, OpenAI


ingredients = ResponseSchema(
    name="ingredients",
    description="The ingredients from recipe, as a unique string.",
)
steps = ResponseSchema(
    name="steps",
    description="The steps to prepare the recipe, as a unique string.",
)

output_parser = StructuredOutputParser.from_response_schemas(
    [ingredients, steps]
)

response_format = output_parser.get_format_instructions()
print(response_format)

prompt = ChatPromptTemplate.from_template(
    "What is the recipe for {recipe}? Return the ingredients list and steps separately. \n {format_instructions}"
)

The first thing we do here is create our Response Schemas, which will be the input for our parser. We create one for the ingredients and one for the steps, each containing a name, which will be the key of the dictionary, and a description, which will guide the LLM on the response.

Then we create our StructuredOutputParser from those response schemas. There are multiple ways to do this, with different styles of parsers. Look here to learn more about them.
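As one example of those alternative styles, here is a hedged sketch using the PydanticOutputParser, which validates the response against a Pydantic model instead of a list of response schemas (this is not the parser we use in the rest of the article):

from pydantic import BaseModel, Field
from langchain.output_parsers import PydanticOutputParser

# A Pydantic model describing the same two fields as our response schemas.
class Recipe(BaseModel):
    ingredients: str = Field(description="The ingredients from recipe, as a unique string.")
    steps: str = Field(description="The steps to prepare the recipe, as a unique string.")

pydantic_parser = PydanticOutputParser(pydantic_object=Recipe)
print(pydantic_parser.get_format_instructions())

Either way, the parser produces format instructions to inject into the prompt; below we continue with the StructuredOutputParser defined earlier.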

Lastly, we get our format instructions and define our prompt that will have the recipe name and the format instructions as inputs. The format instructions are these:

"""
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
"ingredients": string // The ingredients from recipe, as a unique string.
"steps": string // The steps to prepare the recipe, as a unique string.
}
```
"""

Now all we have left is calling the API. Here I will demonstrate both the OpenAI API and the Google PaLM API.

llm_openai = OpenAI()
llm_palm = GooglePalm()

recipe = 'Fish and chips'

formatted_prompt = prompt.format(
    **{"recipe": recipe, "format_instructions": output_parser.get_format_instructions()}
)

response_palm = llm_palm(formatted_prompt)
response_openai = llm_openai(formatted_prompt)

print("PaLM:")
print(response_palm)
print(output_parser.parse(response_palm))

print("Open AI:")
print(response_openai)
print(output_parser.parse(response_openai))

As you can see, it is really easy to switch between models. The whole structure defined before can be used in exactly the same way with any model supported by LangChain. We also used the same parser for both models.

This generated the following output:

# PaLM:
{
'ingredients': '''- 1 cup all-purpose flour\n
- 1 teaspoon baking powder\n
- 1/2 teaspoon salt\n
- 1/2 cup cold water\n
- 1 egg\n
- 1 pound white fish fillets, such as cod or haddock\n
- Vegetable oil for frying\n- 1 cup tartar sauce\n
- 1/2 cup malt vinegar\n- Lemon wedges''',
'steps': '''1. In a large bowl, whisk together the flour, baking powder, and salt.\n
2. In a separate bowl, whisk together the egg and water.\n
3. Dip the fish fillets into the egg mixture, then coat them in the flour mixture.\n
4. Heat the oil in a deep fryer or large skillet to 375 degrees F (190 degrees C).\n
5. Fry the fish fillets for 3-5 minutes per side, or until golden brown and cooked through.\n
6. Drain the fish fillets on paper towels.\n
7. Serve the fish fillets immediately with tartar sauce, malt vinegar, and lemon wedges.
'''
}

# Open AI
{
'ingredients': '1 ½ pounds cod fillet, cut into 4 pieces,
2 cups all-purpose flour,
2 teaspoons baking powder,
1 teaspoon salt,
1 teaspoon freshly ground black pepper,
½ teaspoon garlic powder,
1 cup beer (or water),
vegetable oil, for frying,
Tartar sauce, for serving',
'steps': '1. Preheat the oven to 400°F (200°C) and line a baking sheet with parchment paper.
2. In a medium bowl, mix together the flour, baking powder, salt, pepper and garlic powder.
3. Pour in the beer and whisk until a thick batter forms.
4. Dip the cod in the batter, coating it on all sides.
5. Heat about 2 inches (5 cm) of oil in a large pot or skillet over medium-high heat.
6. Fry the cod for 3 to 4 minutes per side, or until golden brown.
7. Transfer the cod to the prepared baking sheet and bake for 5 to 7 minutes.
8. Serve warm with tartar sauce.'
}

Conclusion: LangChain Output parsing

This method works really well too, and its main strength is flexibility. We create a few structures, such as Response Schemas, an Output Parser, and a Prompt Template, that can be pieced together easily and used with different models. Another advantage is the support for multiple output formats.
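To make that flexibility concrete, here is a small sketch of how the prompt and parser defined earlier could be wrapped in a single helper that accepts any LangChain LLM; the get_recipe function is just an illustration of the idea:

# Hypothetical helper reusing the prompt and parser defined earlier.
# Any LLM supported by LangChain can be passed in.
def get_recipe(llm, recipe: str) -> dict:
    formatted_prompt = prompt.format(
        recipe=recipe,
        format_instructions=output_parser.get_format_instructions(),
    )
    response = llm(formatted_prompt)
    return output_parser.parse(response)

palm_recipe = get_recipe(llm_palm, "Fish and chips")
openai_recipe = get_recipe(llm_openai, "Fish and chips")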

The main disadvantage comes from passing the format instructions via the prompt, which leaves room for random errors and hallucinations. One real example came from this very case, where I had to specify “as a unique string” in the description of the response schema. If I did not specify this, the model returned the steps as a list of strings, and this caused a parsing error in the Output Parser.
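One way to soften that risk is to catch parse failures and retry, for example with LangChain’s OutputFixingParser, which asks an LLM to repair malformed output; a rough sketch on top of the code above:

from langchain.schema import OutputParserException
from langchain.output_parsers import OutputFixingParser

# Wrap the original parser with one that asks the LLM to fix bad output.
fixing_parser = OutputFixingParser.from_llm(parser=output_parser, llm=llm_openai)

try:
    parsed = output_parser.parse(response_openai)
except OutputParserException:
    # e.g. the model returned a list of strings instead of a unique string
    parsed = fixing_parser.parse(response_openai)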

Conclusion

There are multiple ways of using an output parser for your LLM-powered application. However, your choice may change depending on the problem at hand. For myself, I like to follow this idea:

I always use an output parser, even if I have only one output from the LLM. This allows me to control and specify my outputs. If I am working with OpenAI, function calling is my choice because it gives the most control and avoids random errors in a production application. However, if I am using a different LLM or need a different output format, my choice is LangChain, combined with plenty of testing of the outputs in order to craft the prompt with the fewest mistakes.
