
Is it Time to Start Talking About Prompt Architecture in LLMs?

From prompt engineering to prompt architecture

Image by the author. (AI generated)

Summarize.

It started with a single word. Not happy with the results, we tried again.

Summarize the most important points of the article.

Prompt engineering teaches us that more specific prompts are better.

Identify the three most important arguments made in the article and evaluate the strength of the author’s argument based on the evidence provided. Are there any points where you feel the argument could be stronger or more convincing?

Over time, we learned to include more details to guide our favorite LLMs to provide the best answers.

A recent prompt architecture called Least to Most prompting. [1]

Prompt engineering techniques are evolving into complex, elaborate systems, sometimes made of many components. The term prompt engineering may be too limiting to describe such intricate systems.

In this article, I want to propose a more accurate label for multi-component systems that interface with LLMs:

Prompt Architecture.

The history of prompt engineering

Modern language models have developed an impressive capacity to take on novel tasks after seeing only a couple of examples. This ability is called in-context learning, and it’s the main reason why prompt engineering works so well.

Researchers think in-context learning works because pretraining teaches the model the general skills needed for language tasks. Then, at test time, it just has to recognize the pattern and apply its skills. Bigger models do this even better, making them surprisingly adaptable at various natural language tasks. [2]

In the past, you’d need thousands of labeled examples to fine-tune a language model for a new task. But with in-context learning, you can give the model the task description in its context window, and it can figure out the new task. We call this zero-shot learning.

In few-shot prompting, some examples of the desired output are added to the prompt. Image by author.

Few-shot learning works by providing a few examples to the model in its context. The model adapts at test time to recognize and continue the pattern, with no weight updates needed. This rapid adaptation ability improves as models grow in size, allowing large models like GPT-3 to generalize to new tasks from just a handful of examples.
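To make this concrete, here is a minimal sketch of a few-shot prompt in Python. The `llm()` helper is a hypothetical placeholder for whatever client you use to call the model, and the reviews and labels are invented for the illustration.

```python
def llm(prompt: str) -> str:
    """Hypothetical placeholder: call your LLM of choice and return its text reply."""
    ...

# A few input -> output pairs establish the pattern; the model is then
# asked to continue that pattern for a new, unlabeled input.
few_shot_prompt = """Classify the sentiment of each review.

Review: "The battery lasts all day." -> positive
Review: "The screen cracked after a week." -> negative
Review: "Setup took five minutes and everything worked." -> positive
Review: "Customer support never replied to my emails." ->"""

answer = llm(few_shot_prompt)  # expected to continue the pattern with "negative"
```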

These techniques offer a general framework for interacting with LLMs more thoughtfully.

Role prompting is a prompt engineering technique, used for more specific tasks, that I’m sure you’ve tried at least once in your ChatGPT experience. Here, the AI system is assigned a specific role at the start of the prompt. This additional information provides context that can enhance the model’s comprehension and lead to more effective responses. [3]

The prompt is initiated with a directive designating the AI’s role, and then continues with a question or task that the AI should respond to within the framework of the assigned role.

Role prompting. Image by author. Prompt source.

Providing context through role prompting assists the AI in understanding and answering appropriately. It guides the model to respond as expected from someone with expertise in a particular domain. For example, you can prompt the model to act as a doctor to get a more medically relevant response.

One of my favorite prompts. Image by author. Prompt source.
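Sticking with the doctor example above, a minimal sketch of a role prompt could look like this. The wording is invented for illustration, and `llm()` again stands in for whatever client you use to call the model.

```python
def llm(prompt: str) -> str:
    """Hypothetical placeholder: call your LLM of choice and return its text reply."""
    ...

# The prompt opens by assigning a role, then asks the actual question
# within the frame of that role's expertise.
role_prompt = (
    "You are an experienced general practitioner. "
    "Answer the following question in plain language, and note when the "
    "patient should see a doctor in person.\n\n"
    "Question: I have had a mild headache and a stiff neck for two days. "
    "What could be causing this?"
)

reply = llm(role_prompt)
```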

Enabling complex reasoning in Large Language Models

LLMs still struggle with logical reasoning and multi-step problem-solving. Chain of Thought prompting is a technique to get these models to show their work and reason through problems step-by-step.

Demonstrating the desired reasoning process prompts the model to replicate that logical thought process on novel problems. CoT improves performance on multi-step reasoning tasks like math and logic puzzles that usually confuse these models.

Chain of Thought prompting. [4]
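The exemplar below, adapted from the Chain of Thought paper [4], shows the idea in code: a worked example demonstrates the step-by-step reasoning, and the model is expected to imitate it on the new question. As before, `llm()` is a hypothetical placeholder for your model client.

```python
def llm(prompt: str) -> str:
    """Hypothetical placeholder: call your LLM of choice and return its text reply."""
    ...

# The first Q/A pair demonstrates explicit intermediate reasoning;
# the model is expected to reproduce that style on the second question.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
A:"""

reply = llm(cot_prompt)  # expected to reason step by step and end with "The answer is 9."
```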

Recent work has advanced prompt engineering into systems with multiple elements and inference phases.

Here is where we cross the line between prompt engineering and prompt architecture.

Self-consistency prompt architecture. [5]

Complex reasoning tasks typically admit multiple valid reasoning paths that lead to the correct solution.

Self-consistency prompting first samples a diverse set of candidate outputs from the model, generating multiple possible reasoning paths. It then aggregates the answers and chooses the most common answer among the final answer set. If different reasoning paths lead to the same definitive answer, there is greater confidence that the answer is correct. [5]
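A minimal sketch of that loop, assuming a hypothetical `llm(prompt, temperature)` callable that returns one sampled completion, and a naive answer parser (both are placeholders, not any specific API):

```python
from collections import Counter

def extract_answer(completion: str) -> str:
    """Naive parser: keep whatever follows the last 'The answer is'."""
    return completion.rsplit("The answer is", 1)[-1].strip(" .\n")

def self_consistency(llm, prompt: str, n_samples: int = 5) -> str:
    """Sample several reasoning paths and return the most common final answer."""
    answers = []
    for _ in range(n_samples):
        completion = llm(prompt, temperature=0.7)  # nonzero temperature -> diverse paths
        answers.append(extract_answer(completion))
    return Counter(answers).most_common(1)[0][0]  # majority vote over final answers
```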

Before answering a physics question, the model asks itself about physics principles. [6]

Step-back prompting further develops the idea of solving a problem by decomposing it into intermediate steps. It’s a prompt architecture that improves reasoning capabilities by having the model take a step back and formulate an abstract version of the question before attempting to answer it.

Step-back prompting first asks the LLM a more general question about the key ideas involved. The LLM answers with core facts and concepts. With this broad knowledge in context, the LLM then answers the specific original question to give the final response. Tests across benchmarks show that stepping back helps large models reason better and make fewer mistakes. [6]
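In code, the two passes might look like the following sketch, where `llm()` is again a hypothetical text-in, text-out callable and the prompt wording is my own:

```python
def step_back(llm, question: str) -> str:
    """Step-back prompting sketch: abstract first, then answer the original question."""
    # Pass 1: derive a more general question about the underlying principles.
    abstract_question = llm(
        "Rewrite the following question as a more general question about the "
        f"underlying principles or concepts involved:\n\n{question}"
    )
    principles = llm(abstract_question)

    # Pass 2: answer the original question, conditioned on those principles.
    return llm(
        f"Background principles:\n{principles}\n\n"
        f"Using these principles, answer the original question:\n{question}"
    )
```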

Chain of Verification prompt architecture. [7]

Chain of Verification (CoVe) is a prompt architecture that seeks to reduce hallucinations in LLMs. CoVe first has the model generate an initial response to a query, which may contain inaccuracies or hallucinations. Next, we prompt the LLM to create a set of verification questions to fact-check potential errors in its initial response. The LLM then answers these verification questions independently, without conditioning on the original response to avoid repeating hallucinations. The goal is to generate a revised, verified response, incorporating the verification question-answer pairs to correct any inconsistencies with the initial response. [7]
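A rough sketch of that pipeline, again with a hypothetical `llm()` callable and prompt wording of my own, could look like this:

```python
def chain_of_verification(llm, query: str) -> str:
    """Chain of Verification sketch: draft, plan checks, verify independently, revise."""
    # 1. Draft an initial response, which may contain hallucinations.
    draft = llm(query)

    # 2. Plan verification questions that fact-check the draft's claims.
    questions = [
        q for q in llm(
            f"Question: {query}\nDraft answer: {draft}\n"
            "List short questions, one per line, that would verify each factual claim in the draft."
        ).splitlines()
        if q.strip()
    ]

    # 3. Answer each verification question independently of the draft,
    #    so earlier hallucinations are not simply repeated.
    verified = "\n".join(f"Q: {q}\nA: {llm(q)}" for q in questions)

    # 4. Produce a revised answer consistent with the verified facts.
    return llm(
        f"Original question: {query}\nDraft answer: {draft}\n"
        f"Verified facts:\n{verified}\n"
        "Rewrite the draft so it is consistent with the verified facts."
    )
```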

Prompt architectures for autonomous agents and advanced applications

Prompt architecture also enables complex applications that are impossible to achieve by engineering a single prompt.

ReAct prompt architecture. [8]

In ReAct prompting, the model mixes thoughts and actions to solve complex tasks. Thoughts are plans and reasoning steps that resemble human reasoning. Actions gather information externally through APIs or environments, and observations return the relevant information to the model. ReAct also increases model interpretability by exposing the thought process, so humans can assess whether the reasoning is correct and even edit thoughts to control the model’s behavior. [8]
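Here is a minimal sketch of the thought-action-observation loop, assuming a hypothetical `llm()` callable and a `tools` dictionary mapping action names to plain Python functions (the output format is my own simplification of the one used in the paper):

```python
def react_loop(llm, tools: dict, question: str, max_steps: int = 5) -> str:
    """Minimal ReAct-style loop: the model alternates thoughts, actions, and observations."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(
            transcript
            + "Continue with one 'Thought:' line, then either "
              "'Action: <tool>[<input>]' or 'Final Answer: <answer>'."
        )
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:
            # Parse "Action: tool[input]", run the tool, and feed back an observation.
            action = step.split("Action:", 1)[1].strip()
            name, arg = action.split("[", 1)
            observation = tools[name.strip()](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return "No final answer within the step budget."
```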

Choosing the Right Prompt Architecture

For a conversational chatbot, more straightforward prompt engineering techniques are often sufficient as a first try. If those fail, you can escalate to methods like Step Back or Self-Consistency prompting to improve reasoning without adding too much complexity.

Look into CoVe if you are building an application where reducing hallucinations is your priority, or even more advanced methods like Zero-Resource Hallucination Prevention. [9] However, CoVe requires a multi-step interaction that might be tedious in a chat. In this case, Retrieval Augmented Generation (RAG) may be a better option to reduce hallucinations. [10]

High-level overview of RAG. Image by the author.

These methods reveal exciting insights about LLMs, but RAG better suits real applications.
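For completeness, here is a minimal RAG sketch. `retrieve()` is a hypothetical function standing in for your vector store or search index, and `llm()` is the same placeholder for a model client:

```python
def rag_answer(llm, retrieve, question: str, k: int = 3) -> str:
    """Retrieval Augmented Generation sketch: ground the answer in retrieved passages."""
    passages = retrieve(question, k)  # k most relevant passages from your document store
    context = "\n\n".join(passages)
    return llm(
        "Answer the question using only the context below. "
        "If the context is not sufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```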

Remember that building an application with advanced prompt architectures will be more expensive since each query will use more tokens to generate the final response.

If you aim to build autonomous agents (intelligent, goal-oriented systems), try ReAct prompting. ReAct allows the LLM to interact with the world by mixing thoughts and actions.

Models are growing more capable of solving complex tasks without help. Prompts will become even more complex, enabling more advanced use cases for Large Language Models.

Experience lets you develop intuition for which techniques work best in different situations.

The future of interacting with LLMs

Advanced prompt architectures elevate LLMs to perform tasks impossible with just one inference.

Prompt architecture is also an exciting way to understand what’s inside LLMs beyond improving their practical use. Some are too complicated or expensive for practical, real-world applications anyway.

Prompt architecture lets us peek inside the black box of LLMs.

Prompt architecture is not an evolution of prompt engineering: it’s a radically different technique.

While prompt engineering uses a single inference step that anyone can perform in a chat interface, prompt architecture requires multiple inferences and logical steps that often need complex code to be implemented.

Looking into both reveals new capabilities for large language models.

In my view, the distinction is necessary.


If you enjoyed this article, join Text Generation – our newsletter has two weekly posts with the latest insights on Generative AI and Large Language Models.

Also, you can find me on LinkedIn.


References

[1] Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, arXiv:2205.10625

[2] Large Language Models are Zero-Shot Reasoners, arXiv:2205.11916

[3] Better Zero-Shot Reasoning with Role-Play Prompting, arXiv:2308.07702

[4] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, arXiv:2201.11903

[5] Self-Consistency Improves Chain of Thought Reasoning in Language Models, arXiv:2203.11171

[6] Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models, arXiv:2310.06117

[7] Chain-of-Verification Reduces Hallucination in Large Language Models, arXiv:2309.11495

[8] ReAct: Synergizing Reasoning and Acting in Language Models, arXiv:2210.03629

[9] Zero-Resource Hallucination Prevention for Large Language Models, arXiv:2309.02654

[10] What is retrieval-augmented generation? IBM Research Blog

