Currently, building applications using large language models (LLMs) can be not only complex but also fragile. Typical pipelines are often implemented using prompts, which are hand-crafted through trial and error because LLMs are sensitive to how they are prompted. Thus, when you change a piece in your pipeline, such as the LLM or your data, you will likely weaken its performance – unless you adapt the prompt (or fine-tuning steps).
DSPy [1] is a framework that aims to solve the fragility problem in language model (LM)-based applications by prioritizing programming over prompting. It allows you to recompile the entire pipeline to optimize it to your specific task – instead of repeating manual rounds of prompt engineering – whenever you change a component.
Although the paper [1] on the framework was already published in October 2023, I only recently learned about it. After watching just one video ("DSPy Explained!" by Connor Shorten), I could already understand why the developer community is so excited about DSPy!
This article gives a brief introduction to the DSPy framework by covering the following topics:
- What is DSPy (including discussion about DSPy vs. LangChain vs. LlamaIndex and DSPy vs. PyTorch)
- DSPy Programming Model: Signatures, Modules, and Teleprompters
- DSPy Compiler
- DSPy Example: Naive RAG Pipeline
What is DSPy
DSPy ("Declarative Self-improving Language Programs (in Python)", pronounced "dee-es-pie") [1] is a framework for "programming with foundation models" developed by researchers at Stanford NLP. It emphasizes programming over prompting and moves building LM-based pipelines away from manipulating prompts and closer to programming. Thus, it aims to solve the fragility problem in building LM-based applications.
DSPy also provides a more systematic approach to building LM-based applications by separating the information flow of your program from the parameters (prompts and LM weights) of each step. DSPy will then take your program and automatically optimize how to prompt (or finetune) LMs for your particular task.
For this purpose, DSPy introduces the following set of concepts:
- Hand-written prompts and fine-tuning are abstracted and replaced by signatures
- Prompting techniques, such as Chain of Thought or ReAct, are abstracted and replaced by modules
- Manual prompt engineering is automated with optimizers (teleprompters) and a DSPy Compiler
The workflow of building an LM-based application with DSPy, as discussed in the DSPy Intro Notebook, is shown below. It will remind you of the workflow for training a neural network:

- Collect dataset: Collect a few examples of the inputs and outputs of your program (e.g., question and answer pairs), which will be used to optimize your pipeline.
- Write DSPy program: Define your program’s logic with signatures and modules and the information flow among the components to solve your task.
- Define validation logic: Define the logic your program will be optimized against, consisting of a validation metric and an optimizer (teleprompter); see the sketch of a custom metric after this list.
- Compile DSPy program: The DSPy compiler takes the training data, program, optimizer, and validation metric into account to optimize your program (e.g., prompts or finetunes).
- Iterate: Repeat the process by improving your data, program, or validation until you are happy with your pipeline’s performance.
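A validation metric in DSPy is a plain Python function that takes a labeled example and a prediction. A minimal sketch of a custom metric (the function name and the exact comparison are illustrative choices; DSPy also ships built-in metrics such as `dspy.evaluate.answer_exact_match`):

```python
def validate_answer(example, pred, trace=None):
    # Compare the predicted answer to the gold answer,
    # ignoring case and surrounding whitespace
    return example.answer.strip().lower() == pred.answer.strip().lower()
```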
Here is a short list of all the important links related to DSPy:
- DSPy Paper: DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines [1]
- DSPy GitHub: https://github.com/stanfordnlp/dspy
- Stay up to date on DSPy by following Omar Khattab
How is DSPy different from LangChain or LlamaIndex?
LangChain, LlamaIndex, and DSPy are all frameworks that help developers build LM-based applications effortlessly. Typical pipelines using LangChain and LlamaIndex are often implemented using prompt templates, which make the entire pipeline sensitive to component changes. In contrast, DSPy moves building LM-based pipelines away from manipulating prompts and closer to programming.
The newly introduced compiler in DSPy eliminates additional prompt engineering or fine-tuning effort when you change parts of your LM-based application, such as the LM or the data. Instead, developers can simply re-compile the program to optimize the pipeline for the changes. Thus, DSPy lets developers maintain their pipeline's performance with less effort than LangChain or LlamaIndex.
Although LangChain and LlamaIndex are already widely adopted in the developer community, DSPy has already sparked considerable interest in the same community as a new alternative.
How is DSPy related to PyTorch?
If you have a background in Data Science, you will quickly notice a syntax similarity to PyTorch when you start using DSPy. The authors of the DSPy paper [1] explicitly state PyTorch as a source of inspiration.
Similarly to PyTorch, where general-purpose layers can be composed in any model architecture, in DSPy general-purpose modules can be composed in any LM-based application. Additionally, compiling a DSPy program, where the parameters in the DSPy modules are automatically optimized, is similar to training a neural network in PyTorch, where the model weights are trained using optimizers.
The following table summarizes the analogies between PyTorch and DSPy:

| PyTorch | DSPy |
| --- | --- |
| General-purpose layers | General-purpose modules |
| Layers composed into a model architecture | Modules composed into an LM-based application |
| Model weights | Prompts and LM weights |
| Training the weights with optimizers | Compiling the program with teleprompters and the DSPy compiler |
DSPy Programming Model
This section discusses the following three core concepts introduced by the DSPy Programming model:
- Signatures: Abstracting prompting and fine-tuning
- Modules: Abstracting prompting techniques
- Teleprompters: Automating prompting for arbitrary pipelines
Signatures: Abstracting prompting and fine-tuning
Every call to the LM in a DSPy program must have a natural language signature, which replaces the traditional hand-written prompt. A signature is a short declarative specification that tells DSPy what a transformation does rather than how to prompt the LM to do it (e.g., "consume questions and context and return answers").

In its minimal form, a signature is a tuple of input and output fields.

Below, you can see a few examples of shorthand syntax signatures:

```python
"question -> answer"
"long_document -> summary"
"context, question -> answer"
```
In many cases, these shorthand syntax signatures are sufficient. However, in cases where you need more control, you can also define signatures with the following class-based notation. In this case, a signature consists of three elements:
- A minimal description of the sub-task the LM is supposed to solve,
- a description of the input fields and
- a description of the output fields.
Below, you can see the complete notation for the signature `context, question -> answer`:
```python
import dspy

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
```
In contrast to hand-written prompts, signatures can be compiled into self-improving and pipeline-adaptive prompts or finetunes by bootstrapping examples for each signature.
Modules: Abstracting prompting techniques
You are probably familiar with a few different prompting techniques, such as adding sentences like `"Your task is to ..."` or `"You are a ..."` at the beginning of a prompt, Chain of Thought (`"Let's think step by step"`), or adding sentences like `"Don't make anything up"` or `"Only use the provided context"` at the end of the prompt.
Modules in DSPy are templated and parameterized to abstract these prompting techniques. This means that they are used to adapt DSPy signatures to a task by applying prompting, fine-tuning, augmentation, and reasoning techniques.
Below, you can see how a signature can be passed to a `ChainOfThought` module and then called with values for the input fields `context` and `question`.
```python
# Option 1: Pass minimal signature to ChainOfThought module
generate_answer = dspy.ChainOfThought("context, question -> answer")

# Option 2: Pass full notation signature to ChainOfThought module
generate_answer = dspy.ChainOfThought(GenerateAnswer)

# Call the module on a particular input
pred = generate_answer(
    context="Which meant learning Lisp, since in those days Lisp was regarded as the language of AI.",
    question="What programming language did the author learn in college?",
)
```
Below, you can see how the `ChainOfThought` module initially implements the signature `"context, question -> answer"`. If you want to try it yourself, you can use `lm.inspect_history(n=1)` to print the last prompt.
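With the `lm` object configured later in this article, that looks as follows:

```python
# Print the most recent prompt (and completion) sent to the LM
lm.inspect_history(n=1)
```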

At the time of writing, DSPy implements the following six modules:
- [`dspy.Predict`](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspypredict): Processes the input and output fields, generates instructions, and creates a template for the specified `signature`.
- [`dspy.ChainOfThought`](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspychainofthought): Inherits from the `Predict` module and adds functionality for "Chain of Thought" processing.
- [`dspy.ChainOfThoughtWithHint`](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspychainofthoughtwithhint): Inherits from the `Predict` module and enhances the `ChainOfThought` module with the option to provide hints for reasoning.
- [`dspy.MultiChainComparison`](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspymultichaincomparison): Inherits from the `Predict` module and adds functionality for multiple chain comparisons.
- [`dspy.Retrieve`](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspyretrieve): Retrieves passages from a retriever module.
- [`dspy.ReAct`](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspyreact): Designed to compose the interleaved steps of Thought, Action, and Observation.
You can chain these modules together in classes that inherit from `dspy.Module` and implement two methods. You might already notice a syntactic similarity to PyTorch:

- `__init__()`: Declares the used sub-modules.
- `forward()`: Describes the control flow among the defined sub-modules.
```python
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
```
The above piece of code creates the following information flow among the defined modules in the `RAG()` class:

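Like a PyTorch module, the program can be called even before compilation. A quick sketch (it assumes the LM and retrieval model have been configured in `dspy.settings`, as shown in the example section below):

```python
# Run the (still uncompiled) program zero-shot, just like calling a PyTorch module
rag = RAG()
pred = rag(question="What did the author do growing up?")
print(pred.answer)
```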
Teleprompters: Automating prompting for arbitrary pipelines
Teleprompters act as optimizers for DSPy programs. They take a metric and, together with the DSPy compiler, learn to bootstrap and select effective prompts for a DSPy program’s modules.
```python
from dspy.teleprompt import BootstrapFewShot

# Simple teleprompter example
teleprompter = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match)
```
At the time of writing, DSPy implements the following five teleprompters:
- [`dspy.LabeledFewShot`](https://github.com/stanfordnlp/dspy/blob/main/docs/teleprompters.md#telepromptlabeledfewshot): Defines the number `k` of labeled samples to be used by the predictor.
- [`dspy.BootstrapFewShot`](https://github.com/stanfordnlp/dspy/blob/main/docs/teleprompters.md#telepromptbootstrapfewshot): Bootstraps few-shot demonstrations for the program's modules.
- [`dspy.BootstrapFewShotWithRandomSearch`](https://github.com/stanfordnlp/dspy/blob/main/docs/teleprompters.md#telepromptbootstrapfewshotwithrandomsearch): Inherits from the `BootstrapFewShot` teleprompter and introduces additional attributes for the random search process.
- [`dspy.BootstrapFinetune`](https://github.com/stanfordnlp/dspy/blob/main/docs/teleprompters.md#telepromptbootstrapfinetune): Defines the teleprompter as a `BootstrapFewShot` instance for the finetuning compilation.
- [`dspy.Ensemble`](https://github.com/stanfordnlp/dspy/blob/main/docs/teleprompters.md#telepromptensemble): Creates ensembled versions of multiple programs, reducing various outputs from different programs into a single output.

There are also the [`SignatureOptimizer`](https://github.com/stanfordnlp/dspy/blob/main/docs-page/docs/deep-dive/teleprompter/signature-optimizer.mdx) and `BayesianSignatureOptimizer`, which improve the output prefixes and instructions of the signatures in a module in a zero/few-shot setting.
Different teleprompters offer various tradeoffs regarding how much they optimize cost versus quality, etc.
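For example, the simplest teleprompter, `LabeledFewShot`, only needs `k`. A quick sketch (it reuses the `RAG` class and the `trainset` from the example section below):

```python
from dspy.teleprompt import LabeledFewShot

# Use k labeled training examples directly as few-shot demonstrations
teleprompter = LabeledFewShot(k=8)
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
```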
DSPy Compiler
The DSPy compiler will internally trace your program and then optimize it using an optimizer (teleprompter) to maximize a given metric (e.g., improve quality or cost) for your task. The optimizations depend on the type of LM you are using:
- for LLMs: construct high-quality few-shot prompts
- for smaller LMs: train automatic finetunes
That means the DSPy compiler automatically maps the modules to high-quality compositions of prompting, finetuning, reasoning, and augmentation. [1] Internally, the compiler simulates various versions of the program on the inputs and bootstraps example traces of each module for self-improvement to optimize the pipeline to your task. This process is similar to the training process of a neural network.
For example, the initial prompt that the `ChainOfThought` module created earlier may be a good starting point for any LM to understand the task, but it probably isn't the optimal prompt. As you can see in the following image, the DSPy compiler optimizes the initial prompt and thus eliminates the need for manual prompt tuning.

The compiler takes the following inputs, as shown in the code and image below:
- the program,
- the teleprompter, including the defined validation metric, and
- a few training samples.
```python
from dspy.teleprompt import BootstrapFewShot

# Small training set with question and answer pairs
trainset = [
    dspy.Example(
        question="What were the two main things the author worked on before college?",
        answer="Writing and programming",
    ).with_inputs("question"),
    dspy.Example(
        question="What kind of writing did the author do before college?",
        answer="Short stories",
    ).with_inputs("question"),
    ...
]

# The teleprompter will bootstrap missing labels: reasoning chains and retrieval contexts
teleprompter = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match)
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
```

DSPy Example: Naive RAG Pipeline
Now that you are familiar with all the fundamental concepts in DSPy, let’s put it together in your first DSPy pipeline.
Retrieval-augmented generation (RAG) is all the rage in the Generative AI space at the moment. If you've been following my work, you have seen me build naive and advanced RAG pipelines using both LangChain (see tutorial here) and LlamaIndex (see [tutorial here](https://towardsdatascience.com/advanced-retrieval-augmented-generation-from-theory-to-llamaindex-implementation-4de1464a9930)). Thus, it only makes sense to start learning DSPy with a quick, naive RAG pipeline.
For end-to-end pipelines in the form of Jupyter Notebooks, I recommend checking out the Intro Notebook in the DSPy GitHub repository or the Getting Started with RAG in DSPy Notebook by Connor Shorten.
Prerequisites: Installing DSPy
To install the `dspy-ai` Python package, you can simply `pip install` it:
```bash
pip install dspy-ai
```
Step 1: Setup
First, you need to set up the LM and retrieval model (RM):
- LM: We will use OpenAI's `gpt-3.5-turbo`, for which you will need an OpenAI API key. To obtain one, you need an OpenAI account and then "Create new secret key" under API keys.
- RM: We will use Weaviate, an open source vector database, which we will populate with additional data.
Let’s begin by populating the external database with some example data from the LlamaIndex GitHub repository (MIT license). You can replace this part with your own data.
```bash
!mkdir -p 'data'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'
```
Next, we will split the document into single sentences and ingest them into the database. For simplicity, we will use Weaviate in embedded mode, which you can use for free without registering for an API key. Note that it is important to ingest your data with a property called `"content"` when using Weaviate with DSPy.
```python
import weaviate
from weaviate.embedded import EmbeddedOptions
import re

# Connect to Weaviate client in embedded mode
client = weaviate.Client(
    embedded_options=EmbeddedOptions(),
    additional_headers={
        "X-OpenAI-Api-Key": "sk-<YOUR-OPENAI-API-KEY>",
    },
)

# Create Weaviate schema
# DSPy assumes the collection has a text key 'content'
schema = {
    "classes": [
        {
            "class": "MyExampleIndex",
            "vectorizer": "text2vec-openai",
            "moduleConfig": {"text2vec-openai": {}},
            "properties": [{"name": "content", "dataType": ["text"]}],
        }
    ]
}
client.schema.create(schema)

# Split document into single sentences
chunks = []
with open("./data/paul_graham_essay.txt", "r", encoding="utf-8") as file:
    text = file.read()
    sentences = re.split(r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s", text)
    sentences = [sentence.strip() for sentence in sentences if sentence.strip()]
    chunks.extend(sentences)

# Populate vector database in batches
client.batch.configure(batch_size=100)  # Configure batch
with client.batch as batch:  # Initialize a batch process
    for i, d in enumerate(chunks):  # Batch import data
        properties = {
            "content": d,
        }
        batch.add_data_object(
            data_object=properties,
            class_name="MyExampleIndex",
        )
```
Now, you can configure your LM and RM in the global settings.
```python
import dspy
import openai
from dspy.retrieve.weaviate_rm import WeaviateRM

# Set OpenAI API key
openai.api_key = "sk-<YOUR-OPENAI-API-KEY>"

# Configure LLM
lm = dspy.OpenAI(model="gpt-3.5-turbo")

# Configure retriever
rm = WeaviateRM("MyExampleIndex", weaviate_client=client)

# Configure DSPy to use the above language model and retrieval model by default
dspy.settings.configure(lm=lm, rm=rm)
```
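As a quick, optional sanity check, DSPy's LM clients are directly callable, so you can verify the connection with a one-off prompt:

```python
# Optional sanity check: call the configured LM directly
print(lm("Say hello in one short sentence."))
```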
Step 2: Collect data
Next, we will collect a few training examples (in this case, hand-annotated). In contrast to training a neural network, you will need only a few examples.
# Small training set with question and answer pairs
trainset = [dspy.Example(question="What were the two main things the author worked on before college?",
answer="Writing and programming").with_inputs('question'),
dspy.Example(question="What kind of writing did the author do before college?",
answer="Short stories").with_inputs('question'),
dspy.Example(question="What was the first computer language the author learned?",
answer="Fortran").with_inputs('question'),
dspy.Example(question="What kind of computer did the author's father buy?",
answer="TRS-80").with_inputs('question'),
dspy.Example(question="What was the author's original plan for college?",
answer="Study philosophy").with_inputs('question'),]
Step 3: Write DSPy program
Now, you are ready to write your first DSPy program. It will be a RAG system. First, you need to define the signature `context, question -> answer`, as shown in the Signatures section, called `GenerateAnswer`:
```python
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
```
After you have defined the signature, you need to compose a custom RAG class that inherits from `dspy.Module`. In the `__init__()` method, you declare the relevant modules, and in the `forward()` method, you describe the information flow among the modules.
```python
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
```
Step 4: Compile DSPy program
Finally, you can define the teleprompter and compile the DSPy program. This will update the prompt used in the `ChainOfThought` module. For this example, we will use a simple `BootstrapFewShot` teleprompter.
```python
from dspy.teleprompt import BootstrapFewShot

# Set up a basic teleprompter, which will compile our RAG program
teleprompter = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match)

# Compile!
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
```
Now you can call your RAG pipeline, as shown below:
```python
pred = compiled_rag(question="What programming language did the author learn in college?")
```
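The returned `dspy.Prediction` object contains the fields defined in the `forward()` method:

```python
# Access the fields returned by RAG.forward()
print(f"Answer: {pred.answer}")
print(f"Context: {pred.context}")
```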
From here, you can evaluate your results and iterate over the process until you are happy with your pipeline’s performance. For detailed instructions on evaluation, I recommend checking out the Intro Notebook in the DSPy GitHub repository or the Getting Started with RAG in DSPy Notebook by Connor Shorten.
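As a starting point, DSPy provides an `Evaluate` utility. A minimal sketch (the `devset` of held-out `dspy.Example` objects is a hypothetical placeholder you would build just like `trainset` above):

```python
from dspy.evaluate import Evaluate

# Score the compiled program on a held-out devset with the exact-match metric
evaluate = Evaluate(devset=devset, metric=dspy.evaluate.answer_exact_match,
                    num_threads=1, display_progress=True)
evaluate(compiled_rag)
```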
Summary
This article briefly introduced the DSPy framework [1], which the Generative AI community is currently excited about. The DSPy framework introduces a set of concepts to move building LM-based applications away from manual prompt engineering to programming.
In DSPy, the concepts of traditional prompt engineering are replaced as follows:
- Signatures replace hand-written prompts,
- Modules replace specific prompt engineering techniques, and
- Teleprompters and the DSPy Compiler replace manual iterations of prompt engineering.
After introducing the DSPy concepts, this article walks you through an example of a naive RAG pipeline using an OpenAI language model and a Weaviate vector database as the retriever model.
Enjoyed This Story?
Subscribe for free to get notified when I publish a new story.
Find me on LinkedIn, Twitter, and Kaggle!
Disclaimer
I am a Developer Advocate at Weaviate at the time of this writing.
References
Literature
[1] Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Vardhamanan, S., … & Potts, C. (2023). DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. arXiv preprint arXiv:2310.03714.
Additional Learning Resources
- Intro Notebook by DSPy
- Video series by Connor Shorten: DSPy Explained!
- Getting Started with RAG in DSPy! by Connor Shorten with related Jupyter Notebook
Images
If not otherwise stated, all images are created by the author.