
Intro to DSPy: Goodbye Prompting, Hello Programming!

How the DSPy framework solves the fragility problem in LLM-based applications by replacing prompting with programming and compiling

DSPy (Image hand-drawn by the author)

Currently, building applications with large language models (LLMs) can be not only complex but also fragile. Typical pipelines are often implemented with prompts that are hand-crafted through trial and error because LLMs are sensitive to how they are prompted. Thus, when you change a piece of your pipeline, such as the LLM or your data, you will likely weaken its performance – unless you also adapt the prompt (or the fine-tuning steps).

When you change a piece in your pipeline, such as the LLM or your data, you will likely weaken its performance…

DSPy [1] is a framework that aims to solve the fragility problem in language model (LM)-based applications by prioritizing programming over prompting. It allows you to recompile the entire pipeline to optimize it to your specific task – instead of repeating manual rounds of prompt engineering – whenever you change a component.


Although the paper [1] on the framework was already published in October 2023, I only recently learned about it. After watching just one video ("DSPy Explained!" by Connor Shorten), I could already understand why the developer community is so excited about DSPy!

This article gives a brief introduction to the DSPy framework by covering the following topics:

  • what DSPy is and how it differs from frameworks like LangChain, LlamaIndex, and PyTorch,
  • the DSPy programming model (signatures, modules, and teleprompters),
  • the DSPy compiler, and
  • a DSPy example of a naive RAG pipeline.

What is DSPy?

DSPy ("Declarative Self-improving Language Programs (in Python)", pronounced "dee-es-pie") [1] is a framework for "programming with foundation models" developed by researchers at Stanford NLP. It emphasizes programming over prompting and moves building LM-based pipelines away from manipulating prompts and closer to programming. Thus, it aims to solve the fragility problem in building LM-based applications.

DSPy also provides a more systematic approach to building LM-based applications by separating the information flow of your program from the parameters (prompts and LM weights) of each step. DSPy will then take your program and automatically optimize how to prompt (or finetune) LMs for your particular task.

For this purpose, DSPy introduces the following set of concepts:

  • Signatures, which abstract prompting and fine-tuning,
  • Modules, which abstract prompting techniques,
  • Teleprompters, which automate prompting for arbitrary pipelines, and
  • the DSPy Compiler, which optimizes the entire pipeline.

The workflow of building an LM-based application with DSPy, as discussed in the DSPy Intro Notebook, is shown below. It will remind you of the workflow for training a neural network:

Workflow of building an LLM-based app with DSPy
  1. Collect dataset: Collect a few examples of the inputs and outputs of your program (e.g., question and answer pairs), which will be used to optimize your pipeline.
  2. Write DSPy program: Define your program’s logic with signatures and modules and the information flow among the components to solve your task.
  3. Define validation logic: Define the logic by which your program will be optimized, using a validation metric and an optimizer (teleprompter).
  4. Compile DSPy program: The DSPy compiler takes the training data, program, optimizer, and validation metric into account to optimize your program (e.g., by generating optimized few-shot prompts or finetuning the LM).
  5. Iterate: Repeat the process by improving your data, program, or validation until you are happy with your pipeline’s performance.

Here is a short list of the most important links related to DSPy:

  • [DSPy paper](https://arxiv.org/abs/2310.03714) [1]
  • [DSPy GitHub repository](https://github.com/stanfordnlp/dspy)

How is DSPy different from LangChain or LlamaIndex?

LangChain, LlamaIndex, and DSPy are all frameworks that help developers build LM-based applications with little effort. Typical pipelines built with LangChain and LlamaIndex are often implemented with prompt templates, which make the entire pipeline sensitive to component changes. In contrast, DSPy moves building LM-based pipelines away from manipulating prompts and closer to programming.

The newly introduced compiler in DSPy removes the need for additional manual prompt engineering or fine-tuning when you change parts of your LM-based application, such as the LM or the data. Instead, developers can simply re-compile the program to optimize the pipeline for the changes. Thus, DSPy helps developers achieve high pipeline performance with less effort than LangChain or LlamaIndex.

Although LangChain and LlamaIndex are widely adopted in the developer community, DSPy has sparked considerable interest in that same community as a new alternative.

How is DSPy related to PyTorch?

If you have a background in data science, you will quickly notice syntactic similarities to PyTorch when you start using DSPy. The authors of the DSPy paper [1] explicitly cite PyTorch as a source of inspiration.

Just as general-purpose layers can be composed into any model architecture in PyTorch, general-purpose modules can be composed into any LM-based application in DSPy. Additionally, compiling a DSPy program, during which the parameters in the DSPy modules are automatically optimized, is similar to training a neural network in PyTorch, where the model weights are trained using optimizers.

The following table summarizes the analogies between PyTorch and DSPy:

Comparison: PyTorch vs. DSPy

  PyTorch                   DSPy
  general-purpose layers    general-purpose modules
  model architecture        LM-based program
  model weights             parameters (prompts and LM weights)
  optimizer                 teleprompter
  model training            program compilation
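To make this parallel concrete, below is a minimal side-by-side sketch. It assumes torch and dspy are installed and an LM is configured for DSPy; the class names Net and QA are illustrative, not taken from either library.

import torch.nn as nn
import dspy

# PyTorch: compose general-purpose layers into a model architecture
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

# DSPy: compose general-purpose modules into an LM-based program
class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate(question=question)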

DSPy Programming Model

This section discusses the following three core concepts introduced by the DSPy programming model: signatures, modules, and teleprompters.

Signatures: Abstracting prompting and fine-tuning

Every call to the LM in a DSPy program must have a natural-language signature, which replaces the traditional hand-written prompt. A signature is a short declarative specification of what a transformation does rather than how to prompt the LM to do it (e.g., "consume questions and context and return answers").

DSPy signatures replace hand-written prompts.

In its minimal form, a signature is a tuple of input and output fields.

Structure of a minimal DSPy signature

Below, you can see a few examples of shorthand syntax signatures.

"question -> answer"

"long-document -> summary"

"context, question -> answer"

In many cases, these shorthand syntax signatures are sufficient. However, in cases where you need more control, you can also define signatures with the following, more complete notation. In this case, a signature consists of three elements:

  • a minimal description of the sub-task the LM is supposed to solve,
  • a description of the input fields, and
  • a description of the output fields.

Below, you can see the complete notation for the signature context, question -> answer:

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

In contrast to hand-written prompts, signatures can be compiled into self-improving and pipeline-adaptive prompts or finetunes by bootstrapping examples for each signature.

Modules: Abstracting prompting techniques

You are probably familiar with a few different prompting techniques, such as adding sentences like "Your task is to ..." or "You are a ..." at the beginning of a prompt, Chain of Thought ("Let's think step by step"), or adding sentences like "Don't make anything up" or "Only use the provided context" at the end of the prompt.

Modules in DSPy are templated and parameterized to abstract these prompting techniques. This means that they are used to adapt DSPy signatures to a task by applying prompting, fine-tuning, augmentation, and reasoning techniques.

Below, you can see how a signature can be passed to a ChainOfThought module and then called with values for the input fields context and question.

# Option 1: Pass minimal signature to ChainOfThought module
generate_answer = dspy.ChainOfThought("context, question -> answer")

# Option 2: Or pass full notation signature to ChainOfThought module
generate_answer = dspy.ChainOfThought(GenerateAnswer)

# Call the module on a particular input.
pred = generate_answer(context="Which meant learning Lisp, since in those days Lisp was regarded as the language of AI.",
                       question="What programming language did the author learn in college?")

Below, you can see how the ChainOfThought module initially implements the signature "context, question -> answer". If you want to try it yourself, you can use lm.inspect_history(n=1) to print the last prompt.

Initial implementation of the signature "context, question -> answer" with a ChainOfThought module

At the time of writing, DSPy implements the following six modules:

  • [dspy.Predict](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspypredict): Processes the input and output fields, generates instructions, and creates a template for the specified signature.
  • [dspy.ChainOfThought](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspychainofthought): Inherits from the Predict module and adds functionality for "Chain of Thought" processing.
  • [dspy.ChainOfThoughtWithHint](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspychainofthoughtwithhint): Inherits from the Predict module and enhances the ChainOfThought module with the option to provide hints for reasoning.
  • [dspy.MultiChainComparison](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspymultichaincomparison): Inherits from the Predict module and adds functionality for multiple chain comparisons.
  • [dspy.Retrieve](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspyretrieve): Retrieves passages from a retriever module.
  • [dspy.ReAct](https://github.com/stanfordnlp/dspy/blob/main/docs/modules.md#dspyreact): Designed to compose the interleaved steps of Thought, Action, and Observation.

You can chain these modules together in classes that inherit from dspy.Module and implement two methods. You might already notice a syntactic similarity to PyTorch:

  • __init__(): Declares the used submodules.
  • forward(): Describes the control flow among the defined submodules.

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

The above piece of code creates the following information flow among the defined modules in the RAG() class:

Example code for naive RAG pipeline and resulting information flow among modules.

Teleprompters: Automating prompting for arbitrary pipelines

Teleprompters act as optimizers for DSPy programs. They take a metric and, together with the DSPy compiler, learn to bootstrap and select effective prompts for a DSPy program’s modules.

from dspy.teleprompt import BootstrapFewShot

# Simple teleprompter example
teleprompter = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match)

At the time of writing, DSPy implements the following five teleprompters:

  • [dspy.LabeledFewShot](https://github.com/stanfordnlp/dspy/blob/main/docs/teleprompters.md#telepromptlabeledfewshot): Selects k labeled examples from the training set to use as few-shot demonstrations for the predictors.
  • [dspy.BootstrapFewShot](https://github.com/stanfordnlp/dspy/blob/main/docs/teleprompters.md#telepromptbootstrapfewshot): Bootstraps demonstrations for each stage of your program by running a teacher program on the training inputs and keeping the traces that pass the given metric.
  • [dspy.BootstrapFewShotWithRandomSearch](https://github.com/stanfordnlp/dspy/blob/main/docs/teleprompters.md#telepromptbootstrapfewshotwithrandomsearch): Inherits from the BootstrapFewShot teleprompter and introduces additional attributes for the random search process.
  • [dspy.BootstrapFinetune](https://github.com/stanfordnlp/dspy/blob/main/docs/teleprompters.md#telepromptbootstrapfinetune): Builds on BootstrapFewShot but distills the bootstrapped demonstrations into LM weight updates (finetuning) instead of prompts.
  • [dspy.Ensemble](https://github.com/stanfordnlp/dspy/blob/main/docs/teleprompters.md#telepromptensemble): Creates ensembled versions of multiple programs, reducing various outputs from different programs into a single output.

There are also the [SignatureOptimizer](https://github.com/stanfordnlp/dspy/blob/main/docs-page/docs/deep-dive/teleprompter/signature-optimizer.mdx) and BayesianSignatureOptimizer, which improve the instructions and output prefixes of the signatures in a module in a zero-/few-shot setting.

Different teleprompters offer different tradeoffs, for example, between how much they cost to run (in terms of LM calls) and the quality of the resulting prompts or finetunes.
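For example, BootstrapFewShotWithRandomSearch evaluates several candidate programs and keeps the best one, at the cost of many more LM calls than plain BootstrapFewShot. A sketch with illustrative parameter values:

from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Trades higher compilation cost for potentially better prompts
teleprompter = BootstrapFewShotWithRandomSearch(
    metric=dspy.evaluate.answer_exact_match,
    max_bootstrapped_demos=4,   # demonstrations to bootstrap per candidate
    num_candidate_programs=8,   # candidate programs explored by random search
)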

DSPy Compiler

The DSPy compiler will internally trace your program and then optimize it using an optimizer (teleprompter) to maximize a given metric (e.g., improve quality or cost) for your task. The optimizations depend on the type of LM you are using:

  • for LLMs: construct high-quality few-shot prompts
  • for smaller LMs: train automatic finetunes

That means the DSPy compiler automatically maps the modules to high-quality compositions of prompting, finetuning, reasoning, and augmentation [1]. Internally, the compiler simulates various versions of the program on the inputs and bootstraps example traces of each module for self-improvement to optimize the pipeline for your task. This process is similar to the training process of a neural network.

For example, while the initial prompt the ChainOfThought module created earlier may be a good starting point for any LM to understand the task, it probably isn’t the optimal prompt. As you can see in the following image, the DSPy compiler optimizes the initial prompt and thus eliminates the need for manual prompt tuning.

How the DSPy compiler optimizes the initial prompt (Inspired by Erika's post)

The compiler takes the following inputs, as shown in the code below:

  • the program,
  • the teleprompter, including the defined validation metric, and
  • a few training samples.

from dspy.teleprompt import BootstrapFewShot

# Small training set with question and answer pairs
trainset = [dspy.Example(question="What were the two main things the author worked on before college?", 
                         answer="Writing and programming").with_inputs('question'),
            dspy.Example(question="What kind of writing did the author do before college?", 
                         answer="Short stories").with_inputs('question'),
            ...
            ]

# The teleprompter will bootstrap missing labels: reasoning chains and retrieval contexts
teleprompter = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match)
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)

DSPy Example: Naive RAG Pipeline

Now that you are familiar with all the fundamental concepts in DSPy, let’s put them together in your first DSPy pipeline.

Retrieval-augmented generation (RAG) is all the rage in the Generative AI space at the moment. If you’ve been following my work, you have seen me build naive and advanced RAG pipelines using both LangChain (see tutorial here) and LlamaIndex (see [tutorial here](https://towardsdatascience.com/advanced-retrieval-augmented-generation-from-theory-to-llamaindex-implementation-4de1464a9930)). Thus, it only makes sense to start learning DSPy with a quick, naive RAG pipeline.

For end-to-end pipelines in the form of Jupyter Notebooks, I recommend checking out the Intro Notebook in the DSPy GitHub repository or the Getting Started with RAG in DSPy Notebook by Connor Shorten.

Prerequisites: Installing DSPy

To install the dspy-ai Python package, you can simply pip install it. (The example below additionally requires the weaviate-client package for the retriever.)

pip install dspy-ai

Step 1: Setup

First, you need to set up the LM and retrieval model (RM):

  • LM: We will use OpenAI's gpt-3.5-turbo, for which you will need an OpenAI API key. To obtain one, you need an OpenAI account and then "Create new secret key" under API keys.
  • RM: We will use Weaviate, an open-source vector database, which we will populate with additional data.

Let’s begin by populating the external database with some example data from the LlamaIndex GitHub repository (MIT license). You can replace this part with your own data.

!mkdir -p 'data'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'

Next, we will split the document into single sentences and ingest them into the database. For simplicity, we will use embedded Weaviate in this article, which you can use for free without registering for an API key. Note that DSPy's Weaviate retriever expects your data to be ingested with a property called "content".

import weaviate
from weaviate.embedded import EmbeddedOptions
import re

# Connect to Weaviate client in embedded mode
client = weaviate.Client(
    embedded_options=EmbeddedOptions(),
    additional_headers={
        "X-OpenAI-Api-Key": "sk-<YOUR-OPENAI-API-KEY>",
    },
)

# Create Weaviate schema
# DSPy assumes the collection has a text key 'content'
schema = {
    "classes": [
        {
            "class": "MyExampleIndex",
            "vectorizer": "text2vec-openai",
            "moduleConfig": {"text2vec-openai": {}},
            "properties": [{"name": "content", "dataType": ["text"]}],
        }
    ]
}

client.schema.create(schema)

# Split document into single sentences
chunks = []
with open("./data/paul_graham_essay.txt", 'r', encoding='utf-8') as file:
    text = file.read()
    # Split on sentence boundaries (a period or question mark followed by whitespace),
    # while avoiding splits after common abbreviations
    sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', text)
    sentences = [sentence.strip() for sentence in sentences if sentence.strip()]
    chunks.extend(sentences)

# Populate vector database in batches
client.batch.configure(batch_size=100)  # Configure batch

with client.batch as batch:  # Initialize a batch process
    for i, d in enumerate(chunks):  # Batch import data
        properties = {
            "content": d,
        }
        batch.add_data_object(
            data_object=properties,
            class_name="MyExampleIndex"
        )
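As an optional sanity check, you can count the ingested objects. The following sketch uses the v3 Weaviate client API:

# Should report as many objects as there are sentences in `chunks`
result = client.query.aggregate("MyExampleIndex").with_meta_count().do()
print(result)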

Now, you can configure your LM and RM in the global settings.

import dspy
import openai
from dspy.retrieve.weaviate_rm import WeaviateRM

# Set OpenAI API key
openai.api_key = "sk-<YOUR-OPENAI-API-KEY>"

# Configure LLM
lm = dspy.OpenAI(model="gpt-3.5-turbo")

# Configure Retriever
rm = WeaviateRM("MyExampleIndex",
                weaviate_client=client)

# Configure DSPy to use the following language model and retrieval model by default
dspy.settings.configure(lm=lm,
                        rm=rm)
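At this point, you can already test the retriever in isolation. The following minimal sketch uses the RM configured in dspy.settings under the hood:

# Retrieve the top 3 passages for a sample query
retrieve = dspy.Retrieve(k=3)
print(retrieve("What did the author learn?").passages)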

Step 2: Collect data

Next, we will collect a few training examples (in this case, hand-annotated). In contrast to training a neural network, you will need only a few examples.

# Small training set with question and answer pairs
trainset = [dspy.Example(question="What were the two main things the author worked on before college?", 
                         answer="Writing and programming").with_inputs('question'),
            dspy.Example(question="What kind of writing did the author do before college?", 
                         answer="Short stories").with_inputs('question'),
            dspy.Example(question="What was the first computer language the author learned?", 
                         answer="Fortran").with_inputs('question'),
            dspy.Example(question="What kind of computer did the author's father buy?", 
                         answer="TRS-80").with_inputs('question'),
            dspy.Example(question="What was the author's original plan for college?", 
                         answer="Study philosophy").with_inputs('question'),]

Step 3: Write DSPy program

Now, you are ready to write your first DSPy program. It will be a RAG system. First, you need to define the GenerateAnswer signature for the context, question -> answer transformation, as shown in the Signatures section:

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

After you have defined the signature, you need to compose a custom RAG class that inherits from dspy.Module. In the __init__() method, you declare the relevant modules, and in the forward() method, you describe the information flow among them.

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

Step 4: Compile DSPy program

Finally, you can define the teleprompter and compile the DSPy program. This will update the prompt used in the ChainOfThought module. For this example, we will use a simple BootstrapFewShot teleprompter.

from dspy.teleprompt import BootstrapFewShot

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=dspy.evaluate.answer_exact_match)

# Compile!
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
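If you want to reuse the compiled program later without re-compiling, you can persist its state. A sketch, assuming DSPy's save/load functionality at the time of writing:

# Save the optimized prompts and demonstrations to disk ...
compiled_rag.save("compiled_rag.json")

# ... and load them into a fresh program instance later
rag = RAG()
rag.load("compiled_rag.json")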

Now you can call your RAG pipeline, as shown below:

pred = compiled_rag(question="What programming language did the author learn in college?")
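The returned Prediction object exposes the fields defined in the RAG class's forward() method:

print(pred.answer)   # the generated short factoid answer
print(pred.context)  # the retrieved passages used to answer the question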

From here, you can evaluate your results and iterate over the process until you are happy with your pipeline’s performance. For detailed instructions on evaluation, I recommend checking out the Intro Notebook in the DSPy GitHub repository or the Getting Started with RAG in DSPy Notebook by Connor Shorten.
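As a starting point, the snippet below sketches an evaluation loop with DSPy's Evaluate utility. For brevity, it reuses the training set; in practice, you would evaluate on a held-out dev set:

from dspy.evaluate import Evaluate

# Score the compiled program with the exact-match metric
evaluate = Evaluate(devset=trainset,
                    metric=dspy.evaluate.answer_exact_match,
                    num_threads=1,
                    display_progress=True)
evaluate(compiled_rag)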

Summary

This article briefly introduced the DSPy framework [1], which the Generative AI community is currently excited about. The DSPy framework introduces a set of concepts to move building LM-based applications away from manual prompt engineering to programming.

In DSPy, traditional prompt engineering concepts are replaced with:

  • signatures, which replace hand-written prompts,
  • modules, which replace specific prompting techniques, and
  • teleprompters together with the DSPy compiler, which replace manual rounds of prompt engineering.

After introducing the core DSPy concepts, this article walked you through an example of a naive RAG pipeline using an OpenAI language model and a Weaviate vector database as the retrieval model.

Enjoyed This Story?

Subscribe for free to get notified when I publish a new story.


Find me on LinkedIn, Twitter, and Kaggle!

Disclaimer

I am a Developer Advocate at Weaviate at the time of this writing.

References

Literature

[1] Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Vardhamanan, S., … & Potts, C. (2023). DSPy: Compiling declarative language model calls into self-improving pipelines. arXiv preprint arXiv:2310.03714.

Additional Learning Resources

  • "DSPy Explained!" video by Connor Shorten
  • Intro Notebook in the DSPy GitHub repository
  • "Getting Started with RAG in DSPy" Notebook by Connor Shorten

Images

If not otherwise stated, all images are created by the author.

