
Given the non-deterministic nature of LLMs, it's easy to end up with outputs that don't comply with our application's intended purpose. A well-known example is Tay, the Microsoft chatbot that famously started posting offensive tweets.
Whenever I’m working on an LLM application and want to decide if I need to implement additional safety strategies, I like to focus on the following points:
- Content Safety: Mitigate risks of generating harmful, biased, or inappropriate content.
- User Trust: Establish confidence through transparent and responsible functionality.
- Regulatory Compliance: Align with legal frameworks and data protection standards.
- Interaction Quality: Optimize user experience by ensuring clarity, relevance, and accuracy.
- Brand Protection: Safeguard the organization’s reputation by minimizing risks.
- Misuse Prevention: Anticipate and block potential malicious or unintended use cases.
If you’re planning to work with LLM Agents soon, this article is for you.
What are guardrails?
In this context, implementing guardrails for an Agent means ensuring that the Agent's first output is not automatically treated as the final answer.
Technically speaking, you want to evaluate the Agent’s output based on specific constraints and, if necessary, force the Agent to regenerate its answer until it meets your requirements.
For example, imagine an application that summarizes all the emails you’ve received over the past month. You’ve specified that personal information, such as the sender’s name, must be anonymized. However, LLMs can sometimes "forget" this condition due to their unpredictable nature. In such cases, you can rely on an additional step to verify whether your strict condition has been met, since it’s a critical requirement.
Intro to CrewAI
CrewAI is my go-to framework whenever I work with Agents. It’s simple, backed by a strong community, open source, and completely focused on Agents. It also comes with plenty of extra functionalities, making it a very appealing choice.
Before diving into how to implement guardrails with CrewAI, let me briefly introduce the main components we’ll be working with.
Agents, Tasks, and Crews
In CrewAI, there’s a clear separation between the Tasks and the Agent itself. This separation allows you to decouple what would otherwise be part of a single, large prompt. By keeping these two concepts distinct, you can clearly define the role of the Agent and what the Agent is supposed to do.
# Fitness Tracker Agent
fitness_tracker_agent = Agent(
    llm=llm,
    role="Fitness Tracker",
    backstory="An AI that sets goals, tracks progress, and recommends workouts.",
    goal="Set goals, track progress, and recommend workouts.",
    verbose=True,
)

# Fitness Tracker Task
set_and_track_goals = Task(
    description=f"Set the fitness goal ({fitness_goal}), track progress, and use weight history: {historical_weight_data}.",
    expected_output="A muscle gain plan with tracking.",
    agent=fitness_tracker_agent,
)
In this example, we have a fitness Agent responsible for creating a muscle gain plan based on the fitness goal and historical weight data. As you can see, the fitness_tracker_agent doesn't inherently know what to do until we create a Task and specify which agent will handle it.
Finally, we bring all the Tasks and Agents together into a Crew, which forms the heart of our application.
# Crew setup
crew = Crew(
    agents=[fitness_tracker_agent, recommendation_agent],
    tasks=[set_and_track_goals, fetch_fitness_recommendations, provide_workout_plan],
    planning=True,
)
The final step is to execute our Crew using the input parameters that will be passed to the Tasks – in this case, the fitness goal and the historical weight data.
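Here's a minimal sketch of that final step. It assumes the Task descriptions use CrewAI's {placeholder} interpolation (rather than the f-strings shown above) so kickoff can inject the inputs; the values themselves are illustrative:
# Kick off the Crew with the inputs referenced by the Tasks.
result = crew.kickoff(
    inputs={
        "fitness_goal": "muscle gain",
        "historical_weight_data": "Jan: 80kg, Feb: 81kg, Mar: 81.5kg",
    }
)
print(result.raw)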
Flows
What we’ve just covered is enough to build a simple Crew, but it’s not sufficient to create a fully functional agentic application with guardrails. The next concept we need to master is CrewAI Flows.
The idea behind Flows is to easily create dynamic AI workflows with chained Tasks, seamless state management, event-driven responsiveness, and flexible control flow for conditions, loops, and branching.
The following decorators help manipulate the flow of execution:
- @start(): Marks the entry point of the Flow and initiates Tasks when the Flow begins.
- @listen(): Executes a method when a specific Task or event is completed.
- @router(): Directs the Flow to different paths based on conditions or outcomes.
from crewai.flow.flow import Flow, listen, start
from pydantic import BaseModel

class ExampleState(BaseModel):
    counter: int = 0
    message: str = ""

class StateExampleFlow(Flow[ExampleState]):
    @start()
    def first_method(self):
        self.state.message = "Hello from first_method"
        self.state.counter += 1

    @listen(first_method)
    def second_method(self):
        self.state.message += " - updated by second_method"
        self.state.counter += 1
        return self.state.message

flow = StateExampleFlow()
final_output = flow.kickoff()
print(f"Final Output: {final_output}")
print("Final State:")
print(flow.state)
Additionally, Flows allow access to a shared state object (ExampleState) that stores and manages data, enabling seamless communication between Tasks and preserving context throughout the workflow.
Guardrails with CrewAI Flow
With the two simple concepts we’ve just covered, we’re ready to enhance our AI Agent. In this example, I’ll demonstrate how to create a multi-agent AI application capable of generating text and verifying whether the text contains violent content before providing the output.
The check for violence in the text occurs within the application itself. Thanks to Flow’s state management, we can control how often the text is regenerated, preventing infinite loops. This approach introduces a level of determinism to something that is inherently non-deterministic.
Imports
The primary library used is CrewAI, but we'll also import Pydantic to create a BaseModel class, as CrewAI's documentation requires.
For this example, we’ll assume two Agents and their corresponding Tasks have already been created as described earlier: one for generating text and another for checking it for violent content.
from typing import List

from pydantic import BaseModel, Field

from crewai import Crew
from crewai.flow.flow import Flow, listen, router, start
In addition to the Flow class, we'll need the three decorators to build our application.
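Those Agents and Task factories aren't shown here, so below is a minimal, hypothetical sketch of what they could look like. Only the names (text_generator_agent, violence_checker_agent, create_text_generation_task, create_violence_check_task) are fixed by the Flow code that follows; the roles, prompts, and the llm instance are assumptions, mirroring the fitness example:
from crewai import Agent, Task

# Hypothetical Agents (prompts are illustrative; `llm` is assumed to be
# configured elsewhere, as in the fitness example).
text_generator_agent = Agent(
    llm=llm,
    role="Text Generator",
    backstory="A writer that produces short texts on any topic.",
    goal="Write a short text about the requested topic.",
)

violence_checker_agent = Agent(
    llm=llm,
    role="Violence Checker",
    backstory="A strict content reviewer.",
    goal="Decide whether a text contains violent content.",
)

# Hypothetical Task factories used by the Flow methods below.
def create_text_generation_task(topic: str) -> Task:
    return Task(
        description=f"Write a short text about: {topic}",
        expected_output="A short text on the given topic.",
        agent=text_generator_agent,
    )

def create_violence_check_task(text: str) -> Task:
    return Task(
        description=f"Review the following text: {text}",
        expected_output="The single word 'Violence' if violent content is present, otherwise 'Safe'.",
        agent=violence_checker_agent,
    )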
State
The state is what we use to persist data throughout the Flow execution. The first piece of information we need to persist is the generated text. This is where we store the text across all iterations, updating it as necessary.
class ViolenceCheckState(BaseModel):
    generated_text: str = ""
    contains_violence: bool = False
    generation_attempts_left: int = 2
To control the Flow, we’ll also use a flag to indicate whether the text contains violence, as well as a counter to track how many generation attempts remain. These attributes will be accessed and updated during the Flow’s execution.
Flow Class
The class we're implementing consists of methods designed to control the flow of the application. Let's build it step by step, starting with the __init__ method.
class ViolenceCheckFlow(Flow[ViolenceCheckState]):
    topic: str = Field(description="Topic for text generation")

    def __init__(self, topic: str):
        super().__init__()
        self.topic = topic
    ...
So far, it's business as usual: we're simply initializing the sole attribute of our class. What's new here is the ViolenceCheckState Pydantic model, which we pass to the superclass Flow. This model will represent our state.
    ...
    @start()
    def generate_text(self):
        print(f"Generating text based on input topic: {self.topic}")
        task = create_text_generation_task(self.topic)  # Pass the input topic to the task
        crew = Crew(agents=[text_generator_agent], tasks=[task])
        result = crew.kickoff()
        self.state.generated_text = result.raw
        print("Text generated!")

    @listen(generate_text)
    def validate_text_for_violence(self):
        print("Validating text for violence...")
        task = create_violence_check_task(self.state.generated_text)
        crew = Crew(agents=[violence_checker_agent], tasks=[task])
        result = crew.kickoff()
        self.state.contains_violence = "Violence" in result.raw
        print("Validation complete:", "Violence detected" if self.state.contains_violence else "No violence detected")
    ...
The first method must use the @start decorator to define the starting point of our Flow. It's followed by a second method, which listens to the first one and is executed after it.
These two methods perform the following steps:
- generate_text: This method creates a Crew that generates text based on a given topic. The generated result is saved into the state via self.state.generated_text.
- validate_text_for_violence: This method uses a different Crew to check whether the text contains violent content. If violence is detected, we set self.state.contains_violence to True.
Adding Complexity with Routing
Here's where it gets interesting. Now we use the @router decorator to direct the Flow based on whether the application needs to regenerate the text. This decision is based on the value of self.state.contains_violence.
    ...
    @router(validate_text_for_violence)
    def route_text_validation(self):
        if not self.state.contains_violence:
            return "safe"
        elif self.state.generation_attempts_left == 0:
            return "not_feasible"
        else:
            return "regenerate"
    ...
The route_text_validation method is executed immediately after validate_text_for_violence, as specified by the decorator. It checks the contains_violence attribute in the state and:
- Returns "safe" if no violence is detected.
- Returns "regenerate" if violence is detected and attempts remain.
- Returns "not_feasible" if violence is detected and no attempts are left.
Handling Signals
The last part of our class includes three methods that listen for specific signals sent from route_text_validation: "safe", "regenerate", and "not_feasible".
    ...
    @listen("safe")
    def save_safe_text(self):
        with open("safe_text.txt", "w") as file:
            file.write(self.state.generated_text)
        print("Safe text saved to file")

    @listen("regenerate")
    def regenerate_text(self):
        self.state.generation_attempts_left -= 1
        self.generate_text()

    @listen("not_feasible")
    def notify_user(self):
        print("Generated text contains violence and further attempts are not feasible.")
    ...
Here’s how each scenario is handled:
"safe"
: If no violence is detected, the generated text is saved to a file."regenerate"
: If text regeneration is triggered, thegeneration_attempts_left
counter is decremented, and thegenerate_text
method is called again, restarting the process."not_feasible"
: This occurs when the maximum number of regeneration attempts (in this case, 2) is reached, and the application cannot produce a text free of violence.
Conclusions
This approach has been a lifesaver for me. The implementation focuses on organizing the loop and handling iterations rather than reinventing the logic behind the application’s flow. In this example, I focused on checking whether the text contained violence and determining the appropriate actions if it did. I didn’t have to waste time managing the flow logic itself, thanks to this clean, built-in feature provided by the framework.
More Use Cases
Among the many use cases for this approach, I'd like to highlight prompt injection. This is more common than you might think, especially with the growing prominence of roles like "prompt engineering." With CrewAI, you can create a Flow that evaluates incoming queries to determine whether an attacker is attempting to exploit a chatbot, as sketched below.
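Here's a rough, hypothetical sketch following the same pattern as the violence check. The injection_checker_agent and create_injection_check_task helpers are assumptions, built the same way as the checkers above:
from pydantic import BaseModel
from crewai import Crew
from crewai.flow.flow import Flow, listen, router, start

class QueryState(BaseModel):
    is_injection: bool = False

class QueryGuardFlow(Flow[QueryState]):
    def __init__(self, query: str):
        super().__init__()
        self.query = query

    @start()
    def classify_query(self):
        # injection_checker_agent and create_injection_check_task are
        # hypothetical, defined like the violence checker earlier.
        task = create_injection_check_task(self.query)
        crew = Crew(agents=[injection_checker_agent], tasks=[task])
        self.state.is_injection = "Injection" in crew.kickoff().raw

    @router(classify_query)
    def route_query(self):
        return "blocked" if self.state.is_injection else "allowed"

    @listen("allowed")
    def handle_query(self):
        print("Query looks safe; pass it on to the main chatbot Crew.")

    @listen("blocked")
    def reject_query(self):
        print("Possible prompt injection detected; refusing to answer.")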
Challenges
This is the trade-off of working with a framework: it gives you a lot, but when something goes wrong, it can take everything back with interest. I've started simple applications only to find myself opening pull requests on GitHub to fix bugs after hours of work.
That said, it’s not as bad as it sounds. CrewAI is still my first choice for multi-agent applications. However, as with any new library, relying on it means accepting the risks and being prepared to address them.
If you want to know more about this library, check out my latest talk:
