
Introduction
A few months ago, I was debating with a friend about the use of Large Language Models (LLMs) in Game Design.
My friend argued that LLMs were too unpredictable to be used consistently in games and should not be used for "live mechanics," while I believed they could propose innovative experiences if controlled through the right framework.
This discussion led me to start a new side project that has become the project I am most proud of.
This series of articles aims to provide a high-level view of all the work I did in the past months and the challenges I still need to face as of today.
Below is a short video showing the game as of February 2024.
In this article, we focus on the core concepts of the game and detail the architecture I imagined to tackle some of the biggest issues I faced, such as information leaks and hallucinations.
Idea and concept
I am not the first to attempt to leverage AI to design a game. Since the release of GPT, there has been a boom in new experiences built around AI.
In terms of gaming, I personally came across AI Dungeon, a text-based game powered by AI, which became my main source of inspiration for my own concept.
I found the idea extremely interesting, even though I think it suffers from one major flaw: the adventure is improvised on the way, which can provide nice and unexpected experiences, but as with everything improvised, the stories generated suffer from a lack of consistency and complexity.
When I read a book or play a video game, I like it when the adventure has a clear beginning and a clear end, with a coherent sequence of events that lead from start to finish. This is something I wanted to incorporate into my view of an AI-based game, and this thought became my main motivation to propose something different.
The vision behind my AI-based game
Before starting to code or prototype, I spent some time imagining what my ideal AI-based game would be. Like AI Dungeon, it would be a text-based game where the user can write messages and get responses.
Taking inspiration from RPGs (Role Playing Games), an AI agent would be the Game Master, translating the player’s messages into actions within the game. The Game Master would enforce a story that has been designed ahead of time.
The player could also interact with Non-Playable Characters (NPCs), controlled by other AI agents, each with its own personality and knowledge about the environment. The universe would have rules, be coherent, and the player’s quest would be more or less predetermined. The idea would be to replicate real RPGs while delegating player interactions to AI agents.

Criteria of success
Before starting the proof of concept, I defined a series of success criteria to keep in mind while preparing the v0 of the game. These criteria would help validate whether my concept made sense and whether it could be pushed further in the future. They also guided me in defining some of the most important features of the game and its user interface.
The main criteria are:
- Narration: Each adventure should have a clear beginning and a clear end.
- Causality: To arrive at the end, the player would perform actions linked by causality (one action cannot be done unless another action has been done before).
- Non-linearity: While some actions are chained by causality, others are not, allowing the player to perform actions in different orders while keeping the story coherent.
- Interactions: The player could engage in simple discussions with NPCs or interact with the environment through the Game Master.
- Freedom Feeling: While the adventure has a beginning and an end, the player should feel free to act within the game universe and stay immersed in it.
Game Theme
Once I had my list of success criteria, I needed to create the story at the center of my proof of concept. The story had to be simple and avoid unnecessary complications so that I could focus on designing features to validate my proof points.
After a few iterations, I decided to go with a simple text-based escape game: the player is locked in a mysterious room and must find a way out by interacting with different elements of the environment. Among the interactions: finding a key, opening a hatch, observing the surroundings for clues, and interacting with secret elements. A ghost can be summoned to give the player clues in their quest. The game ends when the player successfully finds the exit.

This design was chosen because it would allow me to test most of the points I wanted to validate, particularly in terms of non-linearity.
Inherent issues with LLMs for games
Unlike classical games, where everything is deterministic, LLM-based games challenge us with their unpredictability. You cannot (and do not want to) control 100% of the output, as this is precisely what makes the technology interesting.
Nevertheless, to create a nice experience for the end user, we want to control this unpredictability as much as possible to keep it within narrow boundaries (in our case, the story and its chain of events).
Hallucinations
Hallucinations occur when an agent answers the user’s query inaccurately or starts to invent information that does not exist. This can be problematic for two reasons. First, it can mislead the player: if the player asks, "What is on this table?" and the AI says "a key," the player will try to use the key. But if the key does not exist, this will generate frustration.
Second, these hallucinations can cause story drift, particularly when the context is modified by the AI agent’s own answers. For example, if you pass the conversation history between the player and the AI (which contains hallucinations) back to the agent before it formulates the next answer, it will reuse that information and reinforce itself in a "hallucination loop," leading the player in the wrong direction.
Spoilers
A spoiler is when an LLM agent reveals information before it should be revealed to the player. If the information is somehow available to the agent at the wrong time, there is a risk of leakage, and the player can access information they shouldn’t have.
This can happen with a static pre-prompt: if you pass the entire story to the agent as reference context, the information is available at all times and can easily leak. It can also happen with a dynamic pre-prompt using simple Retrieval Augmented Generation (RAG): the player’s message could query a remote vector database and accidentally retrieve secrets or information that should only be revealed later.
A trade-off between custom adventure and robustness
It took me quite some time to tackle the above issues, and in the end, I decided to sacrifice a bit of "custom experience" (the agent perfectly adapting to the user’s prompt) for more "robustness" (minimizing hallucinations and spoilers with a consistent story that can smoothly go from A to Z).
In a classical "storytelling game," the user moves from scene to scene or dialogue to dialogue by choosing an option from a predefined list, with each option leading to the next deterministic dialogue or action. This setup is robust because each action leads deterministically to the next one, but the custom experience is minimal.
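For reference, such a fully deterministic structure could be encoded as a simple mapping. The snippet below is purely illustrative and not taken from any actual game:

#Purely illustrative: a classical, fully deterministic dialogue node
dialogue = {
    "scene_1": {
        "text": "You wake up in a locked room. What do you do?",
        "options": {
            "Look around": "scene_2",   # each option maps to exactly one next scene
            "Try the door": "scene_3",
        },
    },
}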
On the other hand, games like AI Dungeon are highly customizable, with no two adventures looking the same. The drawback is that the story cannot be controlled and can quickly become inconsistent or boring.
By combining the two approaches, I proposed something new that would provide enough robustness and customization to be interesting. The idea is to guide the player in their adventure via a causal graph that injects context into the agent depending on the actions already done by the player. The player proposes an interaction, and sometimes, this interaction unlocks new context in the backend of the game and new interactions accessible to the player.
The illustration below shows an example of a causal graph in the middle of a game.

In this example, the actions in green ("break the pot", "use the key to open the hatch", "player push the button", and "inspect the painting") have already been performed by the player, revealing new context (a secret button behind the painting and behind the hatch).
The orange circles show the "impactful" actions available to the player at this point in time that they have not yet triggered.
Finally, "Player escape" is not yet available because the player has not activated the two buttons, and this action will not be passed into the narrator’s context as long as the remaining actions stay uncompleted.
A placeholder in the causal graph catches any other action, so that the Game Master can tell the player their action is not relevant.
In the context of the game, this causal graph is exclusively controlled and modified by a deterministic backend, helped by a classifier agent that determines whether the player’s message matches an "impactful" action.
Causal graph in practice
In this section, we will go more in-depth on how to build a custom causal graph for the game.
Leveraging the "function" feature
The function feature allows an LLM to call a custom function. This was first implemented by OpenAI models in 2023 and is particularly useful for creating interactions with a backend/game engine. It works by passing function schemas to the model, which then responds with the function and parameters it wants to call to get more context. The backend can handle the function call, retrieve new information, and pass it back to another AI agent with fresh context.
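As a minimal sketch (assuming the OpenAI Python SDK; the schema itself is specific to this game, not part of the provider’s API), declaring such a function to the model could look like this:

#Minimal function-calling sketch (assumes the OpenAI Python SDK)
from openai import OpenAI

client = OpenAI()

#Illustrative schema: lets the model ask the backend for an action's consequence
tools = [{
    "type": "function",
    "function": {
        "name": "get_action_consequence",
        "description": "Retrieve the consequence of an impactful action",
        "parameters": {
            "type": "object",
            "properties": {
                "action_id": {
                    "type": "integer",
                    "description": "Id of the matched action in the causal graph",
                },
            },
            "required": ["action_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any function-calling-capable model works here
    messages=[{"role": "user", "content": "I check the note on the table"}],
    tools=tools,
)

#If the model decided to call the function, the backend handles the call
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)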
I am using this mechanism to build a custom agent that acts as an "impactful action" classifier, matching player prompts against the available actions from the causal graph.
A simple causal graph architecture
Let’s look at a simple case of how this works in practice.
To make our causal graph concept work, we need a few things:
Game files
- The game state: A file storing the actions already accomplished by the player. The backend uses it to build a dynamic context with the next available actions, filtering out what has already been done or is not yet accessible.
- The causal graph: A file containing a list of nodes encoding the full adventure. Each node represents an impactful action of the story with an action_id, a description, the consequence of the action, and the list of prior action_ids required to activate the node.
Agents
- Impactful Agent: This agent acts as a classifier and takes as input a custom pre-prompt containing a prefiltered list of actions available at a point in time, along with the player’s message. It checks if the player’s message matches an important action and leverages the function feature to pass the corresponding action_id to the backend to retrieve extra context.
- Game Master Agent: This agent is visible to the player. It works based on a custom context built by the backend and reflects the consequences of the impactful actions (if any), customizing its answer based on the player’s message. If the action is not impactful, a standard generic pre-prompt explains that the player’s action is meaningless in the current context.
Backend Functions
- filter_game_state: This function takes the current game state as input and filters the causal graph to build a "context prompt" containing all the action_ids and their descriptions available at a point in time to the player. The actions already done and not accessible are filtered out, preventing leaks.
- get_action_consequence: If the Impactful Agent identifies a match between the player’s prompt and an available impactful action, it calls get_action_consequence, querying the causal graph based on the action_id provided and adding more context on the impact of the player’s action.
- update_game_state: To keep track of the current state of the game, this function updates the Game State with the new action_id triggered by the player. This refreshes the set of available actions for the next interaction.
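Assuming the causal graph and the game state are stored as JSON files (matching the examples shown later in this article), a minimal sketch of these three functions could look like this:

import json

def load_json(path):
    with open(path) as f:
        return json.load(f)

def filter_game_state(graph, game_state):
    """List the actions currently available: prerequisites met, not yet done."""
    available = [
        node for node in graph
        if node["id"] not in game_state
        and all(req in game_state for req in node["needs"])
    ]
    #Format the result as a context prompt for the classifier agent
    return "\n".join(
        f'- action {node["id"]}: "{node["description"]}"' for node in available
    )

def get_action_consequence(graph, action_id):
    """Return the consequence text of a triggered action."""
    for node in graph:
        if node["id"] == action_id:
            return node["consequence"]
    return None

def update_game_state(game_state, action_id):
    """Record a newly triggered action so the next turn sees fresh options."""
    if action_id not in game_state:
        game_state.append(action_id)
    return game_state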
The diagram below illustrates the full process of producing an answer by chaining agents and backend calls.

This is a very basic architecture, and it could be extended with more intermediate steps, for example to add new characters, new environments, and so on.
We will review this architecture in detail, with Python code, in another article.
How It Works with an Example
Let’s look at an example using the causal graph and game state shown above.
An Example of a Node from the Causal Graph
Each node in the causal graph represents an impactful action. It contains all the information required to effectively interface with our custom framework. This includes:
- A unique action_id to identify the node. This action_id can be used to track the player’s progress.
- An action description used by the "classifier agent" to determine if the player’s message matches this action.
- An action consequence, which is passed to the "game master agent" if this particular action is triggered.
- A list of action_ids needed to activate this node, used by the backend to filter the actions available to the player at any given time.
- Optionally, a name to help the game designer identify each node.
Let’s take a look at the node "read the note on the table" from the causal graph presented above. This node would typically look like this:
#Example of causal graph
[
  {...},
  {
    "id": 4,
    "description": "The player reads the note on the table",
    "consequence": "As the player reads the note, the air becomes cold and suddenly the lights switch off. After a moment everything is back to normal, but there is now a ghostly figure standing in the middle of the room.",
    "needs": [0]
  },
  {...},
]
Note: Action_id 0 represents the initiation of the game and is activated by default when the game starts.
This separation between "description" and "consequence" is very important. It ensures that no spoilers can leak to the player, as the "consequence" message will never be added to any context prompt unless the action "read the note on the table" is triggered.
Game State
The game state simply contains the list of actions already achieved by the player. In the case of the example above, it would be a list containing the ids of the actions already triggered:
#Game State
[0,1,2,3,5]
Three Examples of Player Messages
Let’s look at the full pipeline when the player attempts an interaction with the game master. At this point in the game, the filter_game_state function will always return the same piece of pre-prompt, looking like this:
- action 4: "The player reads the note on the table"
- action 6: "The player pushes the button found behind the painting"
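Plugging this game state into the filter_game_state sketch from earlier would produce exactly that pre-prompt (file names are hypothetical):

#Usage sketch, reusing the earlier backend functions (file names hypothetical)
graph = load_json("causal_graph.json")
game_state = load_json("game_state.json")  # contains [0, 1, 2, 3, 5]

print(filter_game_state(graph, game_state))
#- action 4: "The player reads the note on the table"
#- action 6: "The player pushes the button found behind the painting"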
Example 1: "Jump on the table"
In this case, the message is passed along with the action list from the filter_game_state to the "classifier agent."
#Classifier agent
#Preprompt
You are the classifier agent. Your role is to determine if the action from
the user matches one of the actions from the list below.
List of actions:
- action 4: "The player reads the note on the table"
- action 6: "The player pushes the button found behind the painting"
If no match is found in the list above, use "action X: no match found"
#User message
Jump on the table
In this scenario, the model would return "action X" as no match is found. The backend would then handle this particular case with a generic message passed to the "game master agent":
# Game Master Agent
#Preprompt
The player has made an action without impact in the game. You must explain
to the player that their action has no consequence.
#User message
Jump on the table
Example 2: "I check the note on the table"
In this case, the "classifier agent" will match the player’s message with action 4. Id 4 will be added to the game state, and the "consequence" is passed to the "game master agent." This time, the prompt would look like this:
# Game Master Agent
#Preprompt
The player has made an impactful action in the game. This is the consequence
of their action:
"As the player reads the note, the air becomes cold and suddenly
the lights switch off. After a moment everything is back to
normal, but there is now a ghostly figure standing in the
middle of the room."
#User message
I check the note on the table
This time, the game master will respond properly with the consequence of the player’s action, and the state of the game will be updated to explore the causal graph further.
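To make the chaining concrete, here is a hedged sketch of one full turn, reusing the backend functions sketched earlier; call_llm is a placeholder for whatever chat-completion call you use, and the pre-prompts are abbreviated:

def handle_player_message(message, graph, game_state, call_llm):
    """One full turn: classifier agent -> backend -> game master agent.
    call_llm(preprompt, user_message) is a placeholder for an LLM call."""
    #1. Build the filtered list of actions available at this point in the game
    action_list = filter_game_state(graph, game_state)
    classifier_preprompt = (
        "You are the classifier agent. Determine if the action from the user "
        "matches one of the actions from the list below.\n"
        f"List of actions:\n{action_list}\n"
        'If no match is found, use "action X: no match found"'
    )
    answer = call_llm(classifier_preprompt, message)  # e.g. "action 4" or "action X"
    parts = answer.split()
    token = parts[1].strip(':"') if len(parts) > 1 else "X"

    if token == "X":
        #2a. No impactful action: generic game master pre-prompt
        gm_preprompt = (
            "The player has made an action without impact in the game. You must "
            "explain to the player that their action has no consequence."
        )
    else:
        #2b. Impactful action: fetch its consequence and update the game state
        action_id = int(token)
        gm_preprompt = (
            "The player has made an impactful action in the game. This is the "
            f"consequence of their action:\n{get_action_consequence(graph, action_id)}"
        )
        update_game_state(game_state, action_id)

    #3. The game master agent produces the answer visible to the player
    return call_llm(gm_preprompt, message)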
Example 3: "Is there a ghost in the room?"
This example highlights the importance of the separation between the "description" of the action and its "consequence."
Thanks to this clear separation and the intermediate backend layer, the "game master agent" will never receive the information that there is a ghost in the room before the player actually reads the note.
#Classifier agent
#Preprompt
You are the classifier agent. Your role is to determine if the action from
the user matches one of the actions from the list below.
List of actions:
- action 4: "The player reads the note on the table"
- action 6: "The player pushes the button found behind the painting"
If no match is found in the list above, use "action X: no match found"
#User message
Is there a ghost in the room?
This message would again fall into "action X," leading to a standard answer from the "game master agent":
# Game Master Agent
#Preprompt
The player has made an action without impact in the game. You must explain
to the player that their action has no consequence.
#User message
Is there a ghost in the room?
Conclusion and Other Considerations
In this article, we discussed the base idea of my LLM-based game, which aims to provide players with greater freedom while still delivering exciting, controlled scenarios. We explored the challenges of working with LLMs, particularly their unpredictable responses, which can divert the player from the main scenario or inadvertently spoil the experience.
We focused on leveraging a deterministic back-end in the form of a causal graph to ensure LLMs provide consistent and directed answers to the player. We detailed a simplified version of this framework, which combines the "function" feature from most LLM providers, two specialized agents, and two files containing the entire causal graph and the list of events already triggered by the player.
This causal graph effectively addresses the issue of spoilers by controlling the information sent to the agent before it generates a response for the player. Regarding hallucinations, this framework significantly helps guide the story’s progression (from action id to action id), but at this point the model can still occasionally provide erroneous answers that mislead the player.
In the next article of the series, we will look closely into pre-prompt design and further strategies for controlling hallucinations. We will also discuss the trade-off between cost and accuracy, exploring how we can achieve better results with smaller (and more cost-effective) models. Additionally, we will review the strategies in place to properly evaluate the full pipeline’s performance within the context of the game.
Thanks for reading!