
Turn GPT-4 into a Poker Coach

Unleashing Creativity Beyond Chatbot Boundaries

Photo by Michał Parzuchowski on Unsplash

In this article, we will not talk about how LLMs can pass a law exam or replace a developer.

Nor will we look at tips for optimizing prompts so that GPT writes motivation letters or marketing content.

Like many people, I think the emergence of LLMs like GPT-4 is a small revolution from which many new applications will emerge. I also think we should not reduce them to simple "chatbot assistants": with the appropriate backend and UX, these models can be leveraged to build incredible next-level applications.

This is why, in this article, we are going to think a bit outside the box and build a real application around the GPT API, one that could not be reproduced simply through the chatbot interface, and see how a proper app design can serve a better user experience.

Setting up some context

Leveraging GPT-4 in businesses

I have played a lot with GPT-4 since its release, and I think there are broadly two main families of use cases for building a business around the model.

The first is using GPT-4 to generate static content. Say you want to write a cookbook with a particular theme (for example, Italian food). You can write detailed prompts, generate a few recipes with GPT, try them yourself, and include the ones you like in your book. In that case, "prompting" has a fixed cost: once the recipes are generated, you don’t need GPT anymore. This type of use case has many variations (marketing content, website content, or even generating datasets for other uses), but it is not as interesting if we want to focus on AI-oriented apps.

The logic of generating the content is outside the application, Author Illustration

The second is live prompting through an interface of your own design. Going back to the cooking field: we could imagine a well-suited interface in which a user picks a few ingredients and a specialty, and asks the application to generate the recipe directly. Unlike the first case, the content generated is potentially infinite and better suits the needs of your users.

In this scenario, the user interacts directly with the LLM via a well-designed UX which will generate prompts and content, Author Illustration

The drawback is that the number of calls to the LLM is potentially infinite and grows with the number of users, unlike before, where the number of calls was finite and controlled. This implies that you will have to design your business model carefully and take great care to include the cost of prompts in it.

As I write these lines, a GPT-4 "prompt" costs $0.03 per 1,000 tokens (with both request and answer tokens counted in the pricing). It does not seem like a lot, but it can escalate quickly if you don’t pay attention. To work around this, you could, for example, offer your users a subscription based on their prompt volume, or limit the number of prompts per user (via a login system, etc.). We will talk about pricing in a bit more detail later in this article.
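To make this concrete, here is a minimal back-of-the-envelope cost sketch, assuming the flat $0.03 per 1,000 tokens rate quoted above; the token counts and usage figures are made up for illustration:

GPT4_PRICE_PER_1K_TOKENS = 0.03  # USD, flat rate assumed from the pricing above

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost in USD of a single GPT-4 call (both sides are billed)."""
    return (prompt_tokens + completion_tokens) / 1000 * GPT4_PRICE_PER_1K_TOKENS

# Hypothetical scenario: a 700-token hand history plus a 300-token answer,
# for 1,000 users playing 200 hands each per month
cost_per_hand = estimate_cost(700, 300)    # $0.03 per hand
monthly_cost = cost_per_hand * 200 * 1000  # $6,000 per month
print(f"${cost_per_hand:.2f} per hand, ${monthly_cost:,.0f} per month")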

Why a use-case around Poker?

I thought for some time about the perfect use case to try with LLMs.

First, poker analysis is, in theory, a field in which an LLM should perform well: every poker hand played can be translated into a standardized, simple text describing the evolution of the hand. For example, the hand below describes a sequence in which "player1" wins the pot by raising "player2"’s bet after the flop.

Seat 2: player1 (€5.17 in chips)
Seat 3: player3 (€5 in chips)
Seat 4: player2 (€5 in chips)
player1: posts small blind €0.02
player2: posts big blind €0.05
*** HOLE CARDS ***
Dealt to player2 [4s 4c]
player2: raises €0.10 to €0.15
player1: calls €0.13
player3: folds
*** FLOP *** [Th 7h Td]
player1: checks
player2: bets €0.20
player1: raises €0.30 to €0.50
player2: folds
Uncalled bet (€0.30) returned to player1
player1 collected €0.71 from pot

This standardization is important because it makes development simpler. We will be able to simulate hands, translate them into this kind of prompt message, and "force" the LLM’s answer to continue the sequence.
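To illustrate, here is a minimal sketch of how a simulated hand could be serialized into this standardized text; the data structures are hypothetical, and a real engine would track much more state:

def hand_to_text(stacks: dict, actions: list) -> str:
    """Serialize a simulated hand into the standardized hand-history text."""
    lines = [f"Seat {seat}: {name} (€{stack} in chips)"
             for seat, (name, stack) in enumerate(stacks.items(), start=2)]
    lines += [f"{player}: {action}" for player, action in actions]
    return "\n".join(lines)

print(hand_to_text(
    {"player1": 5.17, "player2": 5},
    [("player1", "posts small blind €0.02"),
     ("player2", "posts big blind €0.05")],
))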

Second, a lot of theoretical content is available in books, online, etc., making it likely that GPT has "learned" things about the game and good moves.

Also, a lot of the added value will come from the app engine and the UX, not only from the LLM itself (for example, we will have to design our own poker engine to simulate a game), which will make the application harder to replicate, or to simply "reproduce" via ChatGPT.

Finally, the use case fits well with the second scenario described above, where the LLM and a good UX can bring a completely new experience to users. We could imagine our application playing hands against a real user, analyzing hands, and giving ratings and areas of improvement. The price per request should not be a problem, as poker learners are used to paying for this kind of service, so a "pay as you use" model might be possible in this particular use case (unlike the recipe concept app mentioned earlier, for example).

About the GPT-4 API

I decided to build this article around the GPT-4 API for its accuracy compared to GPT-3.5. OpenAI provides a simple Python wrapper that can be used to send your inputs and receive the model’s outputs. For example:

import os
import openai

openai.api_key = os.environ['OPENAI_KEY']

completion = openai.ChatCompletion.create(
  model="gpt-4",
  # the "system" message shapes the model's behavior,
  # the "user" message carries the actual query
  messages=[{"role": "system", "content": preprompt_message},
            {"role": "user", "content": user_message}]
)

completion.choices[0].message["content"]

The "pre-prompt" used with the role "system" will help the model to act the way you want him to act (you can use it typically to enforce a response format), the role "user" is used to add the message from the user. In our case, those messages will be pre-designed by our engine, for example, passing a particular poker hand to complete.

Note that all the tokens from "system", "user", and the answer are counted in the pricing scheme, so it is really important to optimize these queries as much as you can.
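If you want to check the size of a query before sending it, OpenAI’s tiktoken library lets you count tokens locally. A small sketch, reusing the preprompt_message and user_message variables from the snippet above (the real billed count adds a few tokens of per-message overhead, so treat this as a close estimate):

import tiktoken

# encoding_for_model resolves the tokenizer GPT-4 uses (cl100k_base)
enc = tiktoken.encoding_for_model("gpt-4")

def n_tokens(text: str) -> int:
    """Count the tokens a string will consume in the prompt."""
    return len(enc.encode(text))

print(n_tokens(preprompt_message) + n_tokens(user_message), "prompt tokens")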


Initial Exploration

The first thing to assess is the general quality of GPT on the different tasks related to poker: I want to quickly evaluate its ability to perform the tasks that would form the core of the application.

Continuing a hand with a realistic move and explanations

One of the core ideas of the application is to make one or several instances of GPT play against a user, so we want to assess how well GPT plays. To do so, I provided the model with samples of hands I had played, asking for an analysis of the next move and an explanation.

An example of a test:

preprompt = """You are my poker coach, I want you to play the next move and 
justify simply your action based on your expertise"""

message = """Stacks:
UTG ($54.93) 110bb
UTG+1 ($55.87) 112bb
CO ($61.62) 123bb
BTN ($60.96) 122bb
SB ($25.54) 51bb
BB Hero ($61.50) 123bb

Pre-Flop: (0.75, 6 players) Hero is BB :jc: :jd:
2 folds, CO raises to $1.50, BT raises to $6, 1 fold, 
Hero raises to $14, CO folds, BT calls $8
Flop: :3s: :2h: :tc: ($29.75, 2 players)
Hero bets $7.50, BT calls $7.50

Turn: :5h: ($44.75, 2)
Hero (?)
"""

completion = openai.ChatCompletion.create(
  model="gpt-4",
  messages=[{"role": "system", "content": preprompt}, 
            {"role": "user", "content": message}]
)

And the answer:

Hero should bet around $22.

Justification:
Our pocket Jacks have improved to top set on the flop, which is a great 
advantage for us. With the current board texture, we need to continue 
building the pot to extract value from hands like smaller pairs, 
flush draws, or straight draws. The turn 5h doesn't 
change much about the board, so we should capitalize on our strong hand 
by betting around $22 (around half pot). 
This will put pressure on our opponent and potentially make them commit 
with worse hands if they decide to call or raise.

I performed several tests like this, validating that the model is able to play decently.

Interestingly, by modifying the pre-prompt, I could control the model’s behavior, making it play more or less aggressively.

You are a very tight player who tends to under-evaluate 
the strength of your hand

would lead, for the same action, to a different move from the AI:

Action: Hero checks.

Justification: Based on my tight playing style and under-evaluation of my hand 
strength tendency, I will check in this situation. 
By checking, I can control the pot size and potentially induce a bet from BT 
if they have a weaker hand. At the same time, 
I can also minimize my losses if BT has a stronger hand. 
Considering the board is relatively low and it doesn't have a lot of 
threatening draws, the check allows me to proceed cautiously but still 
have a chance to win the hand later.

This means we could potentially code different "AIs" playing with different styles, to simulate a real poker field with players of different levels and styles.

Rating the actions of a player from a full-hand history

A feature we might want to integrate into our application is the ability to rate the user. Grades are always a good indicator of progress, and a good way to target particular weaknesses when learning something.

In order to be processed correctly by our application, the bot’s answer has to be "enforced" so that it can be easily parsed and used by the application (for example, for averaging over sessions or over thematics).

For example, the following combination of prompts:

Preprompt:
You are a poker coach. I will provide you a hand. 
I want you to identify for a given hand key thematics
and rate the relevant thematics /20. 
The thematics you can use are the following:
"preflop","flop","turn","river","aggressivity","sizing", "overall"
You can use one or several of these thematics during your 
evaluation, depending on what is relevant. 
I expect your output to have the following format:
"theme1: X - theme2: Y - ..."

Message:
Stacks:
UTG ($54.93) 110bb
UTG+1 ($55.87) 112bb
CO ($61.62) 123bb
BTN ($60.96) 122bb
SB ($25.54) 51bb
BB Hero ($61.50) 123bb

Pre-Flop: (0.75, 6 players) Hero is BB :jc: :jd:
2 folds, CO raises to $1.50, BT raises to $6, 1 fold, Hero raises to $14, CO folds, BT calls $8
Flop: :3s: :2h: :tc: ($29.75, 2 players)

Hero bets $7.50, BT calls $7.50

Turn: :5h: ($44.75, 2)
Hero bets $9, BT calls $9

River: :9c: ($62.75, 2)

Hero checks, BT goes all-in $30.46, Hero calls $30.46

Would generate the following output:

'preflop: 18 - flop: 16 - turn: 14 - river: 13 - aggressivity: 15 - sizing: 12 - overall: 15'

This is nice for two reasons:

  • We can control the number of tokens displayed by the model
  • We can easily parse the answer in a dictionary to be used by our app
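As an illustration, a minimal parser for this format could look like the following sketch, assuming the model sticks to the "theme: grade" pattern:

def parse_ratings(answer: str) -> dict:
    """Parse 'preflop: 18 - flop: 16 - ...' into a dictionary of grades."""
    ratings = {}
    for part in answer.split(" - "):
        theme, _, grade = part.partition(":")
        ratings[theme.strip()] = int(grade)
    return ratings

parse_ratings('preflop: 18 - flop: 16 - turn: 14 - river: 13 - overall: 15')
# {'preflop': 18, 'flop': 16, 'turn': 14, 'river': 13, 'overall': 15}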

Transforming the idea into a concrete application

After validating the two points above, it is time to code something bigger to demonstrate the whole concept and see how the LLM can be integrated into a larger application that diverges from a simple chatbot.

In the context of this article, we will go with something fairly simple, but it should give you a hint of all the capabilities LLMs have when taken out of the chatbot box.

The goal of our demonstrator will be to make GPT play a hand against us, and then, once the hand has been completely played, ask the model to give us some hints on how to improve based on that hand only.

For this purpose, I quickly coded a simple poker engine to simulate a hand and play against an AI opponent. I will not go into too much detail about the engine here (that is not the point of the article), but simply give you a brief overview of its design.

Schematic for the Poker Coach proof of concept

The role of the poker engine is to keep track of the game’s general metadata (number of players, stacks, cards dealt, whose turn it is…). The next action (from either a player or an AI) is passed to the engine as a standard text message such as "call", "fold", or "raise 60", which is parsed and transcribed as new input for the engine to run the next step.
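As a sketch of that parsing step (my actual engine differs in the details), the message can be split into an action keyword and an optional amount:

def parse_action(message: str):
    """Parse 'call', 'fold' or 'raise 60' into an (action, amount) pair."""
    parts = message.strip().lower().split()
    if not parts or parts[0] not in {"fold", "check", "call", "bet", "raise"}:
        raise ValueError(f"Unknown action: {message!r}")
    amount = float(parts[1]) if len(parts) > 1 else None
    return parts[0], amount

parse_action("raise 60")  # ('raise', 60.0)
parse_action("fold")      # ('fold', None)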

The poker engine also transcribes the sequence of actions into a text record, which feeds the prompt used by the AI to decide on its next move.

Two pre-prompts will be used: one for deciding on the next action to take, and one, used at the end of the hand, to rate the actions of the human player.

Playing the hand

For our application to understand the action from GPT, we need to make sure GPT’s message is standardized, which can be enforced with the pre-prompt. In my case, I expect GPT to answer with at most two words, "ACTION AMOUNT", and I enforce this behavior by providing an example for the model to adapt to, via this simple playing pre-prompt:


I want you to help me improve in poker by playing games.
Use only keywords 'fold', 'call', 'bet', 'check', or 'raise' 
with chip amounts.

Example:
'**hand details**
GPT (?)'

Answer: 'raise 40'

I will combine this pre-prompt with the actual hand history generated by the poker engine:

GPT_0 has [Ts, 9s]
SB: hero (25.0 BB)
BB: GPT_0 (25.0 BB)
hero is BTN
hero pay SB (10 chips)
GPT_0 pay BB (20 chips)
hero: raise 50
GPT_0 (?)

This leads to a suitable answer that can be parsed and integrated into the poker engine to update the game and continue with the next action:

call 50

The application in action

Shown as snapshots, it might not look impressive, so I recorded a little video of the AI in action in a notebook. For the video, I also asked the model to justify its move before taking action. The cards of all the players are visible in the widget for debugging purposes, but in real conditions they would be hidden from the player. When the AI takes an action, only its own cards are passed to the prompt.

My message parser simply takes the action after the "//" separator, and I used the input() function to register the user’s input.
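A sketch of that parsing step could look like this (the justification, when present, sits before the separator):

def split_gpt_answer(answer: str):
    """Split 'justification // action' into its two parts."""
    justification, _, action = answer.rpartition("//")
    return justification.strip(), action.strip()

split_gpt_answer("Top pair, betting for value // bet 40")
# ('Top pair, betting for value', 'bet 40')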

GPT playing a hand of poker against me, Author illustration

Hand review

Once the hand is completed, the engine has generated the whole hand history, which can be passed to our "evaluation module", where the LLM will rate the hand.

hero has [5c Qs]
SB: hero (500 chips)
BB: GPT_0 (500 chips)
hero is BTN
hero pay SB (10 chips)
GPT_0 pay BB (20 chips)
hero: raise 50
GPT_0: call 50
FLOP
POT:100.0
BOARD: [7c 6d Kc]
hero: bet 100
GPT_0: call 100
TURN
POT:200.0
BOARD: [7c 6d Kc 3d]
hero: bet 200
GPT_0: fold

Using the prompt designed earlier, this is the kind of easy-to-parse answer we get:

preflop: 14 - flop: 12 - turn: 15 - aggressivity: 16 - sizing: 13 - overall: 14

This output can easily be parsed and stored in a database, from which we could extract all sorts of analyses for the player, such as identifying weaknesses depending on hand, position, configuration, etc., based on the average grades provided by the model.
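For instance, a hypothetical aggregation over stored sessions (a plain list standing in for the database here) could average the grades per theme:

from collections import defaultdict

def average_grades(sessions: list) -> dict:
    """Average the grades of each theme across several rated hands."""
    totals, counts = defaultdict(float), defaultdict(int)
    for ratings in sessions:
        for theme, grade in ratings.items():
            totals[theme] += grade
            counts[theme] += 1
    return {theme: totals[theme] / counts[theme] for theme in totals}

average_grades([
    {"preflop": 14, "flop": 12, "overall": 14},
    {"preflop": 18, "flop": 16, "overall": 15},
])
# {'preflop': 16.0, 'flop': 14.0, 'overall': 14.5}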


Conclusion

At this point, we have shown with an example that new concepts centered on LLMs can emerge when coupled with well-suited app designs that diverge from the simple chatbot assistant.

Of course, the example above is only the first step out of many toward a fully ready application, but it should be sufficient to provide food for thought and spark creativity.

Extrapolating a bit about what the next steps toward a real poker coach application would look like, we would have to take several actions:

  • Prompt token optimization: a large part of our application’s costs would come from prompt pricing. Optimizing the prompts for length is essential to reduce costs and stay competitive.
  • Prompt content optimization: the quality of the output (rating, next action) can vary a lot depending on the context you provide to the model. For example, asking the model to first make an analysis before taking an action can greatly improve the coherence of the action taken. Many tests and iterations, in partnership with real players, would be needed to make sure the quality of the output is sufficient for a production-level application.
  • Error handling: even if the outputs provided by the LLM fit your template most of the time, it is important to handle the cases in which the model provides an answer that does not fit your parser. Unlike the rest of your functions, this part of the application remains unpredictable, and it would be important to add an extra layer of control to make sure no bugs arise from bad formats or impossible answers (a minimal retry sketch follows this list).
  • User interface: while playing in a notebook with text inputs is enough for an exploration like this one, a core element of a real product would be a clean UX that elevates the user experience and makes interaction with the model and the engine smooth.
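To illustrate the error-handling point, a minimal retry wrapper could re-ask the model a few times before falling back to a safe default. Here ask_model and parse_action are hypothetical callables, the latter raising ValueError on malformed answers, as in the sketch earlier:

def safe_next_action(ask_model, parse_action, max_retries: int = 3):
    """Query the model until its answer parses, with a conservative fallback."""
    for _ in range(max_retries):
        answer = ask_model()
        try:
            return parse_action(answer)
        except ValueError:
            continue  # malformed answer: query the model again
    return ("fold", None)  # safe default if the model keeps misbehaving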

While prices for this family of applications might still be high, I am sure that mass adoption and future improvements will tend to reduce their cost. We are only at the beginning of the development of these new technologies, and it’s up to us to be creative and embrace the potential of LLMs, transforming them into powerful tools for innovative, user-centric experiences beyond traditional chatbot limitations.

