
Artificial Intelligence (AI) has made impressive progress in games thanks to advances in reinforcement learning. AlphaGo [1] beat human professionals at the game of Go. AlphaZero [2] taught itself chess, shogi, and Go from scratch and mastered all three. More recently, research efforts have come to fruition in poker: Libratus [3] and DeepStack [4] achieved expert-level performance in Texas Hold’em. Poker is one of the most challenging games for AI. As a player, we need to consider not only our own hand but also the other players’ hands, which are hidden from our sight. This leads to an explosion of possibilities.
In this article, I would like to introduce an open-source project for reinforcement learning in card games, recently developed by the DATA Lab at Texas A&M University. The article first outlines the project and then walks through a running example of how to train an agent from scratch to play Leduc Hold’em poker, a simplified version of Texas Hold’em. The goal of the project is to make artificial intelligence in card games accessible to everyone.
Overview
![[5] An overview of RLCard. Each game is wrapped by an Env (Environment) class with easy-to-use interfaces.](https://towardsdatascience.com/wp-content/uploads/2019/11/1-2I91Sb789qbhA7GOpqDKA.png)
RLCard provides various card game environments, including Blackjack, Leduc Hold’em, Texas Hold’em, UNO, Dou Dizhu (a Chinese poker game) and Mahjong, as well as several standard reinforcement learning algorithms, such as Deep Q-Learning [6], Neural Fictitious Self-Play (NFSP) [7] and Counterfactual Regret Minimization [8]. It supports easy installation and comes with rich examples and documentation. It also supports parallel training with multiple processes. The following design principles are adopted:
- Reproducible: Results from the environments can be reproduced and compared. The same result should be obtained with the same random seed in different runs (see the sketch after this list).
- Accessible: Experiences are collected and well organized after each game, with straightforward interfaces. The state representation, action encoding, reward design, and even the game rules can all be conveniently configured.
- Scalable: New card environments can be conveniently added to the toolkit while following the above design principles. Dependencies in the toolkit are minimized so that the code can be easily maintained.
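As a small illustration of the reproducibility principle, a run can be pinned to a fixed seed before any game data is generated. The sketch below follows the RLCard version this article is based on; the helper set_global_seed and the import paths are assumptions that may differ in newer releases:

```python
import rlcard
from rlcard.utils.utils import set_global_seed      # seeding helper (path may vary by version)
from rlcard.agents.random_agent import RandomAgent  # built-in random agent (path may vary)

# Fix the global random seed so that two runs produce identical trajectories
set_global_seed(0)

env = rlcard.make('leduc-holdem')
env.set_agents([RandomAgent(action_num=env.action_num) for _ in range(env.player_num)])

trajectories, payoffs = env.run(is_training=False)
print(payoffs)  # with the same seed, the payoffs are the same across runs
```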
Leduc Hold’em Poker
Leduc Hold’em is a simplified version of Texas Hold’em. The game is played with 6 cards (the Jack, Queen and King of Spades, and the Jack, Queen and King of Hearts). Each player holds one hand card, and there is one community card. Similar to Texas Hold’em, high-rank cards beat low-rank cards, e.g., the Queen of Spades beats the Jack of Spades. A pair beats a single card, e.g., a pair of Jacks beats a Queen and a King. The goal of the game is to win as many chips as you can from the other players. More details of Leduc Hold’em can be found in Bayes’ Bluff: Opponent Modelling in Poker [9].
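To make the ranking rule concrete, here is a toy illustration of how a Leduc showdown could be scored. This is purely for intuition and is not how RLCard implements the game logic:

```python
RANKS = {'J': 1, 'Q': 2, 'K': 3}

def leduc_winner(hand_a, hand_b, community):
    """Decide a Leduc Hold'em showdown given each player's hand card and the community card."""
    pair_a = hand_a == community  # player A pairs the community card
    pair_b = hand_b == community  # player B pairs the community card
    if pair_a != pair_b:
        return 'A' if pair_a else 'B'  # a pair beats any single card
    if RANKS[hand_a] != RANKS[hand_b]:
        return 'A' if RANKS[hand_a] > RANKS[hand_b] else 'B'  # higher rank wins
    return 'tie'

print(leduc_winner('J', 'K', 'J'))  # 'A': a pair of Jacks beats a lone King
print(leduc_winner('Q', 'K', 'J'))  # 'B': no pairs, so the King wins
```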
The first example: NFSP on Leduc Hold’em
Now, let’s train an NFSP agent on Leduc Hold’em with RLCard! The full example code can be found in the repository; below, we walk through it step by step.
In the example, there are 3 steps to build an AI for Leduc Hold’em.
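The snippets that follow assume RLCard is installed (e.g., with pip install rlcard) and that the following imports are in place. Note that this is a sketch: the exact import paths below match the TensorFlow-based RLCard version used in this article and may differ slightly in newer releases.

```python
import tensorflow as tf

import rlcard
from rlcard.agents.nfsp_agent import NFSPAgent  # import path may vary across RLCard versions
```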
Step 1: Make the environment. First, tell rlcard that we need a Leduc Hold’em environment.
```python
env = rlcard.make('leduc-holdem')
```
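The environment object exposes some basic information about the game that the next step relies on (attribute names as used in this article's snippets; newer RLCard releases may name them differently):

```python
print(env.player_num)   # number of players (Leduc Hold'em is a two-player game)
print(env.action_num)   # size of the action space
print(env.state_shape)  # shape of the encoded state observed by an agent
```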
Step 2: Initialize the NFSP agents. Next, we create two built-in NFSP agents and tell them some basic information, for example, the number of actions, the state shape, and the neural network structure. Note that NFSP has other hyperparameters as well, such as the memory size; here we use the defaults.
```python
with tf.Session() as sess:
    agents = []
    for i in range(env.player_num):
        agent = NFSPAgent(sess,
                          scope='nfsp' + str(i),
                          action_num=env.action_num,
                          state_shape=env.state_shape,
                          hidden_layers_sizes=[128, 128],
                          q_mlp_layers=[128, 128])
        agents.append(agent)

    # Initialize the TensorFlow variables built by the agents
    sess.run(tf.global_variables_initializer())

    # Setup agents
    env.set_agents(agents)
```
Step 3: Generate game data and train the agents. Finally, game data can be generated with the run function. We then feed these transitions to the NFSP agents and train them. Note that this loop runs inside the with tf.Session() block from Step 2.
```python
episode_num = 10000000

for episode in range(episode_num):

    # Generate game data
    trajectories, _ = env.run(is_training=True)

    # Train the agents
    for i in range(env.player_num):
        for ts in trajectories[i]:
            agents[i].feed(ts)
        rl_loss = agents[i].train_rl()
        sl_loss = agents[i].train_sl()
```
The NFSP agents will then learn to play Leduc Hold’em through self-play. Their performance can be measured by running a tournament between the NFSP agents and random agents; a minimal evaluation sketch is shown below. You can also find the full code and the learning curves here.
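The sketch pits the first NFSP agent against RLCard's built-in random agent and averages the payoffs over a number of evaluation games. As before, the import path and attribute names follow the RLCard version used here and are assumptions for newer releases:

```python
from rlcard.agents.random_agent import RandomAgent  # import path may vary across versions

# A separate environment for evaluation (run inside the same TF session as the agents)
eval_env = rlcard.make('leduc-holdem')
eval_env.set_agents([agents[0], RandomAgent(action_num=eval_env.action_num)])

eval_episodes = 1000
total_payoff = 0.0
for _ in range(eval_episodes):
    # is_training=False: play evaluation games without storing training data
    _, payoffs = eval_env.run(is_training=False)
    total_payoff += payoffs[0]

print('Average payoff of the NFSP agent:', total_payoff / eval_episodes)
```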

The NFSP agent gradually improves its performance against the random agents. If you would like to explore more examples, check out the repository. Have fun!
Play with a Pre-trained Model
RLCard also provides a Leduc Hold’em model pre-trained with NFSP. We can play against the pre-trained agent by running this script.
```
>> Leduc Hold'em pre-trained model
>> Start a new game!
>> Agent 1 chooses raise
=============== Community Card ===============
┌─────────┐
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
└─────────┘
=============== Your Hand ===============
┌─────────┐
│J        │
│         │
│         │
│    ♥    │
│         │
│         │
│        J│
└─────────┘
=============== Chips ===============
Yours: +
Agent 1: +++
=========== Actions You Can Choose ===========
0: call, 1: raise, 2: fold
>> You choose action (integer):
```
Summary
To learn more about this project, check it out here. The team is actively developing more features, including visualization tools and a leaderboard for tournaments. The ultimate goal of the project is to enable everyone in the community to train, compare, and share their AI for card games. I hope you enjoyed the read. In my next post, I will introduce the mechanics of Deep Q-Learning on Blackjack, and we will take a look at how the algorithm is implemented and applied to card games.
References:
[1] Silver et al. Mastering the game of Go with deep neural networks and tree search (2016).
[2] Silver et al. Mastering the game of Go without human knowledge (2017).
[3] Brown and Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals (2018).
[4] Moravčík et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker (2017).
[5] Zha et al. RLCard: A Toolkit for Reinforcement Learning in Card Games (2019).
[6] Mnih et al. Human-level control through deep reinforcement learning (2015).
[7] Heinrich and Silver. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games (2016).
[8] Zinkevich et al. Regret Minimization in Games with Incomplete Information (2008).
[9] Southey et al. Bayes’ Bluff: Opponent Modelling in Poker (2012).