
Teaching A Neural Net To Play Blackjack

We Train a Neural Net to See Whether Deep Learning Can Improve Our Blackjack Strategy

Photo by Drew Rae from Pexels

Last time we developed code to simulate blackjack. And through these simulations, we discovered the key drivers of the casino’s advantage. Here is a quick recap of our previous findings:

  • Casinos gain an edge on blackjack players by forcing them to act before the dealer (and on incomplete information). This exposes players to the risk of busting before the dealer ever has to act.
  • Players are especially in danger when their hands total between 12 and 16 (they are at risk of busting with their next card) and the dealer is showing a high card. In these cases, the assumption is that the dealer will end up with a high hand total, so the players must hit or perish. We can see this visually in the way that the player’s probability of winning or tying troughs between 12 and 16 (The Valley of Despair).
Probability of Win or Tie vs. Player’s Hand Value (21 not shown because the probability is 100%)
  • Finally, we observed that a naive strategy of hitting only when there is zero chance of busting greatly improves our odds of beating the casino, as it shifts the risk of busting entirely to the casino.

If you are unfamiliar with the game of blackjack, my previous post also describes how the game is played and the rules.


But Can Deep Learning Do Better?

The objective of today’s post is to see whether we can use deep learning to arrive at a better strategy than the naive one. We will:

  1. Generate data using our blackjack simulator that we coded last time (with a few modifications to make it more suitable for training algorithms).
  2. Code up and train the neural net to play blackjack (hopefully optimally).

If you are unfamiliar with neural nets, I wrote extensively about them in this post (it’s the post I worked hardest on, so please check it out).

A Visual Depiction of a Simple Neural Net (From Understanding Neural Networks)

Before we jump into the training process, let’s step back and quickly discuss the pros and cons of using a neural net here. Neural nets are highly flexible algorithms – like soft clay, a neural net adjusts itself to fit the contours of the data even with little to no transformation. Data that would trouble something more rigid like linear regression is easily handled by a neural net. Additionally, the layers and neurons within the network can learn any deeply embedded, non-linear relationships that may exist in the data.

However, this versatility comes at a cost – the neural net is a black box model. Unlike regression, where we can learn how the model makes decisions by looking at the regression coefficients, there is no such transparency with a neural net. Also, neural nets run the risk of fitting our data too well and then not generalizing to out-of-sample data. In my opinion, these disadvantages are worth keeping in mind and designing safeguards for, but they are not reasons to shy away from using neural nets.


Generating Our Training Data

Before we can train our neural net, we first need to figure out how to structure our training data so that the model we build with it will be useful.

What do we want to predict? In my view, there are two candidates for our target variable:

  1. Probability of losing the game. Given the situation, we might want the model to tell us the probability of a loss. But this would only be useful if we could scale our bet up or down, which we cannot do in blackjack.
  2. The correct action, hit or stay. We want our neural net to identify the right move for each situation, so our target variable should be "whether the correct move was to hit or stay".

It actually took me a while to figure out the best way to set this up. But here is what I came up with.

We need a way for the neural net to know whether a given move was correct or not. It doesn’t need to be foolproof; it just needs to be generally correct. So my method of deciding whether a given move is the correct one is to simulate a game of blackjack: deal the cards to both player and dealer, check if anyone has a blackjack, make only one move (either hit or stay), simulate the game to its end, and record the result. Since the simulated player only makes a single decision, we can assess the quality of that decision by whether he wins or loses the game:

  • If player hits and wins, then hit (Y=1) was the correct decision.
  • If player hits and loses, then stay (Y=0) was the correct decision.
  • If player stays and wins, then stay (Y=0) was the correct decision.
  • If player stays and loses, then hit (Y=1) was the correct decision.

This allows us to train our model so that its output is a prediction of whether to hit or stay. The code is similar to last time’s, so I won’t give a detailed overview here (you can also find it on my GitHub here). But the primary features are:

  1. Dealer’s face up card (the other is hidden from view).
  2. Player’s total hand value.
  3. Whether the player has an ace or not.
  4. The action of the player (hit or stay).

And the target variable is the correct decision as defined by the logic above.
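
To make this concrete, here is a minimal sketch of the labeling logic (the function name and arguments are mine, not from the original simulator):

def correct_action_label(action_hit, player_won):
    # action_hit: 1 if the simulated player hit, 0 if he stayed
    # player_won: True if the single-decision game ended in a win
    if player_won:
        # The action taken worked out, so it is labeled correct
        return 1 if action_hit else 0
    else:
        # The action taken lost, so the opposite action is labeled correct
        return 0 if action_hit else 1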


Training the Neural Net

We will be using the Keras library for our neural net. Let’s first get our imports out of the way:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

Now let’s set up our input variables for training the neural network. The variable feature_list is a list with the column names of the features (X variables) that I listed above. The dataframe model_df is where I store all the data from the blackjack simulations that I ran.

# Set up variables for neural net
feature_list = [i for i in model_df.columns if i not in
                ['dealer_card','Y','lose','correct_action']
               ]
train_X = np.array(model_df[feature_list])
train_Y = np.array(model_df['correct_action']).reshape(-1,1)

The lines of code to actually instantiate and train our neural net are pretty simple. The first line (line 1) creates a sequential type neural net, which is a linear sequence of neural net layers. The lines after line 1 add layers to our model one by one (Dense is the simplest layer type, just a fully connected set of neurons) – the numbers 16, 128, etc. specify the number of neurons in each layer.

Finally for the last layer, we need to choose an activation function. This converts the raw output of the neural network into something interpretable by us. Pay attention to two things about the final layer. First, it includes only one neuron because we are predicting between two possible outcomes (two class problem). And second, we use a sigmoid activation because we want our neural net to act like logistic regression and predict whether the correct move is to hit (Y=1) or to stay (Y=0) – in other words, we want to know the probability that hitting is the correct move.

The last two lines tell our neural net what loss function to use (binary cross-entropy is the loss function used by classification models that output probabilities) and fit the model to our data. I didn’t spend too much time tweaking the number of layers or neurons, but if someone were to play around with my code, I would suggest those as potential avenues for improvement.

# Set up a neural net with 5 layers
model = Sequential()                         # line 1
model.add(Dense(16))                         # hidden layers of varying widths
model.add(Dense(128))
model.add(Dense(32))
model.add(Dense(8))
model.add(Dense(1, activation='sigmoid'))    # final layer
model.compile(loss='binary_crossentropy', optimizer='sgd')
model.fit(train_X, train_Y, epochs=20, batch_size=256, verbose=1)

Checking Out the Performance of Our Model

A quick way to eye-ball whether our model adds any value is to use a ROC Curve (check out the linked blog by yours truly if you would like a deep dive on ROC Curves). The ROC Curve tells us how good our model is at trading off between benefit (True Positive Rate) and cost (False Positive Rate) – the greater the area under the curve is, the better the model.

The plot below shows the ROC Curve of our blackjack playing neural net – the neural net seems to be adding a fair bit of value over guessing randomly (the red dashed line). Its area under the curve, or AUC, of 0.73 is significantly higher than the AUC for random guessing (0.50).

ROC Curve for Our Blackjack Playing Neural Net

I used my training data to plot the ROC Curve. Usually we would want to plot it using validation or test data, but in this case we know that as long as our sample is big enough, it is representative of the population (assuming we keep playing blackjack with the same rules). So we would expect our model to generalize well (any new data would have the same underlying statistical characteristics as our training data).
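
If you would like to reproduce the plot, a quick sketch using scikit-learn and matplotlib (and the train_X and train_Y arrays from above) might look like this:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Predicted probabilities that hitting is the correct move
pred_Y = model.predict(train_X).ravel()

# True positive rate and false positive rate at each threshold
fpr, tpr, _ = roc_curve(train_Y.ravel(), pred_Y)
print('AUC:', auc(fpr, tpr))

plt.plot(fpr, tpr, label='Neural net')
plt.plot([0, 1], [0, 1], 'r--', label='Random guessing')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()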


Time to Play!

Before our neural net can officially start gambling, we need to give it a decision rule. Remember that the sigmoid activation (from our final neural net layer) makes our neural network output a probability that the correct move is to hit. We need a decision rule that, given this probability, decides whether to hit or stay.

I wrote the following function to do just that – the model_decision function takes in the features that the neural net requires, makes a prediction using those features, and compares that prediction to a predefined threshold in order to decide whether to hit or stay. I use 0.52 because we already know from last time that busting is the biggest risk to a blackjack player. Thus, using 0.52 as the cutoff for hitting makes our model slightly less likely to hit, and therefore slightly less likely to bust.

def model_decision(model, player_sum, has_ace, dealer_card_num):
    # Assemble the features in the same column order used in training
    # (the hard-coded 0 is a placeholder for one of the training features)
    input_array = np.array([player_sum, 0, has_ace,
                            dealer_card_num]).reshape(1,-1)
    # Predicted probability that hitting is the correct move
    predict_correct = model.predict(input_array)[0][0]
    # Hit only when the model is at least 52% sure
    if predict_correct >= 0.52:
        return 1
    else:
        return 0

Now we just need to add the above function to our code where we decide whether or not to hit (please refer to my GitHub if you are curious how I coded this part). So when it comes time to decide what to do, the neural net will make its decision based on the card that the dealer is showing, the total hand value of its own cards, and whether or not it is holding an ace.
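
In pseudocode terms, that decision point might look something like the sketch below (hand_total, player_hand, dealer_up_card, and deck are hypothetical stand-ins for the corresponding pieces of the simulator on GitHub):

# Hypothetical decision point inside the simulation loop
while hand_total(player_hand) <= 21:
    action = model_decision(model,
                            player_sum=hand_total(player_hand),
                            has_ace=int('A' in player_hand),
                            dealer_card_num=dealer_up_card)
    if action == 0:                     # neural net says stay
        break
    player_hand.append(deck.pop())      # neural net says hit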


Our Model is Pretty Good!

Finally, let’s compare our neural net’s performance with both the naive strategy and the random one. To remind everyone:

  • I ran approximately 300,000 blackjack simulations for each strategy type (neural net, naive, and random).
  • The naive strategy is to only hit when there is zero chance of busting (hit for hand totals below 12, and stay for hand totals of 12 or more).
  • The random strategy is to flip a coin – if it comes up heads, hit; otherwise stay. And if you hit and don’t bust, flip the coin again and repeat the process. (Both baselines are sketched in code below.)
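
For reference, both baselines are simple enough to sketch in a few lines each (these mirror the descriptions above rather than the exact code on GitHub):

import random

def naive_decision(player_sum):
    # Hit only when there is zero chance of busting
    return 1 if player_sum < 12 else 0

def random_decision():
    # Coin flip: heads (1) means hit, tails (0) means stay
    return random.randint(0, 1)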

Let’s see if our neural net was able to find a better strategy. The following table shows the outcome distribution for each strategy type. Two things jump out at me. First, our neural net lost slightly less than half (49%) of the games it played. While I wouldn’t call that beating the house, it’s pretty decent for a game where the odds are fixed against you. Second, it doesn’t actually win more often than the naive strategy – rather, it is able to force more ties.

Outcome Breakdown by Strategy

We can also take a look at how the strategies perform across our key features (dealer card and player hand total). First, let’s check out the impact of the dealer’s shown card on the probability of winning or tying for our three strategies. In the plot below, if the dealer is showing a low card, our neural network performs about as well as the naive strategy. But when the dealer is showing a higher card (7 or more), our neural net performs significantly better.

Probability of Tie or Win vs. Dealer’s Shown Card (Taller Bars are Better!)

We can also look at how the probability of winning or tying varies with the player’s initial hand total. This looks pretty promising – our neural net performs as well as or better than the alternatives across the board. And unlike the naive strategy, which performs even worse than random guessing in The Valley of Despair (player hand values between 12 and 16), our neural network performs better there.

Probability of Tie or Win vs. Player’s Initial Hand Value (Taller Bars are Better!)

The most recent plot hints at how the neural net is able to surpass the naive strategy. The naive strategy (because of how we coded it) is unwilling to take a chance whenever there is even a remote risk of busting. The neural net, on the other hand, regularly hits on 12s, 13s, 14s, and 15s. Its more nuanced decision making and ability to take calculated risks seem to set it apart from the naive strategy.

Tendency to Hit of Neural Net and Naive Strategy vs. Player’s Initial Hand Value

We can look at what the neural net does when the player’s hand total is between 12 and 16 to try to improve our naive strategy (and not lose as much money to the casino).

It looks like there is a strong preference to hit when the dealer is showing a high card (8, 9, or 10). But even when the dealer is showing a low card like 3, the neural net still chooses to hit 60% of the time – this is because the neural net is taking into account all the features that it has at its disposal when making a decision. So it looks like we can’t easily distill its decisions into a few simple rules of thumb.

Neural Net’s Frequency of Hitting vs. Dealer’s Shown Card
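
If you want to reproduce this breakdown from the simulation output, the aggregation is a few lines of pandas (results_df and the column names are assumptions; adjust them to however you logged your games):

# Neural net games that started in The Valley of Despair
valley = results_df[results_df['player_total_initial'].between(12, 16)]

# How often the net chose to hit, broken out by the dealer's shown card
hit_rate = valley.groupby('dealer_card_num')['hit'].mean()
print(hit_rate)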

Conclusion

Hopefully this post gave you a decent introduction to how Machine Learning can be used to aid real-life decision making. Here are a few things to keep in mind when you are training your own models (whether they be decision trees, regressions, or neural nets):

  • Is my target variable structured in a way such that if I can predict it, then I can solve my problem? Before you start gathering data and building your model, it’s critical to make sure that you are predicting the right thing.
  • How might new data vary from the data that I have trained on? If it might vary greatly, then a statistical model might not even be the right answer to your problem. And at the very least you must be cognizant of that and build in safeguards such as regularization and rigorous (as well as honest) validation and test set benchmarking of your model.
  • Can I understand how the model arrives at its decisions? If not, you have no way to comprehend or sanity check its decision making beyond rigorous testing on test data that was held out of the model training process.

Finally, a last word on blackjack. I probably won’t write about gambling again for a while (there are too many other topics I would like to explore). But if you are interested in taking this further, with or without my code, here are a few potentially interesting extensions to this project:

  1. Try to improve the model through a more optimized neural network structure, by adding code for splitting aces (I didn’t build this into my original simulator), or by choosing better features than the basic ones I used.
  2. Give the model the ability to count cards and see how that impacts its performance in both the one-deck case and the six-deck case (the six-deck game is the Vegas standard).

Hope you found this as interesting as I did. Cheers!

