Online Poker — When’s the Money?

The best time to play and how much it’s worth

Published in Towards Data Science · 15 min read · Jul 9, 2019

Photo by Michał Parzuchowski on Unsplash

When poker players share hands online for strategy discussion, they mark themselves as heroes and their opponents as villains. Whether this is done to hide their own identity or their opponents’, this casual jargon reveals a fundamental truth about poker: it’s every player versus the world. If you need a hero, look no further than a mirror, because no one else is coming to save you.

This is the reality I’ve lived for the last ten years. As a professional poker player specialising in heads up tournaments (regular Texas hold’em but with just two players), I’ve often found myself in protracted battles for online territory. I’ve made friendships, but they usually only last as long as they are mutually beneficial, which often amounts to little more than “We’re both pretty good, how about we don’t play each other?”

This world is second nature to me, but sometimes hard to convey with words alone. So to paint a clearer picture, I’ve applied the data science workflow to a year’s sample of game summaries (approximately 200,000 games with over 14,000 unique players, all from a single buy-in level), with the primary goal of answering the question ‘what’s the best (most profitable) time for a professional to play?’ This article looks at some of the processes and conclusions from that analysis.

Data wrangling

The first step was to wrangle my data. I took the game summaries, subset them by unique player, and aggregated each subset into one row of a player results table. I obtained the game count table by resampling the game time series at hourly intervals and counting the games in each hour.
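The two tables described above can be sketched in plain Python. This is a minimal illustration rather than the code used for the analysis: the record layout, the player names and the $100 prize (the winner collecting both $50 stakes) are assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

BUY_IN = 51.0   # from the article: every game costs $51 to enter
PRIZE = 100.0   # assumption: the winner collects both $50 stakes

# Hypothetical game summaries: (start time, winner, loser).
games = [
    (datetime(2018, 7, 2, 13, 5), "alice", "bob"),
    (datetime(2018, 7, 2, 13, 40), "alice", "carol"),
    (datetime(2018, 7, 2, 15, 10), "bob", "alice"),
]

def player_results(games):
    """Aggregate games played, net profit and ROI for each unique player."""
    results = defaultdict(lambda: {"games": 0, "profit": 0.0})
    for _, winner, loser in games:
        results[winner]["games"] += 1
        results[winner]["profit"] += PRIZE - BUY_IN
        results[loser]["games"] += 1
        results[loser]["profit"] -= BUY_IN
    for row in results.values():
        row["roi"] = row["profit"] / (row["games"] * BUY_IN)
    return dict(results)

def games_per_hour(games):
    """Count games in each hourly bin, keeping empty hours as explicit zeros."""
    counts = Counter(
        start.replace(minute=0, second=0, microsecond=0) for start, _, _ in games
    )
    hour, last = min(counts), max(counts)
    series = {}
    while hour <= last:
        series[hour] = counts.get(hour, 0)
        hour += timedelta(hours=1)
    return series

results = player_results(games)
hourly = games_per_hour(games)
```

Keeping the empty hours as zeros matters later, because those zeros are what make the holes in the data visible.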

Time Series EDA

With identifying the most profitable time to play as my primary goal, and the assumption that this would be related to game count per hour, I first did some EDA (exploratory data analysis) of the games per hour time series. A few things stood out:

Weekdays have higher traffic than weekends

Weekdays have higher average traffic than weekends, something that might surprise even the most seasoned players.

There are gaps in the data

Given the frequency distribution, 0 games per hour seems over-represented in the data. In this case, I already knew there were some holes in the data, but with this view, I get a better idea of the scale of these holes which will help inform any filling strategies.

The time series is seasonal

Here day 0 is Monday, 1 is Tuesday, etc. There is a clear daily seasonal component to the time series (and possibly weekly — that would account for the lower traffic on Sundays). Games per hour are reliably higher after midday.

Traffic spiked in December

There is a significant spike in traffic in December but with only twelve months in the sample, it’s not possible to know if this is a regular occurrence. Further research or EDA is needed to be in with a chance of identifying a cause.

Player EDA

Another factor that quite obviously impacts the profitability of any player is the skill level of their opponent. Although we have no chance of predicting the actions of individual players, an analysis of aggregate results could also help determine the best times to play.

This section of my analysis was undoubtedly the most engaging, with each plot prompting as many questions as it answered while revealing a few hidden truths along the way.

Returns on investment are bounded

Average ROI (return on investment) is the best metric for measuring a poker player’s skill, and its validity as a measure of skill increases with the number of games played. In this data all games had a buy-in of $51, so an ROI of 2% represents approximately $1 profit per game. There is also a sunk cost of $1 per game (paid to the website hosting the game), which is priced into the ROI.

Here we see that as a player’s game count increases, their ROI is constrained to a narrower range (or more specifically, a much narrower range than -100% to 100%). In the poker community, the baseline strategy for heads up poker is considered to be pushing all in every hand without looking at your cards, which in this situation would yield an ROI of approximately -45% (over a significant sample). Above 50 games we see that the average ROIs conform to this baseline, with the ROI range getting ever narrower as game count increases (an illustration that players need to exceed a certain threshold ROI to have an incentive to play so frequently).

While this plot is informative, it also hides a lot of data, as many of the points represent more than one player (sometimes over 1,000 players). To get a clearer idea of ROI in relation to game count we repeat the plot, this time grouping players by game count and taking the average of the players’ average ROIs.
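As a small worked example of the arithmetic above, ROI can be written as net profit divided by total buy-ins paid. The function names are illustrative only.

```python
BUY_IN = 51.0  # dollars per game, from the article

def roi(total_profit, games_played, buy_in=BUY_IN):
    """Return on investment: net profit divided by total buy-ins paid."""
    return total_profit / (games_played * buy_in)

def profit_per_game(roi_value, buy_in=BUY_IN):
    """Average profit per game implied by a given ROI."""
    return roi_value * buy_in

# A 2% ROI at a $51 buy-in is worth about a dollar a game.
print(round(profit_per_game(0.02), 2))
```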
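The regrouping described above could look something like the following sketch, where the player records are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (games_played, average_roi) pairs, one per player.
players = [(1, -1.0), (1, 0.96), (1, -1.0), (2, -0.02), (2, -1.0), (500, 0.04)]

def mean_roi_by_game_count(players):
    """Average the players' average ROIs within each game count group."""
    grouped = defaultdict(list)
    for games_played, avg_roi in players:
        grouped[games_played].append(avg_roi)
    return {count: mean(rois) for count, rois in sorted(grouped.items())}

grouped_roi = mean_roi_by_game_count(players)
```

Each group is one point on the new plot, however many players it contains.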

The more you win, the longer you play

This is useful as an approximation of the ROI, and in turn skill level, of players with low game counts whose results are subject to a high level of variance. An obvious conclusion is that those who play the lowest number of games are some of the worst players, although there is some bias in that someone who wins their first game is certain to be able to afford to play another game, which is not true of losers.

Those not familiar with poker often ask me if complete beginners are the hardest to play against because they are so unpredictable. Here you have my answer: no! A new player’s in-game actions may be unpredictable, but overall their strategy will be predictably bad, and they would be my first choice of opponent every day.

Notice also how the centre of the plot is very noisy. Here we have arbitrary groupings of 2–15 players, with the groupings’ ROIs varying wildly. This prompted further analysis of the variance in ROI associated with fixed win rates over different sample sizes. I concluded from this that I should be reluctant to draw strong conclusions about a player’s skill level based on their ROI when the sample of games is below 1,000.
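One way to see why small samples say so little is to compute the spread of ROI for a fixed win rate. The sketch below is not the original variance analysis. It assumes a $100 prize per game (both $50 stakes) and uses a hypothetical 53% win rate.

```python
import math
import random

BUY_IN = 51.0   # from the article
PRIZE = 100.0   # assumption: the winner collects both $50 stakes

def expected_roi(win_rate):
    """Long-run ROI for a player who wins a fixed share of their games."""
    return win_rate * PRIZE / BUY_IN - 1.0

def roi_standard_error(win_rate, n_games):
    """Standard deviation of the observed ROI over n_games independent games."""
    per_game_sd = (PRIZE / BUY_IN) * math.sqrt(win_rate * (1.0 - win_rate))
    return per_game_sd / math.sqrt(n_games)

def simulate_roi(win_rate, n_games, rng):
    """Observed ROI from one simulated sample of n_games."""
    wins = sum(rng.random() < win_rate for _ in range(n_games))
    return (wins * PRIZE - n_games * BUY_IN) / (n_games * BUY_IN)

rng = random.Random(0)  # fixed seed so the illustration is repeatable
for n in (50, 200, 1000, 5000):
    print(n, round(roi_standard_error(0.53, n), 4), round(simulate_roi(0.53, n, rng), 4))
```

Under these assumptions a 53% winner has an expected ROI of a little under 4%, yet one standard deviation of their observed ROI is still about 3 percentage points at 1,000 games.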

Few make a reliable income

Here we see that a majority of players experience a nominal win/loss over their entire playing sample, with over 90% of players winning or losing less than $500. Of those remaining, a few are high earners and many lose a medium amount.

From these plots (and additional histograms) I saw that 99% of players played fewer than 200 games over the year. Of the remaining 1%, half fall into the 200–400 games played bin and the other half fall into a 400–19,000 games played bin. We saw in the previous plots that players with fewer than 400 games are predominantly losing players.

Here we begin to see the emergence of two classes of players: the majority, over 99%, are casual players with varying win rates, while a small minority, less than 1%, play significantly more often and can see sizable returns from their sample.

To explore the idea of classifying players further, I separated players into two categories: those playing 1,000 games or more (high volume players) and those playing fewer than 1,000 games (low volume players).

Each game has two players, so from a player’s perspective, 200,000 games translate to 400,000 games played.

The high volume players collectively win over $400,000 in the year, while the low volume players collectively lose over $800,000 (the site nets a cool $400,000 for hosting these games).

What is most interesting is the internal play rate within each group. 17% of games played by the high volume players are played against other high volume players; this number is just 10% for low volume players. This means that over 80% of games played include one high volume player and one low volume player.

This represents, more clearly than I could have expected, the dynamics of the online heads up poker community. There are two classes of players, professional (high volume) and recreational (low volume), or as players call them, sharks and fishes. Sharks spend most of their time playing with (eating!) fish, fish rarely have an opportunity to eat other fish (before a shark comes along), and sometimes sharks battle it out in an attempt to protect their stock of fish.
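The share of mixed games follows from the two internal play rates alone, which makes a handy sanity check. The helper below is a sketch with an illustrative name; it relies only on the fact that every mixed game counts once for each group.

```python
def mixed_game_share(high_internal_rate, low_internal_rate):
    """Fraction of games with one high volume and one low volume player."""
    # Scale everything so that there is exactly one mixed game. That game is
    # one appearance for the high volume group and one for the low volume group.
    high_appearances = 1.0 / (1.0 - high_internal_rate)
    low_appearances = 1.0 / (1.0 - low_internal_rate)
    # An internal game uses up two appearances from the same group.
    high_vs_high = high_internal_rate * high_appearances / 2.0
    low_vs_low = low_internal_rate * low_appearances / 2.0
    return 1.0 / (1.0 + high_vs_high + low_vs_low)

share = mixed_game_share(0.17, 0.10)
print(round(share, 3))
```

With internal rates of 17% and 10% this comes out at about 86%, consistent with the "over 80%" figure.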

This analogy is pervasive in online poker, and can inspire some interesting advertising campaigns …

Classifying players as professional or recreational (shark or fish) could be a valuable feature in determining the most profitable time to play. We have a few features available to make a blunt classification (ROI and game count). A deeper look at the internal play rate of high volume players may give some hint as to how we should use those features (for example by setting an ROI threshold for being considered a shark).

Shark vs Shark

To investigate the circumstances of the internal play rate for the high volume players I constructed a pairwise matrix of games played between each combination of the top 20 players by profit and top 20 by games played. Combining the two lists gives a set of just 25 players, showing a high overlap between the two groups.

All players in the set have played over 2,000 games. Despite this, they play almost no games with each other. Such small game counts between the most frequent players, who are all (but one) winners, demonstrate that most winning players make an active effort to avoid playing with each other.

From this set of players, there is one player who plays a large number of games against the rest. Despite facing such strong opponents, their ROI is not much worse than what the sunk cost (rake) of each game alone would produce. This player is of a similar skill level to their opponents but experiences low returns because those opponents are so much stronger than the typical field.
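A pairwise matrix of this kind can be sketched as follows. The player names, results and game list are invented, and `top_union` and `pairwise_matrix` are illustrative helpers rather than the functions used in the analysis.

```python
from collections import Counter

# Hypothetical per-player results and (player_a, player_b) pairs, one per game.
results = {
    "p1": {"profit": 900.0, "games": 3000},
    "p2": {"profit": 400.0, "games": 5000},
    "p3": {"profit": -50.0, "games": 4000},
    "p4": {"profit": -20.0, "games": 10},
}
games = [("p1", "p2"), ("p2", "p1"), ("p1", "p3"), ("p4", "p1")]

def top_union(results, n):
    """Union of the top n players by profit and the top n by games played."""
    by_profit = sorted(results, key=lambda p: results[p]["profit"], reverse=True)[:n]
    by_games = sorted(results, key=lambda p: results[p]["games"], reverse=True)[:n]
    return sorted(set(by_profit) | set(by_games))

def pairwise_matrix(games, selected):
    """Symmetric matrix of games played between each pair of selected players."""
    chosen = set(selected)
    counts = Counter()
    for a, b in games:
        if a in chosen and b in chosen and a != b:
            counts[frozenset((a, b))] += 1
    return [
        [0 if row == col else counts[frozenset((row, col))] for col in selected]
        for row in selected
    ]

selected = top_union(results, 2)        # ["p1", "p2", "p3"] for this toy data
matrix = pairwise_matrix(games, selected)
```

Using an unordered pair as the key means a game is counted once regardless of which player is listed first.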

This dynamic is best illustrated with a network graph:

Each node represents a player, circle size is absolute ROI, blue indicates positive ROI, red negative.
Line strength is the number of games played between two players.
Node positions are generated with the NetworkX spring layout (spring_layout), so closeness also represents the number of games played between two players, but the layout is not deterministic, i.e. it changes every time you plot it.

What really stands out in this plot is that its centre is populated by players with lower ROIs (smaller nodes), and a majority of the interactions within this group are between the weakest players. The stronger players have almost no interactions with each other, and very few interactions with the weaker players in the group.

Another way to visualise this is in terms of interactions with the red node. Here we can see the player represented by the red node has played very few games against players with an ROI above 5%: they are avoiding the stronger players!

Here we start to see the reality in the life of a professional heads up player…

Professional players make an active effort to avoid playing each other. This is an unwritten agreement between a group of players that predates the sample of games in this study. To enter this group of professionals, players must prove themselves by playing existing professionals in the group, and to improve their chances of success they focus their efforts on the weaker existing professionals. Players describe these groups as ‘cartels’; they exist at most buy-in levels and get smaller in size as the stakes go up.

And what incentivises the weaker professionals to defend their territory? It’s the strong players surrounding them! Imagine the red node is climbing out of a hole, the small blue nodes are trying to push him back down, knowing if they don’t try hard enough, they will be pushed down themselves by the stronger players surrounding them.

Unfortunately for these players, poker is not just about poker. It also involves managing adversarial and mutually beneficial relationships, with a view to managing the fish to shark ratio and, in turn, their bottom lines. This dynamic is unique to heads up tournament poker because it is the only form of poker where you can’t avoid an opponent who wants to play you without forgoing the game entirely. Because of this, players can effectively hunt other players.

This probably goes a long way to explaining why my friendships in poker rarely last beyond the point of being mutually beneficial. Unlike tournament players who win or lose a small amount from many different players, making it hard to develop grudges or adversaries, I have often been in the position where I have taken, or had taken from me, five-figure sums from a single player, sometimes in a single day! And to be honest, I wouldn’t have it any other way.

But let’s get back on topic. Another conclusion I drew in this stage of the analysis, from reviewing the ROIs of these players for games against other players in the group, is that when these players play the red node, it’s really just the host that makes money, as the skill gap is not large enough for either player to overcome the sunk cost from hosting the game.

Let’s take this back to the original question: what is the best (most profitable) time to play? If we can classify players as professional or recreational, shark or fish, we can remove instances of shark vs shark from the game count per hour time series, transforming it from games per hour to games per hour including at least one fish. By doing this we isolate the ‘profitable’ component of hourly traffic, which we can use to forecast future profitable playing times. Another time series we can generate is the number of professionals online in an hour, which would represent an ‘unprofitable’ component of hourly traffic.

I also created an additional feature, the number of games played per hour online, for every player in the data. This showed that likely sharks generally had a higher game count per hour than fish, which could be useful when classifying edge cases.

Classifying players

Time to pull out some clustering algorithms? Unfortunately not! We expect less than 1% of the players to be professionals. This represents a severe class imbalance, which is hard enough for a supervised learning algorithm, let alone an unsupervised one. I gave it a try, but my attempts to cluster this unlabelled data ended with me manually reviewing the clusters and altering them where I considered them not reflective of my own estimations (based on my years of experience).

The final classification method I used was a decision tree I created myself, with all players being classified with just a few nodes, as follows:

This is quite a blunt classification scheme and undoubtedly a few players have been misclassified, but it will suffice for this analysis. I imagine if I were to create a forecasting tool it would give users the option to manually classify players, increasing accuracy as defined by them.
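To give a feel for how small such a hand-built tree can be, here is a purely hypothetical version. It is not the tree used in this analysis: apart from the 1,000-game high volume cut-off mentioned earlier, every threshold below is an invented placeholder.

```python
def classify_player(games_played, roi, games_per_hour,
                    volume_cutoff=1000,  # the high volume cut-off used earlier
                    min_sample=200,      # hypothetical placeholder
                    roi_cutoff=0.0,      # hypothetical placeholder
                    pace_cutoff=4.0):    # hypothetical games per hour online
    """Blunt shark-or-fish rule built from game count, ROI and playing pace."""
    if games_played >= volume_cutoff:
        return "professional"
    # Edge cases: fewer games, but winning and playing at a professional pace.
    if (games_played >= min_sample
            and roi > roi_cutoff
            and games_per_hour >= pace_cutoff):
        return "professional"
    return "recreational"

print(classify_player(2500, 0.04, 5.0))
print(classify_player(3, -1.0, 1.0))
```

Exposing the thresholds as parameters is one way a user could reclassify players according to their own judgement.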

Reworking the time series

Now that we have classifications, we can give each game a score representing the number of professionals in the game. This can be used to create new time series, one for each combination of recreational and professional (i.e. pro vs pro, pro vs rec and rec vs rec).
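The scoring step might be sketched like this, assuming each game is a (start time, player, player) record and the professionals are held in a set. All names and records are illustrative.

```python
from collections import Counter, defaultdict
from datetime import datetime

LABELS = {0: "rec vs rec", 1: "pro vs rec", 2: "pro vs pro"}

def pro_score(game, professionals):
    """Number of professionals (0, 1 or 2) taking part in a game."""
    _, a, b = game
    return (a in professionals) + (b in professionals)

def hourly_series_by_type(games, professionals):
    """One hourly game count series per match-up type."""
    series = defaultdict(Counter)
    for game in games:
        hour = game[0].replace(minute=0, second=0, microsecond=0)
        series[LABELS[pro_score(game, professionals)]][hour] += 1
    return series

def games_with_fish(series):
    """Hourly count of games that include at least one recreational player."""
    return series["pro vs rec"] + series["rec vs rec"]

# Hypothetical data.
professionals = {"shark1", "shark2"}
games = [
    (datetime(2018, 7, 2, 13, 5), "shark1", "fish1"),
    (datetime(2018, 7, 2, 13, 30), "shark1", "shark2"),
    (datetime(2018, 7, 2, 13, 45), "fish1", "fish2"),
    (datetime(2018, 7, 2, 14, 10), "fish2", "shark2"),
]
series = hourly_series_by_type(games, professionals)
fish_games = games_with_fish(series)
```

Dropping the pro vs pro series from the total gives the games per hour including at least one fish.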

Straight away we can see the spike in December traffic that we noticed earlier was caused by a temporary increase in professionals playing against each other. A small war broke out and it was short-lived. We also see that recreational players almost never play each other. The fact that the rec vs rec line is uncharacteristically high in the early portion of the time series suggests that some players have been misclassified (possibly those looking for a promotion who didn’t get in 1,000 games before they gave up).

By taking the count of unique professionals playing in each hour, we can also derive the number of unique professionals online per hour time series. This is plotted below with the recreational traffic time series.
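Counting unique professionals per hour is a matter of collecting players into one set per hourly bin, as in this sketch with invented records.

```python
from collections import defaultdict
from datetime import datetime

def professionals_online_per_hour(games, professionals):
    """Number of distinct professionals who played at least one game each hour."""
    online = defaultdict(set)
    for start, a, b in games:
        hour = start.replace(minute=0, second=0, microsecond=0)
        for player in (a, b):
            if player in professionals:
                online[hour].add(player)
    return {hour: len(players) for hour, players in online.items()}

# Hypothetical data.
professionals = {"shark1", "shark2"}
games = [
    (datetime(2018, 7, 2, 13, 5), "shark1", "fish1"),
    (datetime(2018, 7, 2, 13, 30), "shark1", "shark2"),
    (datetime(2018, 7, 2, 14, 10), "fish2", "shark2"),
]
pros_online = professionals_online_per_hour(games, professionals)
```

A professional who plays several games in the same hour is still counted once for that hour.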

The best time to play

So if each game including one recreational player is a fish, and each professional online is a shark, we can work out the best playing time by determining which hours yield the most fish per shark (by dividing recreational games by professionals online).

But first a few alterations to the time series. The suspected missing values for both series are filled with the mean for that hour of the week. And for the professionals online series, one is added to every observation. This changes the question answered from ‘how many fish did the pros online in that hour get?’ to ‘how many fish would an additional pro have got if they also played in that hour?’, which is more representative of the value proposition associated with deciding to play.
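Both alterations and the final division can be sketched as below. Missing hours are represented by None, which is an assumption about how the gaps are stored, and the function names are illustrative.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def hour_of_week(ts):
    """0 for Monday 00:00 through 167 for Sunday 23:00."""
    return ts.weekday() * 24 + ts.hour

def fill_with_hour_of_week_mean(series):
    """Replace None with the mean of observed values for that hour of the week."""
    observed = defaultdict(list)
    for ts, value in series.items():
        if value is not None:
            observed[hour_of_week(ts)].append(value)
    filled = {}
    for ts, value in series.items():
        if value is None:
            same_slot = observed.get(hour_of_week(ts))
            value = mean(same_slot) if same_slot else None
        filled[ts] = value
    return filled

def fish_per_additional_shark(rec_games, pros_online):
    """Recreational games divided by (professionals online + 1), hour by hour."""
    rec = fill_with_hour_of_week_mean(rec_games)
    pros = fill_with_hour_of_week_mean(pros_online)
    return {
        ts: rec[ts] / (pros[ts] + 1)
        for ts in rec
        if rec[ts] is not None and pros.get(ts) is not None
    }

# Hypothetical data: three consecutive Mondays at 13:00, the middle one missing.
mondays = [datetime(2018, 7, 2, 13), datetime(2018, 7, 9, 13), datetime(2018, 7, 16, 13)]
rec_games = dict(zip(mondays, [8, None, 12]))
pros_online = dict(zip(mondays, [1, None, 3]))
ratio = fish_per_additional_shark(rec_games, pros_online)
```

Adding one to the denominator also avoids dividing by zero in hours when no professional was online.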

We don’t have enough years or months in the data to draw conclusions about the best time of the year or month to play, but we do have enough weeks. Below is a plot of the average number of games (including one recreational player) per professional for each hour of the week (remember: 0=Monday, 1=Tuesday, etc).

The daily seasonality we saw earlier in the ‘hourly game count throughout the week’ plot has disappeared almost entirely. It seems the collective self-interest of professionals works to regulate the ratio of fish to shark, keeping it between 4 and 5 games per professional throughout weekdays. Although at the start we saw that weekdays have higher traffic, the hours after midnight on Friday and Saturday nights stand out as the best times to play, with Saturday and Sunday afternoon being the worst.

So what advice can we give players from this analysis? During the week, it doesn’t matter what time you play. You can allow other aspects of your life to influence your schedule, comfortable in the knowledge that you’re not missing out on any extra value. Yes, Friday and Saturday nights are better to play, but it’s far from double time: expect around a 25% increase in games played per hour. Avoid Saturday and Sunday afternoons.

And for the aspiring professional curious to know the value of playing at this buy-in level: whatever time you play during the week, expect your average returns to be the same. We saw professionals with ROIs of 2–8%, so with a $51 buy-in and 4.6 games per hour (for weekdays), expect to earn between $4.69 and $18.77 per hour depending on your skill level (buy-in × games per hour × ROI).
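The hourly figure is a single multiplication, shown here as a small helper with an illustrative name so other buy-ins or game rates can be plugged in.

```python
def hourly_earnings(buy_in, games_per_hour, roi):
    """Expected profit per hour: buy-in x games per hour x ROI."""
    return buy_in * games_per_hour * roi

low = hourly_earnings(51, 4.6, 0.02)   # weakest professionals seen in the data
high = hourly_earnings(51, 4.6, 0.08)  # strongest professionals seen in the data
print(round(low, 2), round(high, 2))
```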

So there we have it, the best time to play, or rather the knowledge that other than a few times around the weekend, there aren’t any adjustments that can be made to a player’s weekly schedule to improve their expected profit.

But what about players whose schedules are flexible, who can decide on the fly whether to start, keep or stop playing? Do we have any advice for these players? We do, but we’ll save that for another article, where I’ll take you through the process of creating a time series model that can be used to forecast future fish to shark ratios.
