
As an avid pool player with a background in optical physics, I’ve often wondered whether my experience modelling laser beam paths improved my ability to line up pool shots, or if it was the other way around!
The COVID-19 lockdown coincided with my career transition from optical physics to data science, as well as a hiatus in league games. Unable to visit pool halls to play the game I love, I have instead spent a lot of time pondering how data could guide pool strategy.
Many national pool leagues keep well-documented player statistics and records of past matches, providing an ideal resource for building predictive models. During the Insight Data Science fellowship I learned how to use Flask in Python to integrate machine learning models into user-friendly web applications. So, I wondered: could the powerful combination of Flask and machine learning be used to create a personal pool coach?
In this three-part series I will describe how I developed Magic8Ball Billiards, an application which helps your pool team maximize its chance of victory.
- In Part 1, which you are reading now, I will introduce the problem in greater detail, and show how Machine Learning can be applied to predict the outcomes of pool games.
- In Part 2, I will show how SQL queries can help improve player selection strategies.
- In Part 3, the performance of the app is statistically evaluated and compared to real-world strategies.
The Pool Problem
Pool, or pocket billiards, is the name for a family of games which share the common goal of depositing balls into holes using a cue stick. Games are played on a billiard table, which is a cloth-covered rectangular surface with 6 holes, or pockets. At the amateur level, the most frequently played version of pool is eight-ball, to the extent that the phrase "a game of pool" is synonymous with a game of eight-ball.
In eight-ball there are 16 balls on the table:
- 7 solid colors numbered 1–7.
- 7 striped colors numbered 9–15.
- A solid white cue ball.
- A solid black 8-ball.
The solids and stripes are two different suits; each player is assigned to a suit (depending on the first ball pocketed), and attempts to pocket all 7 balls in their suit before pocketing the 8-ball for the win.

Individual pool is a game of technical skill and strategy in equal measures. The number of different shot situations that can arise is essentially infinite, and there is rich academic literature describing artificial intelligence approaches for modelling billiard games.
In team pool, a whole new type of strategy arises: choosing the player lineups. A pool team consists of a cohort of players/friends who gather regularly to blow off some steam in a playfully competitive environment. Teams battle it out to win the most points to finish at the top of their local league. As any team captain will eagerly inform you, decisions made off the table are just as important as decisions made on the table. Matches can be won and lost entirely due to smart (or foolish) lineup choices made by the team captains.
The highest-performing teams are rewarded with cash prizes and entry into the National Amateur Championships in Las Vegas/New Orleans – a major motivator! Furthermore, new and casual players stand a fair chance of winning in the same league as seasoned grandmasters, thanks to equalizing handicap systems (more on this later) which have effectively promoted inclusion. Considering these factors, match outcomes can be surprisingly unpredictable.
Some pool jargon and definitions before we begin:
– Team: A group of 5–8 affiliated pool players.
– Team Captain: The player responsible for selecting which players play in each round.
– Skill level: A number assigned to a player by a league operator to describe their playing ability. Recalculated by an Elo algorithm after each match result.
– Round: A single head-to-head pairing between two players from opposing teams.
– Match: An event of one pool team playing against another team. Most commonly, five selected players from one team compete against five selected players from the opposing team, with each selected player playing only once. Scores from each round are tallied to produce the overall match score. All five rounds are played even if the overall match result has already been mathematically decided.
– Rack: A single ‘game’ of pool, commencing with a break-off shot and concluding when the 8-ball is sunk. Each player must win a certain number of racks in order to win their round.
– Race: The number of racks required to win a round. This number, or race, is not the same for each player; it is calculated from the two players’ skill levels using the handicap algorithm.
– Lineup: The final list detailing which players were selected in each round. Note the order of pairings doesn’t affect the outcome.
Match rules
Before the two teams begin a match, a coin is tossed, and the winning Captain decides which team selects a player first in round one. After the first round, Team Captains alternate in the role of selecting a player first, i.e. match selection events could unfold as follows:
- Team A selects a player for round one.
- Team B selects a player for round one.
- Team B selects a player for round two.
- Team A selects a player for round two.
- … etc. until all rounds are complete.
The equalizing handicap
Although a team captain will know the abilities of their players very well, the decision-making process is heavily clouded by the equalizing handicap. Let’s explain how this works through an example player pairing, which will help put some of the terms and definitions you’ve just read about into context:

Furious Steve has a skill rating of 110, meaning he is a top-level amateur player. Meanwhile Ambling Ally is just here for a good time, and their skill level of 50 reflects the fact they don’t take the game too seriously. The equalizing handicap levels the playing field so that if these two were to face each other in a round, each would have approximately a 50% chance of winning. It is essentially a conversion formula which takes player skill levels as inputs and yields the race length for each player. In the league I participate in, Steve would require 8 racks to win the round, whereas Ally would only need to win 2. It is not a perfect system, however, since there are many more factors that influence match outcomes than just player skill levels (e.g. worn-out equipment).
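To make this concrete, here is a minimal sketch of what such a conversion could look like in code. The skill-level thresholds and race lengths below are made-up placeholders for illustration, not NAPA’s actual handicap algorithm:

```python
def race_length(skill_level):
    """Hypothetical handicap: convert a skill level into the number of racks
    that player must win. The thresholds below are made up for illustration."""
    for upper_bound, racks in [(30, 1), (60, 2), (80, 4), (100, 6), (120, 8)]:
        if skill_level <= upper_bound:
            return racks
    return 10  # very highly rated players

# Furious Steve (skill 110) vs. Ambling Ally (skill 50)
print(race_length(110), race_length(50))  # -> 8 2 under these made-up thresholds
```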
Most leagues keep records of each match that is played, including player skill levels, race lengths, and final scores – a boon for strategists.
The solution
In this article I will describe how a simple machine learning model can be applied to these data to create an app which can guide player selection choices, and maximize our team’s chances of that lucrative Vegas trip.
For readers of a mathematical orientation, this is an example of competitive subset selection, a topic which has been explored in rigorous detail.

The Magic8Ball app informs the user of which player is the optimal choice to select in each round.
It consists of the following components:
- A predictive model to generate the probability that one player will triumph over another.
- A database containing predicted match outcomes for every possible player lineup permutation.
- A player selection algorithm to guide the user to the lineup permutation with the highest match winning probability.
- A Flask application which implements all of the above in a user-friendly interface.
Figure 1 shows an overview of the Magic8Ball app. Throughout this series I will focus on the most common league format of 5 player teams. This means that there are 5! = 120 possible match lineups, some of which are much more likely to result in a win for one team. The outcomes of each lineup are predicted by a machine learning algorithm and stored in a SQL database. At the start of each round, the optimal player choice is calculated using these predictions. As the Captain chooses players to face off in the subsequent four rounds, fewer lineups are left to evaluate, and the match winning probability steadily increases.
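As a rough sketch of how this enumeration might work, each of the 120 pairings can be generated with itertools.permutations and scored by combining per-round win probabilities. The function names, and the simplification that a match is won by taking at least 3 of the 5 rounds, are assumptions on my part rather than the app’s exact implementation:

```python
from itertools import permutations, product

def match_win_probability(round_probs):
    """P(our team wins at least 3 of the 5 rounds), assuming rounds are independent."""
    total = 0.0
    for outcome in product([0, 1], repeat=len(round_probs)):
        if sum(outcome) >= 3:
            p = 1.0
            for won, prob in zip(outcome, round_probs):
                p *= prob if won else (1.0 - prob)
            total += p
    return total

def best_lineup(our_players, their_players, predict_round_win_prob):
    """Enumerate all 5! = 120 pairings of our players against theirs and
    return the pairing with the highest predicted match win probability.
    `predict_round_win_prob(a, b)` is assumed to wrap the trained model
    described later in this article."""
    best_score, best_pairing = -1.0, None
    for ordering in permutations(our_players):
        probs = [predict_round_win_prob(a, b) for a, b in zip(ordering, their_players)]
        score = match_win_probability(probs)
        if score > best_score:
            best_score, best_pairing = score, list(zip(ordering, their_players))
    return best_score, best_pairing
```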
Choosing features for a predictive model
In order to provide meaningful player suggestions for a given round, we need a way of quantitatively comparing different player pairings. We will build a predictive model using the results of 20,000 historical player pairings from the North American Poolshooter’s Association (NAPA).
To view any of the code used in this particular section, see the corresponding file in my GitHub repository.
NAPA publishes up-to-the-minute match results in an easily readable tabular format. Match statistics can be expressed in the form of margins, i.e. the value of that statistic for player B subtracted from the value of that statistic for player A. Here are some of the features that are available for each completed round:
- Race Margin: The difference between the number of racks required by each player to win the round.
- Win % Margin: The difference between the historical round win percentages of each player (from prior matches):

Win % Margin = Win %(A) − Win %(B), where a player’s Win % = 100 × (rounds won ÷ rounds played)
- Skill Margin: The difference between the skill levels of each player.
- Game Margin: The difference between the total number of rounds each player has played during their membership of NAPA.
- Average Points-Per-Match Margin (AvgPPM Margin): The difference between the average points per round for each player.
Finally, and most importantly…
- Win Margin: The winning margin (in number of racks) for player A in the current round. Negative values indicate that player B won the round. In equation form, the win margin for player A is given by:

Win Margin = (racks won by player A) − (racks won by player B)
Our goal is to use the first five features to predict the Win Margin for that player pairing.
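To illustrate how these margin features might be assembled, each feature is simply player A’s statistic minus player B’s. The column names below are assumptions about the table layout, not NAPA’s actual schema:

```python
import pandas as pd

# Hypothetical raw columns: one value per player for each completed round.
rounds = pd.read_csv("napa_rounds.csv")  # placeholder file name

features = pd.DataFrame({
    "race_margin":    rounds["race_a"]      - rounds["race_b"],
    "win_pct_margin": rounds["win_pct_a"]   - rounds["win_pct_b"],
    "skill_margin":   rounds["skill_a"]     - rounds["skill_b"],
    "game_margin":    rounds["games_a"]     - rounds["games_b"],
    "avg_ppm_margin": rounds["avg_ppm_a"]   - rounds["avg_ppm_b"],
    # Target: positive values mean player A won the round.
    "win_margin":     rounds["racks_won_a"] - rounds["racks_won_b"],
})
```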


What insights can we extract from these historical results? A good place to start would be to examine the data distributions plotted in Figures 2 and 3 and perform a basic correlation analysis of the dataset to find out if any variables exhibit strong relationships.

Figure 4 shows the Pearson correlation coefficient matrix for each of the features in our dataset of 20,000 historical rounds. Some observations that immediately stand out:
- Win % Margin and AvgPPM Margin exhibit perfect multicollinearity! This is pretty obvious because more points are awarded for a win than a loss. However, this is not good news for models such as linear regression, so we will need to drop one of these features before fitting a model.
- There is also a high coefficient of correlation between Skill Margin and Race Margin. This makes sense, since the race margin is calculated using the player skill levels. However, the race calculator is not a continuous mathematical function, so this correlation is less than perfect.
- We only see weak correlations with the target feature, Win Margin, with the highest coefficient being 0.2 for Win % Margin. While this is good news for the designers of the equalizing handicap, it bodes ill for our prospects of developing a highly informative model. Without the handicap, we would expect a much stronger correlation between Skill Level and Win Margin.
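For reference, the correlation matrix itself takes only a couple of lines with pandas and seaborn, continuing with the hypothetical features table sketched above:

```python
import seaborn as sns
import matplotlib.pyplot as plt

corr = features.corr(method="pearson")  # Pearson correlation between every pair of features
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Feature correlation matrix (cf. Figure 4)")
plt.show()
```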
Choosing a predictive model
We want to train a model which uses the Race Margin, Game Margin, Win % Margin, Skill Margin, and AvgPPM Margin variables to predict the value of the Win Margin variable.
1. Linear Regression
Since each feature is approximately normally distributed, linear regression is a suitable starting model. With a linear regression model fit to these features, we can attempt to predict the margin by which a player will win a given round.
We already saw that the AvgPPM Margin variable was redundant due to its perfect correlation with Win % Margin, and we will also drop the Game Margin variable since its magnitude of influence is so small. Using the remaining features, the data is split into training and validation sets.
I scaled the training data and then fit an ordinary least-squares regression model. Performance is quantified using the mean absolute error (MAE) and the coefficient of determination (R²) between predictions and ground-truth values in the validation dataset. The regression model predicts the winning margin for player A in terms of number of racks.
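A minimal version of this fit using scikit-learn, which may differ in detail from the project code, looks something like this:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

X = features[["race_margin", "win_pct_margin", "skill_margin"]]
y = features["win_margin"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)               # fit the scaler on training data only
model = LinearRegression().fit(scaler.transform(X_train), y_train)

y_pred = model.predict(scaler.transform(X_val))
print("MAE:", mean_absolute_error(y_val, y_pred))    # ~2 racks on the data described here
print("R^2:", r2_score(y_val, y_pred))
```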

The model predictions are visualized in Figure 5 for the training and test datasets. From these scatterplots, we can see that the model clearly has a lot of variance! The mean absolute error on each prediction is approximately 2 racks, which is pretty uninformative given that out of all results in the dataset, 63% were won with a margin of 2 racks or less (see Figure 3). So, with this model, there is a very real possibility that we could predict the incorrect winner more often than not. Not exactly the kind of model to feel confident placing your trust in.
2. Classification
Considering the noisy data we’ve seen thus far, perhaps predicting the win margin for individual rounds isn’t the most informative or practical. Instead, it would be more useful to make probabilistic predictions about how individual rounds will play out. By weighing up the winning probabilities of possible player pairings, we could make informed strategic decisions with our player selections.
We can accomplish this by creating a new binary win/loss variable and converting our regression problem into a classification problem. Classification models generate a probability that a sample belongs to a class, in our case, a win or a loss for player A. We can expect a win if the predicted probability exceeds 0.5, and a loss for a probability less than 0.5. Win margins greater than zero (a win for player A) are assigned a value of 1, while win margins less than zero (a win for player B) are assigned a value of 0.
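Creating the binary target is a one-liner (again using the hypothetical features table from earlier):

```python
import numpy as np

# 1 = player A won the round, 0 = player B won the round
features["player_a_won"] = np.where(features["win_margin"] > 0, 1, 0)
```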
After adding the binary win/loss feature, I compared three classification models for predicting player wins: logistic regression, a linear support vector classifier (SVC), and a Naïve Bayes classifier. Some performance metrics for each classifier are shown below.
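A sketch of that comparison follows. Note that scikit-learn’s LinearSVC does not expose predicted probabilities, so I use SVC with a linear kernel and probability=True as a stand-in; the project code may handle this differently:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, roc_auc_score

X = features[["race_margin", "win_pct_margin", "skill_margin"]]
y = features["player_a_won"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVC": SVC(kernel="linear", probability=True),  # probability=True enables predict_proba
    "Naive Bayes": GaussianNB(),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_val)[:, 1]  # predicted probability that player A wins
    print(f"{name}: accuracy={accuracy_score(y_val, clf.predict(X_val)):.3f}, "
          f"ROC AUC={roc_auc_score(y_val, proba):.3f}")
```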

At first glance it seems that there is nothing separating the performance of the Logistic and SVC classifiers in terms of their ability to distinguish between predicted wins and predicted losses for player A.
However, we are more interested in knowing the reliability of the predicted probabilities, rather than simply measuring the number of times the predicted probabilities led to a correct win or loss prediction. For example, in all rounds where the model predicted a 60% chance of player A winning, did player A actually win 60% of the time?
We can measure this by plotting a calibration curve (also known as a reliability curve). This allows us to compare the predicted probabilities of player A winning to the actual fractions of wins for player A.
The data points in Figure 6 are produced by discretizing the [0,1] probability interval into 10 bins and measuring the fraction of actual player A wins contained in that interval.
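scikit-learn provides a helper that performs exactly this binning, so a sketch of Figure 6 (reusing the fitted models and validation split from the sketch above) could look like:

```python
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

plt.plot([0, 1], [0, 1], "k--", label="Perfectly calibrated")
for name, clf in models.items():
    proba = clf.predict_proba(X_val)[:, 1]
    frac_wins, mean_pred = calibration_curve(y_val, proba, n_bins=10)
    plt.plot(mean_pred, frac_wins, marker="o", label=name)
plt.xlabel("Predicted probability of a player A win")
plt.ylabel("Actual fraction of player A wins")
plt.legend()
plt.show()
```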

We can see that probability predictions from the Logistic Regression model are almost perfectly calibrated to the true fraction of player A wins. The mean absolute deviation from the dotted line is 0.01 and varies very little across the probability range. This means that we can be confident that probability predictions from the logistic regression model are a realistic representation of the likelihood that a player will win a given round. However, something to consider is that isotonic regression could improve the calibration of SVC and Naïve Bayes models.
Now that we have a logistic regression model that makes reasonably well-calibrated predictions of player wins, in Part 2 we will see how this model can be applied to inform player selection decisions!
GitHub repository for this project.
Illustrations were provided by the talented @stevenfritters.