
Can we see an upset before it happens? Predicting the Madness of March

A new method of visualizing the uncertainty around predicting the winner of a matchup can help explain surprising upsets.


Image by Izuddin Helmi Adnan (@izuddinhelmi) | Unsplash Photo Community

You’ve seen it before. A team with a numerically higher seed puts together a solid game on offense and defense, defeating a lower-seeded favorite everyone thought would win for sure. Your bracket of picks is destroyed and Cinderella moves on. Who could have predicted that upset? Maybe you, next year, after learning a new way to look inside predictive models in this article.

The internet is full of predictions for the tournament, giving you expected probabilities of one team beating another. They’ll tell you about the simulations they ran, the huge dataset feeding the model, and the sophistication of their algorithms. But they give you only a single probability for each game. That simplicity is appealing, but it is misleading, and it is the downfall of many brackets that use these probabilities to choose a winner.

A probability derived from a model is a single point estimate. On its own, it gives no information about how confident the model is in the prediction. Without an understanding of that confidence, we can’t know how much to trust the number.

In statistics, point estimates are scary things without a snug confidence interval wrapped around them.

I believe the solution is to look for the uncertainty in these models and find a way to visualize it. State-of-the-art models might incorporate deep-learning algorithms, or Bayesian methods, but under their fancy hoods they all tend to rely upon the same basic inputs: estimates of a team’s offensive efficiency, defensive efficiency, and overall strength.

Models use different algorithms (math) to map these parameters to outcomes (i.e., winning), but in the end, what they are trying to do is estimate the difference in true strength between the two teams. I believe I’ve thought of an easy way to visualize this difference which also gives us an estimate of the certainty around those point estimates we commonly come across – and all with some very simple frequentist statistics.

A New Model – Focusing on Visualizing Differences

To visualize estimates of a team’s true strength, I’ve gathered seven variables designed to estimate different parameters which serve as proxies for a team’s true ability. Four of these, two for offense and two for defense, estimate a team’s efficiency on both ends of the floor – Ken Pomeroy‘s Adjusted Offense/Defense and Massey’s Offense/Defense ratings. The other three take a more holistic view of the team, giving an estimate of overall strength or power – FiveThirtyEight’s Elo, Massey’s Power, and Massey’s Rating.

Each of these variables is a rating, not a ranking, so it reflects true distances between teams on each measure (ordinal measures, like rankings, are messy estimates of a team’s capabilities). By standardizing each measure we get a z-score for each team on each measure which can then be directly compared. In other words, we have different estimates of a team’s strength which can now be averaged together into a single estimate on a unified scale. Because these measures can be directly compared, we can also calculate confidence intervals around this composite score for each team, giving us the Holy Grail – a single estimate of each team’s strength along with an idea of our certainty about that estimate. Voila. Now we have an idea of each team’s strength which also includes a range of possible values indicating how confident we are that we’ve captured the team’s true ability.
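As a concrete sketch of the standardization step, here is one measure converted to z-scores in Python. The team ratings are made-up illustrative numbers, not the article’s actual data:

```python
import statistics

# Hypothetical raw ratings for one measure (say, an adjusted offensive
# efficiency); the values are illustrative only.
ratings = {"Gonzaga": 125.4, "Baylor": 123.1, "Ohio State": 117.8,
           "Oral Roberts": 110.2, "Abilene Christian": 101.5}

mean = statistics.mean(ratings.values())
stdev = statistics.stdev(ratings.values())

# A z-score, (rating - mean) / stdev, puts every measure on the same
# unitless scale, so different rating systems can be compared directly.
z_scores = {team: (r - mean) / stdev for team, r in ratings.items()}
```

Repeating this for each of the seven measures leaves every team with seven directly comparable scores.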

To accomplish this task, my model simply calculates the average of these seven standardized scores, plots this value as a point estimate of overall team ability, and adds a 95% confidence interval around this estimate using the variability in each team’s data. In the figures below, which test the model on 2021 NCAA Tournament data, zero indicates an average team (among all 350+ Division I teams), with positive/negative values indicating above/below average. The values [-3, 3] on the x-axis represent standard deviations above/below the mean. The data and code to create the analysis and plots are available on my GitHub. But for now, let’s see if thinking about uncertainty could help us find those upsets in the future by looking at the (recent) past.
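A minimal version of that calculation for a single team might look like the following. The seven standardized scores are illustrative, and I use a t-based interval here; the article does not specify whether its intervals are t- or normal-based:

```python
import math
import statistics

# Seven standardized scores for one team, one per rating system
# (illustrative values, not the article's actual estimates).
z_scores = [1.8, 2.1, 1.5, 1.9, 2.3, 1.7, 2.0]

n = len(z_scores)
point_estimate = statistics.mean(z_scores)      # composite strength
se = statistics.stdev(z_scores) / math.sqrt(n)  # standard error of the mean

# 95% CI using the t critical value for n - 1 = 6 degrees of freedom
# (about 2.447); a normal-based interval would use 1.96 instead.
t_crit = 2.447
ci = (point_estimate - t_crit * se, point_estimate + t_crit * se)
```

Plotting `point_estimate` with error bars spanning `ci` for both teams in a matchup gives the kind of figure described in the sections that follow.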

Visualizing Uncertainty Around Predictions: Two Seeds versus Fifteen Seeds – Round One

Consider the four teams given a 2 seed in the 2021 Men’s NCAA Tournament. Models, such as FiveThirtyEight’s forecast, gave similar probabilities of victory to each of the number two seeds: Alabama (95.19%), Houston (96.40%), Iowa (94.39%), and Ohio State (94.38%). These roughly correspond to the historical average for a two seed (they win 94.28% of the time). They also closely match the share of people picking Ohio State to win (95.2% of nearly 15 million brackets on ESPN.com). Yet Oral Roberts pulled off the dramatic upset in overtime. I wouldn’t argue these models are ‘wrong’, but perhaps their simplistic output hides relevant uncertainty which we should be looking at when making our picks.

While each sophisticated model gave similar estimates for each 2 seed to win, my estimates, with a sense of uncertainty, reveal a different picture. Instead of clear differences between each pair of teams, only two games look like runaway victories (no overlap either of the confidence intervals or of the ratings inside them). However, Iowa seems to have a below-average defense (0 is an average defense), which came back to haunt them in the next round, and the Ohio State game looks interesting. In that matchup, Ohio State has a below-average defense and is matched with Oral Roberts, which has an average or above-average offense. Note also that the upper limit of Oral Roberts’ confidence interval is very close to the lower limit of Ohio State’s, indicating these teams may actually be very similar in true ability. For perspective, Alabama is estimated to be nearly two standard deviations better than Iona, indicating their probability of winning should be much higher than that given to Ohio State. Visualizing a point estimate of strength, with confidence intervals, reveals that, while we could not be certain of Oral Roberts’ victory, estimates of Ohio State at close to 100% were clearly too high. If anyone in this batch looked ripe for an upset, it was Ohio State.
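The overlap check described above is easy to automate. Here is a sketch with a hypothetical helper and illustrative intervals (not the article’s actual estimates), assuming the favorite’s interval sits above the underdog’s:

```python
def intervals_overlap(favorite_ci, underdog_ci):
    """Flag a possible upset: True when the underdog's upper bound
    reaches the favorite's lower bound, so the 95% CIs overlap."""
    return underdog_ci[1] >= favorite_ci[0]

# Illustrative 95% CIs on the standardized-strength scale.
ohio_state = (0.9, 1.6)
oral_roberts = (-0.2, 0.85)
print(intervals_overlap(ohio_state, oral_roberts))  # False: just short of overlapping
```

A near-miss like this one, where the underdog’s upper bound sits just below the favorite’s lower bound, is exactly the kind of matchup the article flags as worth a second look.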

But maybe this is just a coincidence. Does this model of team strength and its uncertainty hold up to other games? Let’s check it against two other blocks of picks (3 and 4 seeds) and the second round of games.

Visualizing Uncertainty Around Predictions: Three Seeds versus Fourteen Seeds – Round One

This year, the surprise in these matchups was Texas losing to Abilene Christian. Team strength indicators are confident in their point estimate of Texas (all are located within a tight range slightly above average in every category). While Abilene Christian is likely a below-average team (the full confidence interval sits below zero), they appear to have a high-quality defense, as Texas can attest after committing 23 turnovers against the Wildcats. Again, this form of visualizing/modeling teams would not predict Abilene Christian as the winner of this matchup, but it does appear to indicate the 84.59% probability of a Texas victory was too high. Kansas and Arkansas each faced a team with a potent offense, but in this visualization it is easy to see that both had very strong defenses to counter those offenses.

Visualizing Uncertainty Around Predictions: Four Seeds versus Thirteen Seeds – Round One

The 4 seeds this year produced two upsets. Let’s see whether there were warning signs of this vulnerability. In the case of (4) Virginia and (13) Ohio there is a clear indication Virginia would not likely run away with the game, as the confidence intervals overlap (indicating the two teams may be equivalent). Additionally, one measure of Virginia’s offense rates it a full standard deviation below average. An upset definitely looked plausible here. Further, (4) Purdue versus (13) North Texas looks a lot like (3) Texas versus (14) Abilene Christian. The estimates of Purdue’s true ability are tightly arranged just above average, while North Texas is average or below, but the defense of North Texas is nearly a standard deviation above average. Neither of the other matchups has as strong an overlap or as prominent an offensive or defensive ability that might pose a hazard to a specific opponent. If you were to pick upsets here, it would have been those two games – another indication that this method of comparing teams has merit.

Visualizing Uncertainty Around Predictions: Second Round

In the second round the blowout of (3) Kansas by (6) USC should not have been a surprise, according to this model. Some of the other upsets, such as (8) Loyola-Chicago over (1) Illinois, (11) Syracuse over (3) West Virginia, or (7) Oregon over (2) Iowa, also look less surprising with these visualizations as these teams appear very evenly matched when considering the confidence intervals around their estimated strengths. Some underdogs even have a clear strength which is better than the favored opponent (Loyola-Chicago’s defense, Oral Roberts’ offense) or the favorite has a clear weakness (Iowa’s defense). Once again, this form of visualizing/modeling the data gives additional insight into predicting a team’s chance of winning beyond today’s standard models.

Concluding Thoughts

Understanding the certainty, or uncertainty, around predictions is important. Without knowing how confident you are in a prediction, you may be surprised and unsure of how to explain unexpected outcomes (like your broken bracket). For the NCAA Tournament, visualizing team differences with a standardized, aggregated point estimate and a 95% confidence interval provides important additional knowledge about team matchups which can be used to identify favorites that may be more vulnerable than state-of-the-art models indicate.
