
Building Venue-Adjusted RAPM for Expected Goals: The Origin, the Process and the Results (Part 3)

The Results: How big a deal is scorekeeper bias?

After writing over 7,000 total words for parts one and two of this article, I’m excited to finally share the results. If you’ve been reading along so far, I’m sure you are too, and I’d like to thank you for taking the time to read my work. I hope you’ve enjoyed it.

If you haven’t been reading along, that’s okay. I do highly recommend that you read part one and part two before you read this; they aren’t short, but they will give you a good idea of why this is all important to me and why you should care about it too. But if you’re not interested in doing all of that reading, and you just want to see the results, I don’t blame you and I won’t stop you from reading this. I will, however, provide you with a quick review of what I covered in parts one and two:

  1. Minnesota’s scorekeepers erroneously report that shots were taken further from the net than they actually were, which leads to the defensive performance of their skaters being overrated and their goaltenders being underrated by highly effective models such as Evolving Hockey’s Goals Above Replacement. Minnesota’s scorekeepers are also not the only ones who exhibit this pattern of behavior, which I refer to as "scorekeeper bias."
  2. I did not aim to "fix" this issue, but rather to provide an estimate of the inaccuracy it causes. I did this by building an expected goal model, implementing a venue adjustment to shot distance that improved the model’s performance (see the sketch below), and building a RAPM model that provides a point estimate of a skater’s isolated impact.
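For readers who skipped part two: the venue adjustment itself is described in detail there. As a rough, deliberately simplified sketch of the general idea (not my actual implementation), here is how one might shift a rink’s reported shot distances toward a league baseline. The DataFrame and column names are hypothetical.

```python
import pandas as pd

# A deliberately naive illustration of a venue adjustment to shot distance.
# 'shots' is a hypothetical DataFrame with one row per unblocked attempt:
#   'venue'    - arena where the shot was recorded
#   'distance' - scorekeeper-reported distance from the net, in feet

def naive_venue_adjustment(shots: pd.DataFrame) -> pd.Series:
    # Estimate each rink's bias as its mean reported distance minus the
    # league-wide mean. (A real adjustment would also control for the fact
    # that teams differ in true shot locations, e.g. by comparing the same
    # teams' shots at home versus on the road.)
    league_mean = shots["distance"].mean()
    rink_bias = shots.groupby("venue")["distance"].transform("mean") - league_mean
    return shots["distance"] - rink_bias

# shots["adj_distance"] = naive_venue_adjustment(shots)
# The adjusted distances then feed the expected goal model.
```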

For the results, I will start my analysis at the team level, then move on to goaltenders, and finish with individual skaters.

At the team level, I will start with defense. Here is every team’s defensive performance over the past two seasons, measured by expected goals against per hour (xGA/60), before and after venue adjustments:

Image by TopDownHockey

Since Minnesota has been at the forefront of this article from the start, let’s focus on them. As you can see, they were the team whose defense was penalized hardest by this adjustment. Prior to the adjustment, their xGA/60 of 2.52 was far and away the league’s best, with 2nd place Columbus sitting comfortably in their rear view at 2.66 xGA/60. After the adjustment, Minnesota fell to 6th place in xGA/60 with a mark of 2.75. They’re still damn good defensively, but they are no longer far and away the best defensive team.

On the other end of the spectrum, Chicago improved to about the same degree that Minnesota declined. They were the worst defensive team prior to the adjustment and the 4th worst defensive team after the adjustment. Anaheim and the New York Rangers also saw major improvements, while Montreal and Philadelphia saw their defensive numbers decline to a similar degree. Six other teams saw their xGA/60 rate change by at least 0.1, with three of them seeing improvement and three of them seeing a decline.

What I find very interesting here is the symmetry of this pattern. One team at each end of the spectrum saw extreme changes that were identical to one another, another two teams at each end saw slightly less extreme changes that were virtually identical, and another three teams at each end saw non-extreme but still significant changes that were virtually the same. Let’s see if the same pattern persisted on offense, which is measured through expected goals for per hour (xGF/60) with data over the same two-year sample:

Image by TopDownHockey

Team offense didn’t show quite as much symmetry as team defense, but the pattern was similar: teams like Minnesota and Montreal whose scorekeepers report that shots are further from the net than they actually are saw a boost to their offense, while teams like Chicago and Anaheim saw their offense decline.

The one thing that surprised me here was that Minnesota’s offense improved by a notably larger degree than their defense declined. This means that their expected goal differential saw a notable improvement. Were they an outlier, or did this happen to many different teams? Let’s find out by looking at plus/minus, measured by expected goal differential per hour (xG+/-/60) over this same two-year sample:

Image by TopDownHockey

It turns out that Minnesota was the only team who saw a truly significant increase in their expected goal differential rate. Montreal’s change was enough to raise an eyebrow, but everybody else was well within the margin of error. Why did Minnesota improve? I’m not entirely sure, but I do have one theory: Minnesota is a good possession team at home, with a Fenwick-for (unblocked shot attempt) percentage of 51.93%. The adjustment increases the value of the shots they both take and allow at home, but because they take more home shots than they allow, the net effect is a bump to their differential. This theory would also apply to Montreal and Boston, the teams who saw the second and third biggest improvements in their expected goal differential rates.

Their home Fenwick-for percentages are 54.74% and 53.71% respectively, and their shots for and against at home become slightly more valuable after my adjustment. This is just a theory, though, and I would need to do more research to confirm it with certainty. It’s also not too big of a deal, because no team sees their expected goal differential rate change by a massive amount.
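To make the theory concrete, here is a toy calculation. Every number except the 51.93% home Fenwick-for share is made up for illustration: if the adjustment scales the value of all shots in Minnesota’s home games up by the same factor, a team that takes more shots than it allows at home gains more expected goals than it gives back.

```python
# Toy numbers: only the 51.93% Fenwick-for share is from the article.
home_attempts = 1000          # unblocked attempts, both teams, MIN home games
ff_share = 0.5193             # Minnesota's share of those attempts
xg_per_attempt = 0.06         # assumed average xG per attempt, pre-adjustment
uplift = 1.10                 # assumed 10% xG uplift once distances shrink

xgf_raw = home_attempts * ff_share * xg_per_attempt          # ~31.2 xG for
xga_raw = home_attempts * (1 - ff_share) * xg_per_attempt    # ~28.8 xG against
diff_raw = xgf_raw - xga_raw                                 # ~+2.32

# Scaling both sides by the same uplift grows the (already positive) gap:
diff_adj = diff_raw * uplift                                 # ~+2.55
print(round(diff_adj - diff_raw, 2))                         # ~+0.23 net gain
```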

To get a better idea of how these adjustments impacted the league-wide ranking of teams that I made heavy adjustments to, check out the following visualizations:

Image by TopDownHockey
Image by TopDownHockey

As you can see, a good team is generally still going to be a good team and a bad team will still generally be a bad team. But the degree to which they are good at different things will vary notably for teams with serious cases of scorekeeper bias.


Now it’s time to move on to the goaltenders. For the visualizations, I chose to use goals saved above expectation (GSAx) in place of delta Fenwick save percentage (dfSV%) or another per-shot/per-minute metric because I wanted to provide an idea of how significantly a goaltender’s aggregated metrics had been influenced. If I used a per-shot metric, an irrelevant goaltender who played a small sample of games, with a large percentage of them at the Xcel Energy Center or Madison Square Garden, may have seen a wildly inflated "adjustment impact."
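For reference, GSAx is a simple counting stat: the total expected goals on the shots a goaltender faced, minus the goals he actually allowed. A minimal sketch, with hypothetical column names:

```python
import pandas as pd

# 'faced' is a hypothetical DataFrame with one row per shot on goal:
#   'goalie' - the goaltender facing the shot
#   'xg'     - the expected goal model's output for that shot
#   'goal'   - 1 if the shot went in, 0 otherwise

def gsax(faced: pd.DataFrame) -> pd.Series:
    # Expected goals faced minus goals allowed, per goaltender.
    # Positive values mean the goaltender saved more than expected.
    grouped = faced.groupby("goalie")
    return grouped["xg"].sum() - grouped["goal"].sum()
```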

For my first visualization I set an arbitrary minimum of 75 games played. I did this because I knew I needed some kind of minimum, since including every goaltender would create roughly one hundred data points (too many to display), and I figured that every team had played at least 150 games since the 2019–2020 season, so a goaltender would need to have played at least half of his team’s games to meet the threshold. Here are the goaltenders who met my minimum threshold, ranked by how the venue adjustment impacted their GSAx:

Image by TopDownHockey

Dubnyk, Price, and Gibson are all massively impacted by this adjustment. Dubnyk goes from by far the worst goaltender in the league to… a very bad goaltender who is better than a few. Carey Price goes from a below average goaltender to a top ten goaltender. John Gibson goes from one of the very best to slightly above average. Crawford, Lundqvist, Lehner, and Bishop also get hit hard. Lehner is a particularly interesting case: while he has played for three different teams over the past two seasons, the two he played most of his minutes for both play in arenas whose scorekeepers historically report that shots are closer to the net than they actually are, so it makes sense that he’d be heavily penalized by a venue adjustment.

After creating the last visualization, I double-checked and realized that Carter Hart was nowhere to be found. I could have sworn he was Philadelphia’s starter over the past two years, but it turned out he had played exactly 74 games. He was one of the players most heavily impacted by my adjustments despite his small sample, so I hated that my visualization didn’t include him. Alex Stalock, who was literally in the featured image of part one of this article, was also heavily impacted by my adjustment. There was no way I could just forget these guys existed, so I created two more visualizations to showcase the goaltenders who didn’t meet my games-played minimum: one for the ten most positively impacted by venue adjustments, and one for the ten most negatively impacted. Here they are:

Image by TopDownHockey
Image by TopDownHockey

The takeaways here are pretty simple: venues like Madison Square Garden where Georgiev and Shesterkin play lead their goaltenders to be overrated by metrics that don’t account for venue, while venues like Wells Fargo Center where Carter Hart and Brian Elliott play lead their goaltenders to be underrated by metrics that don’t account for venue.

Since everybody loves rankings of the best and worst players, I feel I ought to present the top-15 and bottom-15 goaltenders by adjusted GSAx with no minimum threshold for games played. Here are the top-15:

Image by TopDownHockey

And here are the bottom-15:

Image by TopDownHockey

As you can see, this adjustment has a pretty significant impact on goaltenders like Devan Dubnyk and Alex Stalock who play their home games at Xcel Energy Center where scorekeepers report that shots were taken further from the net than they actually were. It also has a significant impact on goaltenders like Ben Bishop and Anton Khudobin who play their home games at American Airlines Center where scorekeepers report that shots were taken closer to the net than they actually were. But the adjustment is not so significant that it flips the world upside down and tells us that Minnesota’s goaltenders are good and Dallas’s goaltenders are bad. Both Bishop and Khudobin are still comfortably top-15 in GSAx, while Dubnyk is still comfortably bottom-15 and Stalock just barely misses the cut.

To get a better idea of how these adjustments impact a goaltender relative to their peers, check out the following visualizations:

Image by TopDownHockey
Image by TopDownHockey

Lastly, before I get into the skaters: I hereby crown Devan Dubnyk "Not the Worst Goaltender on the San Jose Sharks."


Remember, I used a weighted ridge regression to calculate a metric called Regularized Adjusted Plus-Minus (RAPM), which is a point estimate of a player’s isolated impact. I did this because Evolving Hockey’s goals above replacement uses a player’s isolated impact on expected goals against (via RAPM) as one of the main components for skater defense, and because most other components are closely correlated with RAPM.
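As a refresher on the mechanics (and not a reproduction of Evolving Hockey’s exact specification), the sketch below shows the general shape of a weighted ridge regression RAPM. The design matrix, penalty strength, and control variables are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import Ridge

def fit_rapm(X: csr_matrix, y: np.ndarray, minutes: np.ndarray,
             alpha: float = 1000.0) -> np.ndarray:
    """Weighted ridge regression RAPM sketch.

    X       - one row per on-ice stint: indicator columns for each skater
              on offense and on defense, plus columns for controls such as
              score state and zone starts (all hypothetical here)
    y       - the stint's xG rate (for or against, per 60 minutes)
    minutes - stint lengths, so longer stints carry more weight
    alpha   - ridge penalty; in practice it would be tuned, e.g. by
              cross-validation, rather than fixed at 1000
    """
    model = Ridge(alpha=alpha, fit_intercept=True, solver="sparse_cg")
    model.fit(X, y, sample_weight=minutes)
    # Each skater coefficient is a point estimate of their isolated
    # impact on the xG rate, shrunk toward zero by the penalty.
    return model.coef_
```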

The first thing that I’m going to do here is compare the results of my RAPM before venue adjustments to the results of Evolving Hockey’s RAPM model across all skaters who have played at least 200 minutes:

Image by TopDownHockey

As you can see, the results of the two models were very similar. This was my goal all along: to build something like their model, but something that was my own so that I could compare my own models before and after venue adjustments. If I just built a model with venue adjustments and compared it to theirs, I wouldn’t know which discrepancies were caused by venue adjustments and which discrepancies were caused by model differences.

As I did with teams, I’m going to start my analysis of skaters by going over their defense. Before I post the numbers, though, I should point out that this data roughly follows a normal distribution; the standard deviation for raw RAPM xGA/60 was 0.117, while the standard deviation for adjusted RAPM xGA/60 was 0.115. Now that I’ve cleared that up, here are the skaters whose defensive impacts were affected most negatively and most positively by this adjustment:

Image by TopDownHockey
Image by TopDownHockey

(Remember, a negative number is good, and an adjustment which decreases a player’s RAPM xGA/60 means that the adjustment flattered them.)

This is about in line with what we expected: skaters from Minnesota, Philadelphia, and Montreal tend to look a bit worse defensively after adjustment, while skaters from Anaheim, Chicago, and the New York Rangers tend to look a little better. But how much are they affected? Remember, the standard deviation here is roughly 0.116, so no player’s defensive impact changed by even one standard deviation, and only about a dozen saw their defensive impact change by half of one standard deviation in either direction. While teams like Anaheim and Minnesota look significantly different after venue adjustments, none of their individual skaters see a massive change.

What about offense? The standard deviations for offense were slightly larger than those for defense: raw RAPM xGF/60 had a standard deviation of 0.122 and adjusted RAPM xGF/60 had a standard deviation of 0.121. Here are the players whose offensive impacts were most heavily affected:

Image by TopDownHockey
Image by TopDownHockey

Remember how, after I implemented venue adjustments, Minnesota saw their offense improve to a significantly larger degree than their defense declined? Well, it only makes sense that their skaters saw by far the biggest offensive improvement of any team’s. It also makes sense that the only player who saw his offensive or defensive isolated impact change by at least one standard deviation was Jared Spurgeon of the Minnesota Wild, and that the change was an increase in his offensive impact.

What about plus/minus? Since scorekeeper bias goes both ways, shouldn’t most players wind up with a similar net impact before and after venue adjustments? We’ll get into that, but before we do, we need to establish the standard deviation in order to understand the magnitude of any adjustment. The standard deviation for the raw expected goal differential rate was 0.170, and the standard deviation for the adjusted expected goal differential rate was 0.169. These are considerably larger than the standard deviations for offense or defense, which we should keep in mind.
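Concretely, the "net impact" here is just the offensive RAPM minus the defensive RAPM, and each player’s adjustment can be expressed in units of the 0.170 standard deviation quoted above. A sketch with hypothetical column names:

```python
# 'rapm' is a hypothetical DataFrame with raw and adjusted RAPM columns.
rapm["net_raw"] = rapm["xgf60_raw"] - rapm["xga60_raw"]
rapm["net_adj"] = rapm["xgf60_adj"] - rapm["xga60_adj"]

# Express each player's change in net impact in standard deviation units,
# using the 0.170 SD of the raw expected goal differential rate:
rapm["net_change_sd"] = (rapm["net_adj"] - rapm["net_raw"]) / 0.170
```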

Now that we’ve established that, let’s look at the players whose net impacts were most heavily affected by venue adjustments:

Image by TopDownHockey
Image by TopDownHockey

As you can see, most offensive adjustments are essentially balanced out by defensive adjustments; Jared Spurgeon’s adjustment was the only one that even came within one third of one standard deviation for net impact. So, it’s fair to say that scorekeeper bias doesn’t play a big role in a player’s net impact on expected goal differential, and if we were only looking at this metric, scorekeeper bias would be a non-issue.

Here is one more visualization to give you an idea of how a player’s impact relative to their peers can change before and after venue adjustments:

Image by TopDownHockey

The above visualization is meant to hammer home the idea that venue adjustments can make significant changes to a player’s offensive impact, but their defensive impact will generally see changes of a similar magnitude in the opposite direction, so their net impact will be very similar. In the case of Jonas Brodin, it doesn’t really matter whether you use the raw or adjusted metrics: his net impact is very good.

The problems arise when we look at something other than net impact. As I mentioned above, certain metrics, such as Evolving Hockey’s goals above replacement, use expected goals for defense but actual goals for offense. If we look at the two columns in the middle of the Jonas Brodin chart above, it’s clear that he appears significantly weaker defensively after we adjust for scorekeeper bias. Again, this isn’t as big of a problem for expected goal differential, because his offense is weighed down by scorekeeper bias to roughly the same degree that his defense is propped up. But if we use a metric that isn’t prone to scorekeeper bias (such as goals) for offense, we will overrate his defensive impact with no offsetting penalty to his offense.

This also isn’t as big of a problem when we’re analyzing one individual skater. As mentioned above, only one skater saw their offensive or defensive impact change by one standard deviation, and only a handful of other players were close. But when we analyze an entire team full of skaters, and the average skater appears roughly half a standard deviation better defensively than they actually are, all of that inaccuracy adds up and can lead us to some fairly inaccurate conclusions. And remember, to whatever degree scorekeeper bias leads us to overrate a team’s defense, it subsequently leads us to underrate their goaltending (and vice-versa).

Now that we’ve established all of this, where do we go from here? Well, that’s not up to me to decide. Personally, I still plan to use Evolving Hockey’s goals above replacement and RAPM models to evaluate NHL skaters and goaltenders, as well as other models that do not incorporate venue adjustments, as they are still very effective in spite of scorekeeper bias. I also plan to cross-reference the results of my own models to determine whether scorekeeper bias is impacting the outputs of other models, and to what degree. But what’s important is that now that I’ve studied this issue and done the work to determine how inaccurately it may lead me to evaluate certain players, I’m in a better position to approach these numbers with nuance and to be wrong about hockey less frequently than I was before. I hope you feel the same way after reading my research.

If you’d like to discuss any of this with me in further detail, or you just want to see some of these visualizations with your favorite teams or players highlighted, feel free to reach out to me on Twitter at TopDownHockey.

