A Comprehensive Guide to Using Regression (RAPM) to Evaluate NHL Skaters (With Source Code)

Moving away from the black box…

As I discussed in my Wins Above Replacement (WAR) write-up, I’ve used regression to obtain point estimates of an NHL player’s individual impact on the following six components:

  • Even Strength Offense
  • Even Strength Defense
  • Power Play Offense
  • Penalty Kill Defense
  • On-Ice Penalty Differential
  • Individual Shooting

The regression isolates a player’s impact by accounting for various external factors that surround them. These factors differ depending on the component I’m evaluating. For even strength offense and defense, I account for the following factors:

  • All teammates and opponents.
  • Whether a shift started on-the-fly as the result of an expired power play. This is by far the most important piece of external context that can shape a player’s result for a given shift. Ignoring this context is extremely unfair to penalty killers like Esa Lindell who start a large percentage of their shifts as the result of expiring enemy power plays, where "power play influence" is still present.
  • Whether a shift started with a faceoff in the offensive, defensive, neutral zone, or on-the-fly.
  • The score of the game.
  • Which team is at home.
  • The number of skaters on the ice for each team.
  • Which teams are playing the second halves of back-to-backs.

For shooters and goaltenders, I followed a similar but simpler process, using a binary "is goal" column as my target variable, with the shooter, goaltender, and expected goal value used as the predictor variables. This allowed me to determine the per-shot impact that a player or goaltender had on whether shots became goals.
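As a rough sketch of this shooter-and-goaltender setup, here is what the design matrix might look like: one row per unblocked shot, dummy columns for the shooter and goaltender, the shot's expected goal value as a continuous predictor, and a binary "is goal" target. All data below is synthetic, and the ridge penalty value is arbitrary; treat this as illustrative, not the author's actual code.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic toy data: one row per unblocked shot.
rng = np.random.default_rng(0)
n_shots, n_shooters, n_goalies = 1000, 5, 3
shooters = rng.integers(0, n_shooters, n_shots)
goalies = rng.integers(0, n_goalies, n_shots)
xg = rng.uniform(0.01, 0.3, n_shots)  # expected goal value of each shot

# Design matrix: shooter dummies, goaltender dummies, then the xG column.
X = np.zeros((n_shots, n_shooters + n_goalies + 1))
X[np.arange(n_shots), shooters] = 1                # shooter dummies
X[np.arange(n_shots), n_shooters + goalies] = 1    # goaltender dummies
X[:, -1] = xg                                      # expected-goal predictor

# Binary "is goal" target, simulated here so goals track xG on average.
y = (rng.uniform(size=n_shots) < xg).astype(int)

model = Ridge(alpha=100.0).fit(X, y)
# Each shooter coefficient estimates that player's per-shot impact on
# whether shots become goals, above or below what xG alone predicts.
shooter_impacts = model.coef_[:n_shooters]
```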

For power play offense and penalty kill defense, I account for the exact same components, except instead of a variable denoting whether a shift started as the result of an expired power play, I use a variable denoting whether that power play began on-the-fly as the result of an expired penalty where the previous game state was even strength.

For penalties, things are a tad different. I run the entire regression using data from even strength and special teams, and account for the following variables:

  • All teammates and opponents.
  • In-game cumulative penalty differential. This variable has by far the largest influence on the rate at which teams take penalties. Game management is a real thing in the NHL, and if a team has taken more penalties than their opponents, they’re far more likely to receive make-up calls in the future, and far less likely to be further penalized.
  • Which team took the last penalty. This is also extremely important.
  • Whether a shift started with a faceoff in the offensive, defensive, neutral zone, or on-the-fly.
  • The score of the game.
  • Which team is at home.
  • The number of skaters on the ice for each team.
  • Which teams are playing the second halves of back-to-backs.

It’s very easy for me to say "I account for all of these factors." It would also be easy for me to say "Once you account for camera angles, I’m better looking than Brad Pitt." Both of these would rightfully be met with a heavy degree of skepticism from those who hear them, which is why I’ve decided to be as transparent as possible about my process of isolating skater impact.

I begin the process by building a dataframe which contains one row for each shift and one column denoting the presence of every skater and contextual factor as a dummy variable with a value of one or zero. I define "shifts" as all instances of play where the skaters on the ice, goaltenders, score, and period do not change. When any one of these variables changes, a new shift begins.
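A minimal sketch of this construction step, using pandas with hypothetical column names and made-up shift values (the real dataframe has hundreds of skater columns and many more contextual dummies):

```python
import pandas as pd

# One dict per shift; skater lists become dummy columns below.
# Values here are invented for illustration.
shifts = [
    {"length": 24, "xGF_60": 15.9, "home": 1, "off_zone_start": 1,
     "skaters_for": ["Brandon Carlo", "Zdeno Chara"],
     "skaters_against": ["Lars Eller", "Madison Bowey"]},
]

rows = []
for s in shifts:
    row = {"length": s["length"], "xGF_60": s["xGF_60"],
           "home": s["home"], "off_zone_start": s["off_zone_start"]}
    # Mark each skater's presence with a dummy value of one.
    for p in s["skaters_for"]:
        row[f"offense_{p}"] = 1
    for p in s["skaters_against"]:
        row[f"defense_{p}"] = 1
    rows.append(row)

# Absent skaters get zero rather than NaN, matching the one/zero encoding.
rapm_df = pd.DataFrame(rows).fillna(0)
```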

Here’s a partial view of what this dataframe for constructing even strength RAPM looks like:

Image by TopDownHockey

This is a massive dataframe that holds roughly 600,000 rows and 1,800 columns, depending on the season. In this example, not a single one of the visible dummy variables is present. To give a closer look, let’s examine just one shift that occurred in the third period of the second game of the 2018–2019 season. Here’s what you need to know about this shift:

  • The Washington Capitals played at home against the Boston Bruins.
  • The Washington Capitals held a lead of three or more goals.
  • This shift lasted 24 seconds.
  • This shift began with a faceoff in Boston’s offensive zone.
  • The Bruins took an unblocked shot with a 10.6% probability of scoring.
  • The Capitals took an unblocked shot with a 2.08% probability of scoring.
  • Boston’s five skaters were Brandon Carlo, Joakim Nordstrom, Chris Wagner, Noel Acciari, and Zdeno Chara.
  • Washington’s skaters were Andre Burakovsky, Brooks Orpik, Chandler Stephenson, Lars Eller, and Madison Bowey.

If we built a dataframe for RAPM using only this shift, here is what it would look like:

Image by TopDownHockey

As you can see, there are two rows for one shift. The top row is from the perspective of the home team and the bottom row is from the perspective of the away team. If we move across to the end of the dataframe, we can see the skaters marked with ones and zeroes for defense:

Image by TopDownHockey
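The two-rows-per-shift encoding can be sketched as follows. The helper name and dictionary keys are hypothetical, not the author's actual code; the xGF/60 values are the shot probabilities above converted to an hourly rate (e.g. 0.106 × 3600/24 ≈ 15.9 for Boston's shot).

```python
def shift_to_rows(shift):
    """One shift becomes two rows, one per team's perspective.

    From the home perspective, home skaters get offense dummies and away
    skaters get defense dummies; the away-perspective row swaps them.
    """
    home_row = {"home": 1, "xGF_60": shift["home_xGF_60"]}
    away_row = {"home": 0, "xGF_60": shift["away_xGF_60"]}
    for p in shift["home_skaters"]:
        home_row[f"off_{p}"] = 1
        away_row[f"def_{p}"] = 1
    for p in shift["away_skaters"]:
        home_row[f"def_{p}"] = 1
        away_row[f"off_{p}"] = 1
    return [home_row, away_row]

# A slice of the Capitals/Bruins shift described above:
shift = {
    "home_xGF_60": 3.12, "away_xGF_60": 15.9,  # per-60 rates over 24 seconds
    "home_skaters": ["Lars Eller", "Madison Bowey"],
    "away_skaters": ["Zdeno Chara", "Noel Acciari"],
}
rows = shift_to_rows(shift)
```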

The regression is run using every shift from the entire season. xGF/60 serves as the target variable and every other variable in the dataframe serves as a predictor. This regression, however, is not a typical linear regression; rather, it is a weighted ridge regression, where the length of each shift is used as a weight and the "ridge" shrinks coefficients toward zero. Why do we need this ridge? Because without it, we get wildly inaccurate values. Here is a snippet of what RAPM results look like for the 2018–2019 season if I were to run a standard weighted linear regression without the ridge, with xGF/60 as the target variable:

Complete mess by author

This is pure adjusted plus-minus without the regularization: what NBA analysts used for some time before they discovered the magic of regularization. As you can see, the results are a bit of a mess. The top players are all nobodies who probably posted strong on-ice numbers in very limited minutes, and the impacts are all far too high; not even Wayne Gretzky in his prime would improve his team’s hourly expected goal differential by over three goals. I do not believe anything on planet Earth is more effective than the above chart at displaying the need for regularization.
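The difference the ridge makes can be demonstrated on synthetic data. Below is a sketch using scikit-learn, with shift lengths as sample weights; all data is random and the penalty value is arbitrary, so this only illustrates the mechanics, not the actual model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic stand-ins: X holds dummy predictors, y is xGF/60 per shift,
# and w is each shift's length in seconds (used as sample weights).
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 40)).astype(float)
y = rng.normal(2.5, 1.0, size=500)
w = rng.integers(10, 90, size=500).astype(float)

ols = LinearRegression().fit(X, y, sample_weight=w)     # unregularized APM
ridge = Ridge(alpha=2000.0).fit(X, y, sample_weight=w)  # weighted ridge RAPM

# The ridge penalty pulls the coefficients toward zero:
assert np.abs(ridge.coef_).sum() < np.abs(ols.coef_).sum()
```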

With regularization, we add a "penalty" term to the regression which reduces out-of-sample error by shrinking coefficients toward zero. The method of regularization I’ve chosen is L2 (Tikhonov) regularization, where every coefficient is shrunk toward zero but never to exactly zero, since I want a coefficient estimate for each player. L1 (Lasso) regularization would not work here, since it would drop some variables entirely. The penalty that I use comes in the form of a lambda value, which I obtain through cross-validation. This plot shows us where the optimal lambda value is:

Image by TopDownHockey

Once cross validation is run and the optimal lambda value is obtained, we run the regression once again using this lambda value as a penalty for every skater in the data set. The results, run on the same exact data set, look far more intuitive:

Image by TopDownHockey
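In scikit-learn terms, the cross-validation and final fit might be sketched like this. The data is synthetic, and `RidgeCV` here is a stand-in for whatever tooling the original pipeline uses to search the lambda grid.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

# Synthetic stand-ins for the shift-level design matrix, target, and weights.
rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(500, 40)).astype(float)
y = rng.normal(2.5, 1.0, size=500)
w = rng.integers(10, 90, size=500).astype(float)

# Search a grid of lambda (alpha) values; RidgeCV picks the one that
# minimizes cross-validated error, analogous to the lambda plot above.
alphas = np.logspace(0, 5, 50)
cv_model = RidgeCV(alphas=alphas).fit(X, y, sample_weight=w)
best_lambda = cv_model.alpha_

# Re-run the weighted ridge with the selected penalty.
final_model = Ridge(alpha=best_lambda).fit(X, y, sample_weight=w)
```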

However, I still felt that the outputs were generally a tad too low, and that there was too much variation in year-to-year player impact. I did some research on NBA RAPM and found that this was not an uncommon sentiment; many basketball analysts also felt that one year of RAPM was somewhat unreliable, which is why they generally incorporate prior knowledge into their regressions. I actually made an account on an NBA analytics forum and spoke with Daniel Myers, the creator of the NBA’s Box Plus-Minus, who helped guide me through the process of creating prior-informed RAPM.

First, prior information for each shift is subtracted from the target variable. For example, if Connor McDavid is on offense and my prior knowledge tells me that his xGF/60 should be 0.5, I subtract 0.5 from the observed xGF/60 for that shift. Then I run the full regression – first obtaining lambda values through cross-validation and then running the actual regression. After this is complete, I add his prior back to the coefficient estimate from the regression.
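The subtract-fit-add-back loop can be sketched as follows, with synthetic arrays standing in for the real data (priors for contextual variables would simply be zero):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic stand-ins: X is the shift-level design matrix, y is observed
# xGF/60, w is shift lengths, priors[j] is the prior for predictor j.
rng = np.random.default_rng(3)
n_shifts, n_players = 400, 30
X = rng.integers(0, 2, size=(n_shifts, n_players)).astype(float)
y = rng.normal(2.5, 1.0, size=n_shifts)
w = rng.integers(10, 90, size=n_shifts).astype(float)
priors = rng.normal(0.0, 0.3, size=n_players)

# 1. Subtract each shift's combined prior from the target...
y_adjusted = y - X @ priors

# 2. ...run the weighted ridge regression on the adjusted target...
model = Ridge(alpha=2000.0).fit(X, y_adjusted, sample_weight=w)

# 3. ...then add each player's prior back to their coefficient.
prior_informed_rapm = model.coef_ + priors
```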

I initially tested using a player’s full estimated impact from the prior season as a prior and found that on a year-to-year basis, player impact estimates went from being too malleable to not being malleable enough. It was practically impossible for players who had a strong impact in year one to have a poor impact in year three, or vice versa.

I ultimately found a compromise between incorporating no prior information and too much prior information by calculating the linear trend between vanilla RAPM in year one and year two, and then using that trend to calculate a "predicted RAPM" which served as a player’s prior. I calculated linear trends for both offense and defense and for both forwards and defensemen, as these components are not all equally repeatable. For example, the linear trend for forward offense was 0.008151 + (Prior)(0.446297), while the linear trend for forward defense was -0.003181 + (Prior)(0.280373) – in other words, forward defense is less repeatable than forward offense. So if a forward’s RAPM xGF/60 and xGA/60 in year one were both 0.5, then our hypothetical high-event forward’s priors for year two would be 0.231 xGF/60 and 0.137 xGA/60.
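The arithmetic of that worked example, using the quoted trend coefficients:

```python
# Year-over-year linear trends quoted above (forwards only).
def forward_offense_prior(prior):
    return 0.008151 + 0.446297 * prior

def forward_defense_prior(prior):
    return -0.003181 + 0.280373 * prior

# A forward with 0.5 xGF/60 and 0.5 xGA/60 in year one:
off = forward_offense_prior(0.5)  # 0.008151 + 0.223149 ≈ 0.231
dfn = forward_defense_prior(0.5)  # -0.003181 + 0.140187 ≈ 0.137
```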

I also made a slight change to the degree to which I shrunk each coefficient. For players who played at least 500 minutes at even strength and had a prior, I used the shrinkage factor selected through cross-validation. For players who played at least 500 minutes at even strength and did not have a prior, I multiplied the penalty by 0.75 in order to shrink the value a bit less and make a stronger estimate of that player’s impact. For players who played between 250 and 500 minutes, I scaled their prior by their ice time divided by 500. For players who played under 250 minutes, I used one half of the penalization value selected through cross-validation. (If a player with no prior played between 250 and 500 minutes and the penalization multiplier would come out above 0.75, I just used 0.75 instead.)
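One reading of these shrinkage rules, sketched as a pair of helpers. The thresholds and multipliers follow the description above, but treat the exact branch structure as a paraphrase, not the author's code.

```python
def shrink_prior(prior, minutes):
    """Scale a player's prior by ice time when they played 250-500 minutes."""
    if minutes >= 500:
        return prior
    return prior * (minutes / 500.0)

def penalty_multiplier(minutes, has_prior):
    """Multiplier applied to the cross-validated ridge penalty (lambda)."""
    if minutes >= 500:
        return 1.0 if has_prior else 0.75  # no prior: shrink a bit less
    if minutes >= 250:
        # With a prior, the prior itself is scaled instead (shrink_prior);
        # without one, scale with ice time but cap the multiplier at 0.75.
        return 1.0 if has_prior else min(minutes / 500.0, 0.75)
    return 0.5  # under 250 minutes: half the cross-validated value
```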

I calculated prior-informed RAPM using a method known as a "daisy chain" where I chained a player’s impact in each season to their impact in the next season. For example, if a hypothetical forward had an xGF/60 of 0.5 in 2013–2014, I would use that to calculate their prior for 2014–2015. Then, I would use the prior-informed RAPM from 2014–2015 to calculate the prior for 2015–2016, and so on and so forth through 2019–2020. I began the process in 2013–2014.
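The daisy chain reduces to a simple loop. In this sketch, `run_prior_informed_rapm` is a hypothetical stand-in for the full pipeline (subtract the prior, fit the weighted ridge, add the prior back), and the trend coefficients are the forward-offense values quoted earlier.

```python
def run_prior_informed_rapm(season, prior):
    """Hypothetical stand-in for the full regression pipeline."""
    return prior  # placeholder: in reality, fit the ridge with prior offset

def daisy_chain(seasons, first_year_rapm, intercept=0.008151, slope=0.446297):
    """Chain each season's prior-informed RAPM into the next season's prior."""
    results = {}
    prior_source = first_year_rapm  # vanilla RAPM from the first season
    for season in seasons:
        prior = intercept + slope * prior_source  # "predicted RAPM" prior
        results[season] = run_prior_informed_rapm(season, prior)
        prior_source = results[season]  # feed forward to the next season
    return results

out = daisy_chain(["2014-2015", "2015-2016", "2016-2017"], 0.5)
```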

I found that penalties were not as repeatable and did not occur as frequently as shots, so I chose not to incorporate prior information into my penalty impact estimates. I made the same decision for my shooting and goaltending regression, where I simply did not feel comfortable incorporating prior information in the same manner. I do wish to revisit the incorporation of prior information into these regressions at some point.

I have made source code for calculating vanilla even strength RAPM for the 2018–2019 season available on my GitHub. I wish to be as transparent and honest about this process as possible, so if there is anything you do not understand or any details you are uneasy about, please do not hesitate to reach out to me on Twitter or through Medium. If anything here is a black box, then that is a failure on my part, as none of it should be.
