NHL Equivalency and Prospect Projection Models: Building the NHL Equivalency Model (Part 2)

A Few Minor Updates on an Established Framework

Patrick Bacon
Towards Data Science

--

In Part 1 of NHL Equivalency and Prospect Projection Models, I made a few references to scoring outside the NHL. This would be an easy thing to measure if all prospects played in the same league. It would even be fine if they all played in a small handful of leagues that were similar enough to compare players in each of them, like the 3 leagues which make up the CHL. The analysis I referenced earlier compared scoring from defensemen in all 3 CHL leagues and treated those scoring rates as equal, and I didn’t have much of a problem with that because they’re all fairly similar.

Not all prospects come from the CHL or a directly comparable league, though. Take Tomas Hertl, San Jose’s 1st round selection one year prior to Mirco Mueller, as an example: Hertl spent the entirety of his draft year in the Czech Extraliga, the top men’s league in the Czech Republic, and scored 25 points in 38 games (0.66 P/GP). A forward who scored at that rate in any of the CHL leagues in their draft year might be in danger of being skipped entirely in the draft, or at least waiting until a later round to hear their name called. But Hertl was selected 17th overall. And unlike Mueller, selecting Hertl has made San Jose look very smart; his NHL career performance ranks 2nd in WAR and 3rd in points among his draft class, and he would certainly go much higher than 17th in a re-draft.

Hertl’s high draft selection and subsequent success at the NHL level shouldn’t come as a surprise to anybody because his scoring rate in his draft year was actually better than that of anybody else in his draft cohort. His raw point rates just didn’t reflect it because he played in a league where it was significantly tougher to score.

How do I know that Hertl played in a league where it was tougher to score? Just the fact that he played against grown men instead of teenagers isn’t enough to confirm this. After all, my beer league is full of grown men, but a team full of the worst teenagers in the OHL would still light our best team up if they went head-to-head. And while common sense may be enough to tell us that the Czech Extraliga is better than any CHL league, and that Hertl’s draft year scoring is better than that of a CHL forward who scored at the exact same rate, we can’t accurately say how much better it was, or compare it to players who scored at different rates in other leagues, without a measurement of how good the Czech Extraliga and those other leagues are.

This underscores the need for an equivalency model: One which determines the value of a point in any given league around the world. I built mine out as an NHL Equivalency (NHLe) model on the scale of 1 NHL point, which means that if a league has an NHLe value of 0.5, one point in that league is worth 0.5 NHL points.

A “classic” NHLe model calculates the value of a point in one league by directly comparing how a set of players scored in one league to how those same players scored in the NHL, typically in the same year or the year immediately after. In “League Equivalencies,” one of the earliest published works on this subject, Gabriel Desjardins laid out his method:

To determine the quality of the AHL (or any other league), we can simply look at every player who spent year one in a minor league and year two in the NHL and compare their PPG averages. In other words, the league quality relative to the NHL is:

Image from League Equivalencies by Gabriel Desjardins
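
The equation itself appeared as an image in the original piece, but based on the description, the arithmetic looks roughly like the following sketch (the numbers and field names are made up for illustration, not Desjardins’ actual data or code):

```python
# Rough sketch of a "classic" NHLe factor, per the description above.
# Each entry is a player who spent year one in the minor league and
# year two in the NHL (hypothetical numbers for illustration only).
players = [
    {"minor_ppg": 1.10, "nhl_ppg": 0.45},
    {"minor_ppg": 0.80, "nhl_ppg": 0.30},
    {"minor_ppg": 0.95, "nhl_ppg": 0.40},
]

# League quality relative to the NHL: compare the group's PPG averages.
avg_minor = sum(p["minor_ppg"] for p in players) / len(players)
avg_nhl = sum(p["nhl_ppg"] for p in players) / len(players)
classic_nhle = avg_nhl / avg_minor

print(round(classic_nhle, 3))  # ~0.40 for this made-up sample
```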

This methodology is very effective in its own right, but suffers from three issues:

  1. The development which players undergo between year 1 and year 2 is effectively “baked in” to the model, which skews it in favor of leagues with younger players. If League A is exactly equal to League B in every manner except that League A is full of younger players, then it stands to reason that NHL imports from League A will undergo more development in between years 1 and 2 than those from League B, and thus, score more in year 2. This will lead the model to erroneously state that League A is more difficult to score in, when it really just has younger players who develop more before they see the NHL.
  2. Using the average points per game in each league makes the data more susceptible to being skewed by extreme values and does not take sample size into account. A player who plays only one game in league A and one game in the NHL holds the same weight as a player who plays 82 games in both.
  3. This method only works for leagues that directly produce NHLers in the following season. It is impossible to use this method to calculate an NHLe value for a league like the GTHL U16, where players literally cannot play in the NHL in the following season. In addition, while a handful of leagues do produce NHLers in the following season, many of them produce so few that one outlier in either direction can heavily throw off the final estimate.

All three of these problems were remedied in one place when CJ Turtoro published Network NHLe, an excellent piece which I recommend you read. (As a side note, I cannot thank CJ enough for all the effort he put into publishing his initial work and the help he provided me with mine. People like CJ who not only possess the intelligence and domain knowledge to help others, but also the kindness and genuine passion required to do so, are what make the hockey analytics community an unstoppable army of nerds becoming wrong less frequently.)

CJ handled the first problem by using only transitions from players who played in two leagues in the same year. (I also did this and never seriously considered using multi-year transitions.)

He handled the second problem by dividing the total sum of points in a league by the total sum of games played.

The third problem — which I consider the biggest of the three — he handled with a network approach, using indirect paths between leagues to determine the relative strength of each of them. The following example uses fictional data with nice round numbers to explain how one indirect path is calculated:

  • 100 players have played in League A and League B in the same year. They scored a total of 1,000 points in 1,000 games in League A (1.0 P/GP) and 500 points in 1,000 games in League B (0.5 P/GP). In order to determine the “League B Equivalency” of League A, we divide points per game in League B by points per game in League A, which works out to 0.5/1.0 = 0.5.
  • 500 players have played in League B and the NHL in the same year. They scored a total of 1,000 points in 1,000 games in League B (1.0 P/GP) and 200 points in 1,000 games in the NHL (0.2 P/GP). We follow the same methodology laid out above to calculate the NHL equivalency of League B: 0.2/1.0 = 0.2.
  • We know that League A has a “League B Equivalency” of 0.5 and League B has an NHL Equivalency of 0.2. In order to determine the NHL Equivalency of League A, we simply multiply these two values by one another: 0.5*0.2 = 0.1. This path states that a point in League A is worth 0.1 NHL points.

No players need to have ever played in League A and the NHL in the same season for this methodology to work. So long as there is a connecting league in between the two, an NHLe value can be calculated for League A.
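
In code, a single path boils down to a chain of conversion factors multiplied together. Here is a minimal sketch using the fictional League A numbers above; the function names and data layout are my own for illustration, not the actual implementation:

```python
# One "connection" aggregates every player who played in both leagues in the
# same season: total points and games played in each of the two leagues.
def conversion_factor(pts_from, gp_from, pts_to, gp_to):
    """Points per game in the destination league divided by
    points per game in the origin league."""
    return (pts_to / gp_to) / (pts_from / gp_from)

def path_nhle(connections):
    """Multiply the conversion factors along a path that ends at the NHL."""
    nhle = 1.0
    for connection in connections:
        nhle *= conversion_factor(*connection)
    return nhle

# Fictional League A -> League B -> NHL example from above:
path = [
    (1000, 1000, 500, 1000),  # League A -> League B: 1.0 P/GP vs. 0.5 P/GP
    (1000, 1000, 200, 1000),  # League B -> NHL:      1.0 P/GP vs. 0.2 P/GP
]
print(path_nhle(path))  # 0.5 * 0.2 = 0.1
```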

Here’s an example of a real path that starts with the KHL and uses the AHL as a connector:

In this case, players who played in the KHL and the AHL in the same year scored at an aggregate rate in the AHL that was 1.63 times as high as their rate in the KHL, which means the “AHL equivalency” factor for the KHL here would be 1.63. Meanwhile, AHL players who played in the NHL in the same year scored at an NHL rate that was 0.38 times the rate they scored at in the AHL. Multiplying these two values together provides us with a value of 0.63, which is the “NHLe” for the KHL for this particular path. (These factors are rounded, so multiplying them by hand gives a slightly different number.)

Not all paths feature exactly one connecting league, though. Some paths feature none, and are still completely valid, such as the path from the KHL directly to the NHL:

Some, like the path from J18 Allsvenskan to the NHL, feature more than one connecting league:

The methodology for leagues with more than one connector remains the same: Calculate the conversion factor from each league to the next and then multiply the values by one another. In this case, the conversion factor between J18 Allsvenskan and Division-1 (now known as HockeyEttan) is 0.34, the conversion factor between Division-1 and the SHL is 0.1, and the conversion factor between the SHL and the NHL is 0.33. Multiply these 3 values together (0.34*0.1*0.33) and the result states that the value of a point in J18 Allsvenskan is about 0.012 points in the NHL. (Note that these values are rounded, and if you manually perform these calculations by hand with the rounded values you will get a slightly different result.)

What I’ve shown so far is just the methodology for calculating one path. But unlike classic NHLe, where the only path is the single one with zero connectors, Network NHLe can have dozens if not hundreds of paths! This raises more questions:

  • How do we determine the top path to use?
  • Once we determine the top path, should we use only that one, or should we use more than one? If more, then how many?
  • Are all paths valid, or should we exclude paths with a negligible sample size of transitioning players?
  • Do we place different weights on different equivalency scores that we get from each path, or do we just average them all?

The answers to (some of) these questions are where my NHLe differs from Turtoro’s. He provided the following answers to each of them:

  • The top path will be that with the fewest connecting leagues (he refers to them as edges).
  • Roughly the top 5 paths will be used.
  • A path must feature at least 10 instances of transitioning players to be valid.
  • Paths are weighted by the following formula:
Weight = (1 / 2^Connections) * MinimumInstances

Where Connections is the number of connections (including the final connection to the NHL) which make up a path, and MinimumInstances is the smallest number of transitioning players in any single connection along the path.

This might look a bit tricky, so here’s an example of three paths being weighted for the KHL:

  1. The first path here is KHL->AHL->NHL. This path has 2 connections. The KHL->AHL connection has 65 instances of a transition and the AHL->NHL connection has 2,876. This means the fewest instances in any connection for this path is 65, and the calculation for the weight of this path is 1/2^(2) * 65 = 16.25.
  2. The second path is KHL->NHL. This path has one connection with 26 instances of a transition, and the calculation is 1/2^(1) * 26 = 13.
  3. The third path is KHL->Belarus->WJC-20->NHL. This path has three connections. The KHL->Belarus connection has 130 instances, the Belarus->WJC-20 connection has 75, and the WJC-20->NHL connection has 56, which means the minimum instances for this path is 56. The calculation for the weight of this path is 1/2^(3) * 56 = 7.

With NHLe values and weights for each path, the process for calculating the league’s final value becomes quite simple: Multiply each NHLe value by the weight of its path, take the sum of those outputs, and then divide them by the sum of the weights. In this case, the calculation is ((0.63*16.25)+(0.83*13)+(0.85*7))/(16.25 + 13 + 7) = 26.98/36.25 = 0.74 . If we were to use only these 3 paths, the final NHLe for the KHL would be 0.74.
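
For concreteness, here is a small sketch of the weighting and averaging steps applied to the three KHL paths above; the numbers come from the example, while the function and variable names are my own shorthand rather than the actual implementation:

```python
def path_weight(connections, min_instances):
    """CJ's weighting: 1 / 2^connections, scaled by the smallest number of
    transition instances found in any single connection on the path."""
    return (1 / 2 ** connections) * min_instances

# (NHLe for the path, number of connections, minimum instances)
khl_paths = [
    (0.63, 2, 65),  # KHL -> AHL -> NHL
    (0.83, 1, 26),  # KHL -> NHL
    (0.85, 3, 56),  # KHL -> Belarus -> WJC-20 -> NHL
]

weights = [path_weight(c, m) for _, c, m in khl_paths]
khl_nhle = sum(v * w for (v, _, _), w in zip(khl_paths, weights)) / sum(weights)
print([round(w, 2) for w in weights], round(khl_nhle, 2))  # [16.25, 13.0, 7.0] 0.74
```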

I think every answer which CJ provided to these questions is backed by solid rationale. I might have answered these problems differently if I were starting from scratch, but my goal here wasn’t just to copy CJ’s process and make a few arbitrary decisions to change something I didn’t like. My goal was to put together a set of potential answers to these questions that all made sense to me, and then test each of these answers against one another and CJ’s, in the hopes of determining an optimal set of modeling parameters for building a Network NHLe model. Here are the different sets of parameters that I decided to test as an answer to each question:

For determining the top path(s):

  • Select the path which would have the highest weight according to the weighting equation laid out above.
  • Select the path with the fewest connecting leagues, using the highest weight according to the weighting equation as a tiebreaker.
  • Select the path with the highest minimum number of instances among all connections.

For determining the number of path(s) to use:

  • Test every option from using only the single best path up to using the top 15 paths.

For excluding paths based on sample size:

  • Test every minimum-sample threshold from requiring at least 1 instance of a transition up to requiring at least 15 instances for a path to be valid.

For weighting each path:

  • I simply stuck with CJ’s weighting formula here. I briefly tried out a few different methods, but his performed better in preliminary tests and just made too much sense.

Additionally, I chose to test out 3 other parameters:

  1. Including the U20 and U18 World Junior Championships.
  2. Changing the method of calculating a conversion factor between two leagues. I tested CJ’s method of using the full sum of scoring in each league, Desjardins’ method of using the mean of points per game in each league, and my own method (suggested by CJ) of using the median of points per game in each league.
  3. Dropping the first connecting league in one path before creating a new path for that league. For example, if the KHL->AHL->NHL path is used, all other paths for the KHL from that point on may not use the AHL as the first connection. (The initial implementation of this rule was actually an accident caused by me misinterpreting CJ’s methodology, but I wound up keeping it as a test parameter because I thought it may be preferable not to weigh any one direct relationship too heavily for a given league.)
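
For the third parameter, here is a minimal sketch of how such a rule can be applied when selecting paths; this is a hypothetical re-implementation of the rule as described, not the actual code:

```python
def select_paths(candidate_paths, max_paths):
    """Once a path is chosen, skip any later path for this league that re-uses
    the same first connecting league (hypothetical sketch of the rule above).
    Paths are assumed pre-sorted best-first; each is a list of league names
    starting at the origin league and ending at "NHL"."""
    chosen, used_first_connectors = [], set()
    for path in candidate_paths:
        connectors = path[1:-1]  # leagues between the origin and the NHL
        if connectors and connectors[0] in used_first_connectors:
            continue  # this first connector has already been consumed
        chosen.append(path)
        if connectors:
            used_first_connectors.add(connectors[0])
        if len(chosen) == max_paths:
            break
    return chosen

# e.g. for the KHL: once ["KHL", "AHL", "NHL"] is chosen, any other path that
# starts KHL -> AHL -> ... is skipped for the rest of the selection.
```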

With a full set of candidate parameters that I deemed eligible for building out the final model, it was time to test each of them and determine the best. I decided that the goal of my tests would be to minimize the mean absolute error between predicted points per game and actual points per game for all players who transitioned leagues. Predicted points per game was calculated based on the NHLe value for each league and points per game in the first of two leagues, with the NHLe values being those which were obtained after building the model with a given set of parameters.

I know I just threw a word salad at you, but the test process is actually quite simple; I’ll break it down using Melker Karlsson in 2010 as an example:

  • In this season, Melker scored 35 points in 27 games (1.3 P/GP) in superelit and 2 points in 36 games in the SHL (0.06 P/GP).
  • If the NHLe model obtained from a given set of test parameters stated that the NHLe value for the SHL was 0.53 and the value for superelit was 0.08, then we could obtain the conversion factor from superelit to the SHL by performing 0.08/0.53 = 0.15.
  • We would then multiply the conversion factor by his scoring rate in superelit and perform 0.15*1.3 = 0.2, which would give us his projected scoring rate in the SHL. His actual scoring rate in the SHL was 0.06, and the absolute value of the difference between 0.06 and his projected scoring rate of 0.2 is 0.14, which would be the error obtained for this particular transition.
  • We would then repeat this process using his scoring in the SHL to predict his scoring in superelit, performing 0.53/0.08 = 6.625 to obtain the conversion factor from the SHL to superelit, and then performing 6.625 * 0.06 = 0.4 to obtain his predicted scoring rate in superelit.
  • Since his actual scoring rate in superelit was 1.3 P/GP, the error between this value and his predicted scoring rate of 0.4 would be 0.9. In conclusion, this one transition would give us two error values: 0.14 and 0.9.
  • The mean absolute error for a given NHLe model is the average of every error value obtained from players who transitioned leagues.
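
Translated into code, the worked example above looks something like this sketch, with the two NHLe values standing in for whatever a particular candidate model produced:

```python
nhle = {"SHL": 0.53, "superelit": 0.08}  # values from one candidate model

def predicted_ppg(ppg_from, league_from, league_to):
    """Convert a scoring rate from one league into another via NHLe values."""
    factor = nhle[league_from] / nhle[league_to]
    return factor * ppg_from

# Melker Karlsson, 2010: 1.3 P/GP in superelit and 0.06 P/GP in the SHL,
# giving two error terms (one in each direction) for this transition.
errors = [
    abs(0.06 - predicted_ppg(1.3, "superelit", "SHL")),  # ~0.14
    abs(1.3 - predicted_ppg(0.06, "SHL", "superelit")),  # ~0.9
]

# The test metric averages these error terms over every transition in the data.
mae = sum(errors) / len(errors)
```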

The dataset which I used to run these tests and build my model was all skaters who had played at least 5 games in any two or more of the 124 leagues I used in any single season from 2005–2006 through 2019–2020, with the exception of 2012–2013. (2012–2013 was removed entirely due to the shift in global quality of competition caused by the NHL lockout.) I couldn’t simply train my NHLe model using these sets of parameters on my entire dataset once and then test it on the same data, though, as my goal was not to find the exact set of parameters that could best predict what has already happened; my goal was to build out the model that could best predict scoring out-of-sample. The way to train it to do so was to “practice” predicting scoring out-of-sample. I did this by randomly splitting my dataset into fifths and performing 5-fold cross-validation.

5-fold cross-validation may sound daunting, but it’s not all that scary. You begin with a training set which consists of 4/5ths of the player pairs in the dataset and a test set which contains the other 1/5th. For every single set of parameters that is fed in, the NHLe model is built out on the training set and then tested on the test set. This process is then repeated four more times using the other groups of training sets and test sets, resulting in five different test values for each set of parameters. (Note that none of the test sets overlap; every single player pairing appears in exactly one test set, and appears in the training sets of the four folds where it is not in the test set.) Here’s an example of what cross-validation would look like with five fake players, where red highlighting references the test set and green highlighting references the train set:

Cross-validation for each fold is performed using every single possible set of parameters to build the model out on the training set and test it on the test set. After cross-validation is completed for all 5 folds, the average test value is obtained for each set of parameters. The set of parameters with the best average test value (in this case, the lowest mean absolute error) is considered the optimal set of parameters for building an NHLe model.
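
A minimal sketch of that loop might look like the following. The helpers build_nhle_model and mean_abs_error, the transitions.csv file, and the exact parameter grid are placeholders standing in for the model-building code, data, and option sets described above:

```python
from itertools import product

import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# One row per player-season pair of leagues (the unit being split into folds).
transitions = pd.read_csv("transitions.csv")  # placeholder data source

# Placeholder grid; the real search covered the options listed earlier.
param_grid = list(product(
    ["fewest_edges", "highest_weight", "highest_min_instances"],  # top-path rule
    range(1, 16),   # number of paths to use
    range(1, 16),   # minimum instances for a path to be valid
))

kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_errors = {params: [] for params in param_grid}

for train_idx, test_idx in kf.split(transitions):
    train, test = transitions.iloc[train_idx], transitions.iloc[test_idx]
    for params in param_grid:
        nhle_values = build_nhle_model(train, params)   # fit NHLe values on 4/5ths
        fold_errors[params].append(mean_abs_error(test, nhle_values))  # score on 1/5th

# The winning parameter set minimizes the average MAE across the five folds.
best_params = min(fold_errors, key=lambda p: np.mean(fold_errors[p]))
```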

The results of my 5-fold cross-validation determined the following set of parameters to be most optimal:

  • The top path available will be the one with the fewest edges, with total weight being used as a tiebreaker if two paths have the same number of edges.
  • Up to the top 11 paths will be used for a league.
  • Paths with a minimum of 8 instances of transitioning players are valid; all others will be discarded.
  • The World Juniors (both U18 and U20) will both be used.
  • The full sum of points will be divided by the full sum of games played in each league to determine the conversion factor between two leagues.
  • The first connecting league will be permanently dropped before any further paths are created.

The average of the mean absolute error across these five folds was 0.33. (I computed R² as well, which was also 0.33, but chose not to use it as a target parameter in training.)

With these parameters set, it was time to build out the model using the entire dataset this time around. These are the equivalency scores for every league for the final NHLe model:

╔══════════════════╦═══════╗
║ League ║ NHLe ║
╠══════════════════╬═══════╣
║ NHL ║ 1 ║
║ KHL ║ 0.772 ║
║ Czech ║ 0.583 ║
║ SHL ║ 0.566 ║
║ NLA ║ 0.459 ║
║ Liiga ║ 0.441 ║
║ AHL ║ 0.389 ║
║ DEL ║ 0.352 ║
║ Allsvenskan ║ 0.351 ║
║ VHL ║ 0.328 ║
║ Slovakia ║ 0.295 ║
║ EBEL ║ 0.269 ║
║ WJC-20 ║ 0.269 ║
║ France ║ 0.250 ║
║ Belarus ║ 0.242 ║
║ Czech2 ║ 0.240 ║
║ EIHL ║ 0.235 ║
║ LNAH ║ 0.232 ║
║ DEL2 ║ 0.205 ║
║ Kazakhstan ║ 0.201 ║
║ NCAA ║ 0.194 ║
║ Denmark ║ 0.190 ║
║ Mestis ║ 0.178 ║
║ NLB ║ 0.176 ║
║ Italy ║ 0.176 ║
║ Norway ║ 0.173 ║
║ ECHL ║ 0.147 ║
║ OHL ║ 0.144 ║
║ MHL ║ 0.143 ║
║ USHL ║ 0.143 ║
║ WHL ║ 0.141 ║
║ Poland ║ 0.135 ║
║ WJC-18 ║ 0.135 ║
║ Russia3 ║ 0.135 ║
║ Usports ║ 0.125 ║
║ USDP ║ 0.121 ║
║ QMJHL ║ 0.113 ║
║ Division-1 ║ 0.109 ║
║ Czech3 ║ 0.104 ║
║ Erste-Liga ║ 0.103 ║
║ Slovakia2 ║ 0.102 ║
║ Romania ║ 0.099 ║
║ Superelit ║ 0.091 ║
║ NAHL ║ 0.087 ║
║ Germany3 ║ 0.085 ║
║ ALPSHL ║ 0.084 ║
║ U20 SM-Liiga ║ 0.083 ║
║ BCHL ║ 0.080 ║
║ NMHL ║ 0.076 ║
║ Czech-U20 ║ 0.074 ║
║ AJHL ║ 0.062 ║
║ EJHL ║ 0.060 ║
║ Czech U19 ║ 0.059 ║
║ SwissDiv1 ║ 0.054 ║
║ Belarus-Vysshaya ║ 0.052 ║
║ SJHL ║ 0.052 ║
║ U20-Elit ║ 0.049 ║
║ CCHL ║ 0.048 ║
║ MJHL ║ 0.046 ║
║ USPHL-Premier ║ 0.046 ║
║ Slovakia-U20 ║ 0.044 ║
║ Russia-U17 ║ 0.044 ║
║ USPHL-18U ║ 0.041 ║
║ U18 SM-Sarja ║ 0.040 ║
║ NAPHL-18U ║ 0.039 ║
║ Czech U18 ║ 0.038 ║
║ J18 Allsvenskan ║ 0.038 ║
║ Division-2 ║ 0.038 ║
║ MJAHL ║ 0.037 ║
║ QJAAAHL ║ 0.036 ║
║ MPHL ║ 0.035 ║
║ OJHL ║ 0.034 ║
║ HPHL-16U ║ 0.034 ║
║ Slovenia ║ 0.033 ║
║ Russia-U18 ║ 0.032 ║
║ 16U-AAA ║ 0.031 ║
║ J18-Elit ║ 0.029 ║
║ USHS-Prep ║ 0.028 ║
║ QMAAA ║ 0.028 ║
║ CISAA ║ 0.027 ║
║ Norway2 ║ 0.027 ║
║ USPHL-16U ║ 0.027 ║
║ GOJHL ║ 0.027 ║
║ AYHL-16U ║ 0.026 ║
║ Russia-U16 ║ 0.025 ║
║ J20-Elit ║ 0.024 ║
║ USHS-MN ║ 0.024 ║
║ DNL ║ 0.024 ║
║ Denmark2 ║ 0.023 ║
║ VIJHL ║ 0.021 ║
║ NOJHL ║ 0.021 ║
║ Slovakia-U18 ║ 0.020 ║
║ CAHS ║ 0.020 ║
║ AMHL ║ 0.020 ║
║ PIJHL ║ 0.020 ║
║ KIJHL ║ 0.020 ║
║ U17-Elit ║ 0.018 ║
║ II-DivisioonA ║ 0.018 ║
║ U20-Top ║ 0.017 ║
║ BCMML ║ 0.016 ║
║ U16 SM-Sarja ║ 0.015 ║
║ NSMMHL ║ 0.015 ║
║ Czech U16 ║ 0.014 ║
║ Denmark-U20 ║ 0.013 ║
║ MMHL ║ 0.013 ║
║ U16 SM-Sarja-Q ║ 0.012 ║
║ GTHL-U16 ║ 0.012 ║
║ J20-Div.1 ║ 0.011 ║
║ U16-SM ║ 0.011 ║
║ U16-ELIT ║ 0.010 ║
║ Alliance-U16 ║ 0.009 ║
║ GTHL-U18 ║ 0.008 ║
║ J18-Div.1 ║ 0.008 ║
║ Division-4 ║ 0.008 ║
║ QMEAA ║ 0.007 ║
║ J20-Div.2 ║ 0.007 ║
║ Denmark-U17 ║ 0.006 ║
║ U16-Div.1 ║ 0.005 ║
║ J18-Div.2 ║ 0.005 ║
║ ETAHL U18 ║ 0.005 ║
║ AMMHL ║ 0.005 ║
║ QBAAA ║ 0.004 ║
║ AMBHL ║ 0.002 ║
║ U16-Div.2 ║ 0.002 ║
╚══════════════════╩═══════╝

Note that a few of these leagues are merged with other leagues. For example, there was a league known as “Russia” in the EliteProspects database before the KHL came along which was effectively the KHL; I simply merged those two together as the KHL. The same goes for “Russia2” and the VHL and another league or two.
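
As a usage sketch, converting a raw scoring rate into its NHL equivalent is a single multiplication with the values in the table. For example, Hertl’s 0.66 P/GP draft-year rate in the Czech league works out to roughly 0.38 NHL points per game, while the exact same raw rate in the OHL would be worth far less:

```python
# A few NHLe values pulled from the table above.
nhle = {"Czech": 0.583, "OHL": 0.144, "WHL": 0.141, "QMJHL": 0.113}

def to_nhl_ppg(ppg, league):
    """Translate a raw scoring rate into its NHL-equivalent rate."""
    return ppg * nhle[league]

# Hertl's draft-year rate in the Czech Extraliga vs. a hypothetical CHL
# forward scoring at the exact same raw rate in the OHL.
print(round(to_nhl_ppg(0.66, "Czech"), 2))  # ~0.38 NHL P/GP
print(round(to_nhl_ppg(0.66, "OHL"), 2))    # ~0.10 NHL P/GP
```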

I alluded to this earlier when I mentioned that I had never seen Mirco Mueller play when the Sharks drafted him in 2013, but let me be very clear: I am not a prospect person. I’ve been a Sharks fan for my entire life, and I’ve closely followed the NHL for over a decade, but I’ve never paid close attention to any other league, which means I don’t have a strong idea of how this is supposed to look. My lack of domain expertise in the world of prospects comes with a few pros and cons:

  • Pro: I don’t hold any pre-conceived notions about certain leagues, so I’m not working to confirm any biases while building the model.
  • Con: Without an idea of how things are supposed to look, it’s much harder for me to identify bugs and errors in my code, and then to troubleshoot and fix them once I do find them. This issue mostly comes down to time spent on my end, as the final model is bug and error free to the best of my knowledge.
  • Pro: The knowledge that not only could the model be very bad, but that I would fail to identify this with my naked eye if it were, motivated me to undertake a highly robust mathematical approach to ensure the model was legitimately good.
  • Con: It’s harder for me to understand and express the limitations of my work, especially from a practical standpoint. For example, I can look at the outputs of my WAR model and say “This overrates Mikko Rantanen and underrates Nathan MacKinnon because it generally overrates the finisher in duos like this, and I’ve watched enough of them to know MacKinnon is far superior.” It’s much harder for me to express (or even understand) why my NHLe model overrates European Men’s Leagues and underrates Junior Leagues even though that is my general impression.

In closing, I did a whole lot of work here just to make what I consider a few marginal upgrades on CJ’s NHLe model, which he himself said was “negligibly more accurate than the original version [classic NHLe].” I’m fine with this, because I learned a lot through the process and I now feel more confident in the model after validating that CJ already did pretty much everything right. While there are more things that can still be done to improve upon this work and build a superior model, I think it’s also fair to say we have most likely reached the point of diminishing returns.

I’m open to re-visiting the NHLe framework at a later date, and I suggest that anybody with ideas on how to improve upon my work dive in and give it a try. But for right now, I’m happy with the model I’ve put together, and I feel confident using it to make apples-to-apples comparisons between the scoring rates of two prospects in different leagues.

The next step, which I cover in part 3, is using these scoring rates to predict future success at the NHL level.
