Fantasy Football Data Analysis with Python

Analyzing the relationship between players’ fantasy production and defenses

Photo by Eternal Seconds on Unsplash

With the 2020–2021 NFL fantasy football season about to come to a close, I was inspired to analyze data from the past few years.

Before I start jumping into the data analysis, here’s a summary of fantasy football so we are all on the same page:

To play fantasy football, you need to create or join a league on one of many websites (ESPN, Yahoo, Sleeper, CBS, NFL, etc.). Each member of the league is the owner/general manager of their team. Each roster has two parts: the starting lineup and the bench. The starting lineup includes a combination of quarterbacks (QB), running backs (RB), wide receivers (WR), tight ends (TE), flexes (FLEX), kickers (K), and defenses (D/ST), based on league settings. The bench can be made up of any players the owner chooses.

There are 3 types of leagues:

  • Redraft Leagues: each season, rosters completely reset and all players are available to draft
  • Keeper Leagues: each season, owners can keep a certain number of players for the following season, and then draft the rest of their roster
  • Dynasty Leagues: each season, owners keep their entire roster for the next season and add rookies through the draft

There are 2 types of drafts:

  • Snake (Traditional) Drafts: each owner picks once per round, and the draft order reverses each round
  • Auction Drafts: each owner gets a set budget and can bid on any player as long as they have enough money left

There are 2 types of scoring systems:

  • Standard: 1 point per 25 passing yards, 4 points per passing touchdown, 1 point per 10 rushing or receiving yards, 6 points per rushing or receiving touchdown, -2 points per fumble lost or interception.
  • Point Per Reception (PPR): Scoring is the same as Standard, except players get 1 additional point per reception
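To make the difference between the two scoring systems concrete, here is a minimal sketch. The function name `fantasy_points` and its keyword arguments are hypothetical and chosen only for this illustration; the point values are the ones listed above, and the stat line is made up.

```python
def fantasy_points(passing_yds=0, passing_td=0, rushing_yds=0, rushing_td=0,
                   receiving_yds=0, receiving_td=0, receptions=0,
                   fumbles_lost=0, interceptions=0, ppr=False):
    """Score one stat line under Standard or PPR rules (hypothetical helper)."""
    points = 0.0
    points += passing_yds / 25                      # 1 point per 25 passing yards
    points += passing_td * 4                        # 4 points per passing touchdown
    points += (rushing_yds + receiving_yds) / 10    # 1 point per 10 rushing/receiving yards
    points += (rushing_td + receiving_td) * 6       # 6 points per rushing/receiving touchdown
    points -= (fumbles_lost + interceptions) * 2    # -2 per fumble lost or interception
    if ppr:
        points += receptions                        # PPR only: 1 point per reception
    return points


# Made-up stat line: 80 rushing yards, 1 rushing TD, 5 catches for 40 yards
line = dict(rushing_yds=80, rushing_td=1, receiving_yds=40, receptions=5)
standard = fantasy_points(**line)
ppr = fantasy_points(**line, ppr=True)
print(standard, ppr)
```

The only difference between the two totals is the number of receptions, which is why pass-catching running backs and high-volume receivers are worth more in PPR leagues.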

After the draft, you can add unrostered players to your team or make trades with other owners to improve your team.

The first 13 weeks of the NFL season are known as the fantasy regular season. Each week, you play a head-to-head matchup against another owner in your league, and whoever's players score the most points that week receives a win. After the first 13 weeks, the owners with the most wins make the playoffs and are placed into a bracket. After the fantasy playoffs (weeks 14–16), the champion is crowned.

A vital part of fantasy football is deciding which players to start based on their matchups. Some players’ output is heavily reliant on the strength of the defense they play, while others are "matchup-proof", meaning that they will perform well regardless of the strength of the opposing team. I wanted to figure out which players were "matchup-proof" and which players were matchup-reliant.

This led me to ask the question: what is the effect of a defense’s strength on the fantasy output of a player?

Step #1: Data Collection

The first step in any data analysis project is collecting the data. For the 2017–2019 seasons, I needed each player's yearly stats, each player's weekly stats, the rankings of every defense against QBs, RBs, WRs, and TEs, and the schedule for each team.

Step #2: Calculating Fantasy Points

The next step is to iterate through all the data files and transform all of the stats into PPR fantasy points:

# stats holds one row of raw stats per player; i is the current row
fantasypoints = 0
# negative stats: -2 per fumble lost or interception
fantasypoints -= (stats["FL"][i] * 2)
fantasypoints -= (stats["Int"][i] * 2)
# positive stats: yards, touchdowns, and 1 point per reception (PPR)
fantasypoints += (stats["PassingYds"][i] * 0.04)
fantasypoints += (stats["PassingTD"][i] * 4)
fantasypoints += (stats["RushingYds"][i] * 0.1)
fantasypoints += (stats["RushingTD"][i] * 6)
fantasypoints += (stats["ReceivingYds"][i] * 0.1)
fantasypoints += (stats["ReceivingTD"][i] * 6)
fantasypoints += (stats["Rec"][i])
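The same conversion can also be applied to a whole table at once instead of row by row. The following is only a sketch of that alternative, not the code used for this article: the stat column names follow the snippet above, while the `Player` and `G` (games played) columns, the output column names, and the stat lines themselves are assumptions made up for the example.

```python
import pandas as pd

# Made-up stat lines; "Player" and "G" are assumed column names
stats = pd.DataFrame({
    "Player": ["QB A", "RB B"],
    "G": [2, 2],
    "PassingYds": [500, 0],
    "PassingTD": [4, 0],
    "Int": [1, 0],
    "RushingYds": [20, 200],
    "RushingTD": [0, 2],
    "ReceivingYds": [0, 50],
    "ReceivingTD": [0, 0],
    "Rec": [0, 6],
    "FL": [0, 1],
})

# PPR points awarded per unit of each stat
weights = pd.Series({
    "PassingYds": 0.04, "PassingTD": 4, "Int": -2,
    "RushingYds": 0.1, "RushingTD": 6,
    "ReceivingYds": 0.1, "ReceivingTD": 6,
    "Rec": 1, "FL": -2,
})

# Multiply each stat column by its weight, then sum across columns per player
stats["FantasyPoints"] = stats[weights.index].mul(weights, axis="columns").sum(axis="columns")
stats["AvgFantasyPoints"] = stats["FantasyPoints"] / stats["G"]
print(stats[["Player", "G", "FantasyPoints", "AvgFantasyPoints"]])
```

Keeping the point values in a single `weights` series makes it easy to switch scoring systems, for example by setting the `Rec` weight to 0 for Standard scoring.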

After converting player stats into fantasy points, the data files looked like:

Player Name, Position, Team, Games Played, Total Fantasy Points, Average Fantasy Points
Todd Gurley,RB,LAR,15.0,383.3,25.55
Le'Veon Bell,RB,PIT,15.0,341.6,22.77
Kareem Hunt,RB,KAN,16.0,295.2,18.45
Alvin Kamara,RB,NOR,16.0,312.4,19.52

Step #3: Pulling Defense Rankings for Each Week

Once all the fantasy points were listed for each player in the data files, I needed to pull the defense ranking of the team they played when they scored those points.

The rankings are determined by the average number of fantasy points a defense allows to each position (QB, RB, WR, TE) throughout the year. This means that each defense has 4 different rankings:

Team, QB Rank, RB Rank, WR Rank, TE Rank
ARI,18,4,18,14
ATL,23,7,14,13
BAL,2,22,2,21
BUF,5,32,5,22
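Rankings like these can be derived from the average points allowed with a per-column rank. The sketch below uses made-up numbers and placeholder team codes, and it assumes that rank 1 means the fewest points allowed (the toughest defense), which is consistent with how the correlations are interpreted later in this article.

```python
import pandas as pd

# Made-up average PPR points allowed per game to each position
allowed = pd.DataFrame(
    {
        "QB": [18.2, 21.5, 15.1],
        "RB": [20.3, 24.8, 27.9],
        "WR": [35.0, 33.1, 29.4],
        "TE": [11.2, 10.9, 13.5],
    },
    index=["AAA", "BBB", "CCC"],  # placeholder team codes
)

# Rank each position column independently: 1 = fewest points allowed
ranks = allowed.rank(axis="index", method="min", ascending=True).astype(int)
ranks.columns = [f"{pos} Rank" for pos in ranks.columns]
print(ranks)
```

A defense can rank very differently by position, which is why each team carries four separate rankings rather than one.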

To add the defense ranking to every weekly stat file, I iterated through each week's schedule to find every player's opposing team, then looked up that defense's ranking against the player's position and added it to the weekly stat file.

Once added, the weekly files looked like:

Player Name, Team, Position, Total Fantasy Points, Opposing Team Rank
Kirk Cousins,WAS,QB,26.8,17
Tom Brady,NWE,QB,33.72,30
Jared Goff,LAR,QB,23.58,31
Case Keenum,MIN,QB,28.56,19
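The lookup described above can also be expressed as two table joins rather than nested loops. This is a sketch under assumptions, not the code used for this article: every table, column name, team code, and value here is made up for illustration.

```python
import pandas as pd

# Made-up weekly stats, one-week schedule, and defense rankings
weekly = pd.DataFrame({
    "Player": ["Player A", "Player B"],
    "Team": ["AAA", "BBB"],
    "Position": ["QB", "RB"],
    "FantasyPoints": [22.4, 15.0],
})
schedule = pd.DataFrame({"Team": ["AAA", "BBB"], "Opponent": ["BBB", "CCC"]})
ranks = pd.DataFrame({
    "Team": ["AAA", "BBB", "CCC"],
    "QB": [10, 3, 25],
    "RB": [7, 19, 30],
    "WR": [12, 5, 28],
    "TE": [1, 16, 9],
})

# Step 1: attach each player's opponent for the week
merged = weekly.merge(schedule, on="Team", how="left")

# Step 2: reshape rankings to one row per (defense, position)
long_ranks = ranks.melt(id_vars="Team", var_name="Position", value_name="OppRank")
long_ranks = long_ranks.rename(columns={"Team": "Opponent"})

# Step 3: pick the opponent's ranking against the player's own position
merged = merged.merge(long_ranks, on=["Opponent", "Position"], how="left")
print(merged)
```

Joining on both the opponent and the position ensures that a quarterback is matched with the defense's QB ranking and a running back with its RB ranking.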

Step #4: Creating Correlation Coefficients and Graphs

After the defensive rankings were added to the weekly files, all that was left to do was to iterate through all the weekly stat files and plot every player’s fantasy points against the ranking of the defense they played:

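import matplotlib.pyplot as plt
import numpy as np

# xdata: defense rankings faced; ydata: fantasy points relative to the yearly mean
plt.scatter(xdata, ydata)
plt.title("Effect of Defense Strength on " + str(playername) + " in 2017")
plt.xlabel("Defense Ranking (1-32) | Correlation = " + str(correlation))
plt.ylabel("Fantasy Production Above/Below Yearly Mean")
# fit and draw a least-squares trend line
x = np.array(xdata)
y = np.array(ydata)
m, b = np.polyfit(x, y, 1)
plt.plot(x, m*x + b)
# dashed reference line at the yearly mean (y = 0)
plt.plot(flatlinex, 0*flatlinex, linestyle = "--", dashes = (5, 5), color = "black")
plt.show()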

Now all that was left to do was to interpret the data that I compiled. But, before I share my findings, let me explain the significance of a correlation coefficient.

A correlation coefficient (r) quantifies the strength and direction of a linear relationship and ranges from -1 to 1. A positive r indicates a positive linear relationship, and a negative r indicates a negative linear relationship. As a rule of thumb for this analysis, I treated an r greater than 0.6 or less than -0.6 as a strong correlation between the two variables.
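For readers who want to see where r comes from, here is a small sketch that computes the Pearson correlation from its definition and checks it against `np.corrcoef`. The rankings and point totals are made up, and `pearson_r` is a hypothetical helper written for this illustration.

```python
import numpy as np

# Made-up example: opposing defense rank and weekly fantasy points
xdata = np.array([3, 8, 15, 21, 27, 31])
points = np.array([12.0, 14.5, 13.0, 19.5, 22.0, 24.0])

# Center the points on the player's mean, matching the plot's y-axis label
ydata = points - points.mean()


def pearson_r(x, y):
    """Pearson correlation computed directly from its definition."""
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


r_manual = pearson_r(xdata, ydata)
r_numpy = np.corrcoef(xdata, ydata)[0, 1]

# Apply the 0.6 rule of thumb in either direction
strong = abs(r_numpy) > 0.6
print(round(r_manual, 2), round(r_numpy, 2), strong)
```

Subtracting the yearly mean shifts every point by the same amount, so it changes where the scatter plot sits but not the value of r.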

Step #5: The Results

There were a few players each year that had a strong correlation between their fantasy output and defense strength:

2017:

  • Todd Gurley: 0.02 (#1 Overall Player)
  • Dak Prescott: 0.62
  • Ezekiel Elliott: 0.64
  • Alex Collins: 0.65
  • Drew Brees: 0.65
  • Charles Clay: 0.65
  • Marlon Mack: 0.67
  • OJ Howard: 0.68
  • Rex Burkhead: 0.71
  • Jared Goff: 0.85
Python Generated Graph | Image by Author

Todd Gurley was the highest fantasy scorer in 2017 and had virtually 0 correlation between his production and the defense he played. On the other hand, Jared Goff had a correlation of 0.85 and his fantasy production was incredibly reliant on his matchup.

2018:

  • Todd Gurley: 0.28 (#1 Overall Player)
  • Josh Reynolds: 0.61
  • Davante Adams: 0.62
  • Duke Johnson: 0.62
  • Marcus Mariota: 0.68
  • Corey Davis: 0.69
  • Dalvin Cook: 0.69
  • Mitchell Trubisky: 0.72
  • Carson Wentz: -0.73
  • Russell Wilson: 0.74
  • Gus Edwards: 0.83
Python Generated Graph | Image by Author

Todd Gurley was the highest fantasy scorer in 2018 and had little to no correlation between his fantasy output and the defense he played against. Surprisingly, Carson Wentz had a strong negative correlation between his fantasy output and the defense he played against. This means that he played better against better defenses. Although there is an outlier in his data, there is still a somewhat clear negative trend.

2019:

  • Christian McCaffrey: 0.42 (#1 Overall Player)
  • Tony Pollard: 0.62
  • Eric Ebron: 0.63
  • Andy Dalton: 0.64
  • Tevin Coleman: 0.65
  • Marquise Brown: 0.65
  • Chris Carson: 0.67
  • Jimmy Garoppolo: 0.68
  • Alshon Jeffery: 0.68
  • Odell Beckham: 0.68
  • Devonta Freeman: 0.71
  • Melvin Gordon: 0.73
  • Adam Thielen: 0.74
  • Jared Goff: 0.79
Python Generated Graph | Image by Author

Christian McCaffrey scored the most fantasy points in 2019 and had only a moderate correlation (0.42) between his performance and the defense he played against, below the 0.6 threshold. On the other hand, just like in 2017, Jared Goff had a very strong correlation between his performance and the strength of the defense he played.

However, in football, many factors other than the opposing defense affect how a player performs, such as game script, injuries, and coaching. Although some of these numbers may seem convincing, many other factors are playing a role.

This was my first time using pandas, NumPy, and Matplotlib in Python. You can check out my code on GitHub.

Let me know if you have any feedback. Thanks!

