The Garoppolo Effect: Exploring NFL Data using Python Tutorial

Kishan Panchal
Towards Data Science
8 min read · Mar 26, 2018


Photo by Ashton Clark on Unsplash

Jimmy Garoppolo just signed the largest contract in NFL history with the San Francisco 49ers. He started 5 games for them after being traded midseason and led the 49ers to wins in all of those games.

I wanted to explore how the 49ers changed after the Garoppolo trade and understand a bit more about how he helped the team. For this analysis, I used a dataset of 2017 NFL plays, which you can get using nflscrapR. I used this tool to download NFL play-by-play data for 2017 and have included a direct link to where the 2017 NFL data can be downloaded.

For our purposes, we are just going to focus on analyzing the 49ers data for the 2017 season, and this is a walkthrough about how to do that using Python 2.7. I have included some comments within the code to help you follow along. Since Jimmy Garoppolo only started 5 games for the 49ers, these plots will not all have a similar number of observations, but let’s see what interesting things we can learn.

from __future__ import division # division without truncating decimals (must be the first statement)

import pandas as pd # data manipulation library
import numpy as np # numerical computation library
import datetime as dt # date handling

import matplotlib.pyplot as plt # plotting library
from matplotlib import cm # color maps for plotting
plt.style.use('ggplot') # use the ggplot plotting style

# show plots inline in a Jupyter notebook
%matplotlib inline

Now, let’s read our data into a variable called nfl.

nfl = pd.read_csv('https://raw.githubusercontent.com/ryurko/nflscrapR-data/master/data/season_play_by_play/pbp_2017.csv', low_memory=False)

Because we want to compare how the team performed before and after the trade, we can add a column that indicates whether each play's game date falls before or after Jimmy Garoppolo's first start for the 49ers. We can then aggregate the data by the value of this column to compare the periods before and after Jimmy became the starter.

First, we convert the date column to datetime format so that we can compare its values against the cutoff date.

nfl['Date'] = pd.to_datetime(nfl['Date'])

Then we can create a column called Jimmy that is yes if the game date is on or after December 3rd, 2017 and is no otherwise.

nfl['Jimmy'] = np.where(nfl['Date'] >= dt.datetime(2017, 12, 3), 'yes', 'no')

Now if we check our nfl dataframe, we will see that the last column is now Jimmy.
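As a quick sanity check on this flag, one option is to count the distinct game dates that fall on each side of the cutoff. The sketch below uses a small made-up dataframe called toy (a hypothetical stand-in, not the real play-by-play data) and applies the same December 3rd cutoff.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Made-up rows standing in for plays (illustration only, not real data)
toy = pd.DataFrame({
    'Date': ['2017-11-26', '2017-11-26', '2017-12-03', '2017-12-10'],
    'HomeTeam': ['SF', 'SF', 'CHI', 'HOU'],
    'AwayTeam': ['SEA', 'SEA', 'SF', 'SF'],
})
toy['Date'] = pd.to_datetime(toy['Date'])

# Same rule as in the article: 'yes' on or after December 3rd, 2017
toy['Jimmy'] = np.where(toy['Date'] >= dt.datetime(2017, 12, 3), 'yes', 'no')

# Count distinct game dates in each group
games_per_group = toy.groupby('Jimmy')['Date'].nunique()
print(games_per_group)
```

Running the same groupby on the niners dataframe defined below should show how many games fall into each group.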

We can obtain 49ers specific data from the 2017 NFL data by subsetting the NFL data such that the home or away team is the SF 49ers.

niners = nfl[ (nfl["HomeTeam"] == 'SF') | (nfl["AwayTeam"] == 'SF') ]

Next, we can look into touchdowns scored. To find the touchdowns scored by the 49ers offense, we check that the home or away team is SF, the play was a scoring play, a touchdown occurred, the team on offense was SF (that is, SF was not the defensive team), and the play has no interceptor (the player who intercepted the ball), which rules out interception returns.

niners_td = nfl[((nfl["HomeTeam"] == 'SF') | (nfl["AwayTeam"] == 'SF')) & (nfl["sp"] == 1) & (nfl["Touchdown"] == 1) & (nfl["DefensiveTeam"] != 'SF') & pd.isnull(nfl["Interceptor"]) ]

And we get 31 touchdowns for the season.
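To make the effect of each condition concrete, here is a minimal sketch of the same filter on a four-row made-up dataframe. The rows and the player name 'A. Defender' are hypothetical; only the column names come from the code above.

```python
import numpy as np
import pandas as pd

# Made-up plays (illustration only): one offensive TD, one TD allowed on defense,
# one interception return, and one non-scoring play
toy = pd.DataFrame({
    'HomeTeam': ['SF', 'SF', 'SF', 'SEA'],
    'AwayTeam': ['LA', 'LA', 'LA', 'SF'],
    'sp': [1, 1, 1, 0],
    'Touchdown': [1, 1, 1, 0],
    'DefensiveTeam': ['LA', 'SF', 'LA', 'SEA'],
    'Interceptor': [np.nan, np.nan, 'A. Defender', np.nan],
})

is_sf_game = (toy['HomeTeam'] == 'SF') | (toy['AwayTeam'] == 'SF')
mask = (
    is_sf_game
    & (toy['sp'] == 1)                  # scoring play
    & (toy['Touchdown'] == 1)           # touchdown
    & (toy['DefensiveTeam'] != 'SF')    # SF was on offense
    & toy['Interceptor'].isnull()       # not an interception return
)
toy_td = toy[mask]

# The number of rows left is the touchdown count
print(len(toy_td))
```

On the real data, len(niners_td) gives the season total in the same way.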

Now we can check how many touchdowns were scored without Jimmy and with him by grouping our dataframe.

niners_td.groupby('Jimmy').Touchdown.sum()

Touchdowns with and without Jimmy Garoppolo

Now let’s graph the number of touchdowns scored while Jimmy started and did not start games.

tds = niners_td.groupby('Jimmy').Touchdown.sum() # store the touchdown information in tds

fig, ax = plt.subplots(figsize=(8, 6), dpi = 72) # Get access to the figure and axes to modify their attributes later

ax.set_title("Total Number of Touchdowns", fontsize = 18) # Chart title
ax.set_xlabel('Jimmy', fontsize = 15) # X-axis label
ax.set_ylabel('Number of Touchdowns', fontsize = 15) # Y-axis label
plt.xticks(fontsize = 13)
plt.yticks(fontsize = 13)

mycolors = ['#A6192E', '#85714D'] # Using scarlet and gold colors

tds.plot(kind='bar', alpha = 0.9, rot=0, color = mycolors) # Plot a Bar chart
plt.show()

While this plot is nice, we should also check the number of touchdowns per game since Jimmy only played in 5 games.

The 49ers scored approximately 1 more touchdown per game when Jimmy Garoppolo started. This does not mean that he was responsible for every touchdown scored in those games; it only shows how many touchdowns the team scored when he played.
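One way to compute touchdowns per game is to divide the touchdown totals by the number of distinct game dates in each group. The sketch below uses made-up toy dataframes (toy_td and toy_games are hypothetical names and values), and it assumes that a team plays at most one game per date, so that counting unique dates counts games.

```python
import pandas as pd

# Made-up touchdown plays, one row per touchdown (illustration only)
toy_td = pd.DataFrame({
    'Jimmy': ['no', 'no', 'no', 'yes', 'yes', 'yes', 'yes'],
    'Touchdown': [1, 1, 1, 1, 1, 1, 1],
})

# Made-up game dates with the same 'Jimmy' flag (illustration only)
toy_games = pd.DataFrame({
    'Date': pd.to_datetime(['2017-11-05', '2017-11-12', '2017-11-26',
                            '2017-12-03', '2017-12-10']),
    'Jimmy': ['no', 'no', 'no', 'yes', 'yes'],
})

tds = toy_td.groupby('Jimmy')['Touchdown'].sum()
games = toy_games.groupby('Jimmy')['Date'].nunique()

# Series division aligns on the 'no' / 'yes' index labels
td_per_game = tds / games.astype(float)
print(td_per_game)
```

With the real data, the same pattern would be niners_td.groupby('Jimmy').Touchdown.sum() divided by niners.groupby('Jimmy')['Date'].nunique().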

Touchdowns and Interceptions over Time

To get a different point of view about the touchdown situation, we can take a time-series approach where we take a look at the number of touchdowns and interceptions over time. We can mark off the point in time at which Garoppolo started games and see what changes we can observe.

# get sum of touchdowns by game day
td_by_date = niners.groupby('Date')['Touchdown'].sum()
td_by_date;
# get sum of interceptions by game day
inter_by_date = niners.groupby('Date')['InterceptionThrown'].sum()
inter_by_date;

Now let’s graph it.

fig, ax = plt.subplots(figsize=(8, 6), dpi = 80) # set plot size 

mycolors = ['#A6192E', '#85714D'] # Using scarlet and gold colors

f1 = td_by_date.plot(color = mycolors[0]) # plot the touchdowns
f2 = inter_by_date.plot(color = mycolors[1]) # plot the interceptions

ax.set_title("Touchdowns and Interceptions over Time", fontsize = 18) # Chart title
ax.set_xlabel('Game Date', fontsize = 15) # X-axis label
ax.set_ylabel('Count', fontsize = 15) # Y-axis label
plt.xticks(fontsize = 12)
plt.yticks(fontsize = 12)

plt.axvline(dt.datetime(2017, 12, 3), color = 'black') # add a vertical line
plt.legend(loc='upper center', frameon=True, facecolor="white") # add a legend with a white background

plt.show()

The lines to the right of the black vertical line are games that Jimmy started. We notice that before he started for the 49ers, they were on a downward trend in terms of the number of touchdowns scored, and after he started, their offense started taking off again.

Comparison of Different Play Types

We can also compare the types of plays run when Garoppolo was and was not starting. This can give us an overall sense of how the prevalence of certain plays changed, since the types of plays run can change with a different quarterback. To compare play types, we use the niners dataframe and not the niners_td dataframe defined above because we are concerned with all plays in the game and not just the plays that occur when the 49ers are on offense.

fig, ax = plt.subplots(2, 1, figsize=(10, 8), dpi = 85) # specify a plot with 2 rows and 1 column

# get plays where Jimmy did not start and did start
f1 = niners[niners['Jimmy']=='no']['PlayType'].value_counts().plot(kind='barh', ax=ax[0])
f2 = niners[niners['Jimmy']=='yes']['PlayType'].value_counts().plot(kind='barh', ax=ax[1])

f1.set(title = "Before Jimmy's Starts", xlabel='Count', ylabel='Play Type')
f2.set(title = "After Jimmy's Starts", xlabel='Count', ylabel='Play Type')
f1.set_xlim(0,805) # use the same scale for both plots
f2.set_xlim(0,805)
fig.tight_layout() # prevent overlapping axis labels

plt.show()

The two periods cover a different number of games and therefore a different count of plays, of course. Still, if we use the colors in both plots to match up the play types, we see that the most common plays in both graphs were Pass, Run, and Kickoff. After those, punts were more common before Jimmy started, whereas field goals were more common after he started. This could indicate that Jimmy’s play helped the 49ers get closer to scoring position and gave them more chances to kick field goals.
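Because the raw counts cover a different number of games, it can help to compare each play type's share of plays within each period instead. The sketch below does this with pd.crosstab on a made-up toy dataframe (hypothetical rows; only the column names come from the article).

```python
import pandas as pd

# Made-up plays (illustration only)
toy = pd.DataFrame({
    'Jimmy': ['no'] * 5 + ['yes'] * 4,
    'PlayType': ['Pass', 'Pass', 'Run', 'Punt', 'Punt',
                 'Pass', 'Pass', 'Run', 'Field Goal'],
})

# normalize='columns' makes each 'Jimmy' column sum to 1,
# so every cell is that play type's share of plays in that period
share_table = pd.crosstab(toy['PlayType'], toy['Jimmy'], normalize='columns')
print(share_table)
```

The same call on niners['PlayType'] and niners['Jimmy'] would give shares that are comparable across the two periods, regardless of how many games each one contains.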

Lastly, let’s dive deeper into the data about the 49ers offense.

Let’s take a look at the top plays in terms of yards gained when the 49ers were on offense. As before, we will subset our data to obtain data when the 49ers are on offense.

niners_offense = nfl[((nfl["HomeTeam"] == 'SF') | (nfl["AwayTeam"] == 'SF')) & (nfl["DefensiveTeam"] != 'SF') ]

We can create a new dataframe called most_yards that contains the 50 plays on which the 49ers offense gained the most yards.

most_yards = niners_offense.sort_values(by='Yards.Gained', ascending=False)[:50]

We can see that 20 of these 50 top plays by yards gained occurred when Jimmy started. Because the number of observations differs between the two periods, which could lead to a different number of bars in each bar plot, for instance, we will look at overall values instead of separating the plots as before.
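A count like this can be read off with value_counts on the Jimmy column of the top plays. Here is a minimal sketch on a made-up toy dataframe (toy_offense and its values are hypothetical), taking the top 3 plays instead of the top 50.

```python
import pandas as pd

# Made-up offensive plays (illustration only)
toy_offense = pd.DataFrame({
    'Yards.Gained': [5, 61, 12, 83, 3, 44],
    'Jimmy': ['no', 'no', 'yes', 'yes', 'no', 'yes'],
})

# Same pattern as most_yards: sort by yards gained and keep the top rows
top_plays = toy_offense.sort_values(by='Yards.Gained', ascending=False)[:3]

# How many of the top plays fall in each period
counts = top_plays['Jimmy'].value_counts()
print(counts)
```

On the real data, most_yards['Jimmy'].value_counts() gives the split of the 50 top plays.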

We notice that Marquise Goodwin was the receiver who was part of the plays resulting in the most yards gained last season.

passes = most_yards[most_yards["PlayType"] == 'Pass']

fig, ax = plt.subplots(figsize=(8, 6), dpi = 75)

f1 = passes['Receiver'].value_counts().plot(kind='barh')
f1.set(title = "Players with the most Yards after receiving Passes", xlabel='Count', ylabel='Player Name')

plt.show()

We can see that Matt Breida and Carlos Hyde were part of the most successful runs by yards gained last season.

runs = most_yards[most_yards['PlayType'] == 'Run']

fig, ax = plt.subplots(figsize=(6, 5), dpi = 75)

f1 = runs['Rusher'].value_counts().plot(kind='barh')
f1.set(title = "Players with the most Yards from Rushing", xlabel='Count', ylabel='Player Name')

plt.show()

Conclusion

I hope that you enjoyed this guide walking through some data analysis in Python using NFL data. Now you can go download the NFL data, play around with different information, and see what interesting things you find!

Thank you for taking the time to read this post, and feel free to leave a comment or connect on LinkedIn.

References:

  1. Pandas Data Frame Documentation
  2. Pandas Plot Documentation
  3. Matplotlib Documentation
  4. nflscrapR
  5. Personal Github Repository with this code notebook
