Using Python to Scrape NFL Stats and Compare Quarterback Efficiencies
Using BeautifulSoup to scrape NFL stats and plot radar charts of quarterbacks
I have always been apprehensive about trying to scrape my own data, and the fact that websites like Kaggle aggregate such high-quality datasets has made learning this skill less necessary. However, the abundance of educational data science articles on this platform has helped me make progress towards collecting my own datasets. A lot of the inspiration and methods for scraping data came from here.
In this article, I will pull quarterback stats from the 2019–20 NFL season from Pro Football Reference, and use them to create radar charts to assess QB efficiency.
Load Packages
To open the webpage and scrape the data, we will use two modules: urllib.request to open the URL and BeautifulSoup to parse the HTML.
# Import scraping modules
from urllib.request import urlopen
from bs4 import BeautifulSoup
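To give a feel for what the parsing step does before touching a live page, here is a minimal sketch that runs BeautifulSoup on a small inline HTML string. The table contents and the names sample_html, headers, and rows are hypothetical placeholders, not real stats or the actual page layout; in the real workflow the HTML would come from urlopen rather than a string.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded stats page (placeholder values)
sample_html = """
<table>
  <tr><th>Player</th><th>Yds</th></tr>
  <tr><td>QB One</td><td>4000</td></tr>
  <tr><td>QB Two</td><td>3500</td></tr>
</table>
"""

# Parse the markup with Python's built-in HTML parser
soup = BeautifulSoup(sample_html, "html.parser")

# Column names come from the <th> cells
headers = [th.get_text() for th in soup.find_all("th")]

# Skip the first <tr> (the header row) and read the <td> cells of the rest
rows = [[td.get_text() for td in tr.find_all("td")]
        for tr in soup.find_all("tr")[1:]]

print(headers)
print(rows)
```

Every cell comes back as a string, so numeric columns would still need converting before any analysis.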
In addition to these packages, we will need packages to manipulate our data (numpy and pandas) and to plot it (matplotlib).
# Import data manipulation modules
import pandas as pd
import numpy as np