Direct to Pandas DataFrame
A 3-minute guide to using Chrome DevTools and Python to download data from a remote URL and read it directly into a Pandas DataFrame
The Challenge
For new data scientists and software developers, downloading data from a remote URL is often one of the first tasks to automate. For example, in the image below I want to download the Hourly Precipitation CSV Sample data from Data.gov and parse it directly into a Pandas DataFrame.
The problem is that it’s not clear which URL path the download button redirects to.
Find the Remote URL Path
Using Chrome DevTools (or a similar browser tool), it’s easy to find the remote URL path by following the steps below.
- Navigate to the page where you want to download the data.
- Open Chrome DevTools with right-click → Inspect, or go to the Chrome menu → More tools → Developer tools.
- Navigate to the Console in Chrome DevTools. If the Console already contains text, clear it with right-click → Clear console.
- With the Console open, click the Download button or its equivalent.
- The URL path that the Download button redirects to is now displayed in the Console. If nothing appears there, the Network tab in DevTools lists the same request.
Download Data Directly to Pandas DataFrame
Once you have found the remote URL path, it’s simple to read the data into a Pandas DataFrame. The code below demonstrates how to parse a CSV file, but it would be easy to do the same for JSON, Excel, and other file types with the matching Pandas readers, such as read_json and read_excel.
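Here is a minimal sketch of that step, assuming the file is a plain CSV with a header row. The URL is a placeholder, not the real Data.gov link, and read_remote_csv is just an illustrative helper name. The only Pandas call involved is pd.read_csv, which accepts a URL in place of a local file path.

```python
import pandas as pd

# Placeholder URL (an assumption): swap in the path copied from the DevTools Console
CSV_URL = "https://example.com/path/to/hourly-precipitation-sample.csv"


def read_remote_csv(source):
    """Read a CSV from a URL (or any source pd.read_csv accepts) into a DataFrame."""
    return pd.read_csv(source)


# Example call (needs network access and a real URL):
# df = read_remote_csv(CSV_URL)
# print(df.head())
```

Wrapping the call in a small function is optional. It just keeps the URL in one place and makes it easy to swap in a different reader later.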
Batch Download Directly to Pandas DataFrame
Typically you wouldn’t automate downloading a single file; you would download a batch of files from a remote URL. For example, the image below shows the download portal for hourly weather data at Vancouver International Airport. The issue is that each download covers a one-month period, so getting one year of data takes 12 separate downloads.
Let’s use the Chrome DevTools Console to inspect the remote URL that the Download Data button redirects to.
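If the copied URL is long, one way to see what it is made of is to split its query string in Python. This is only a sketch: the URL and its parameter names (format, stationID, Year, Month, Day) are hypothetical stand-ins for whatever the Console shows, and query_parameters is an illustrative helper name.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL standing in for the one copied from the Console
copied_url = (
    "https://example.com/download"
    "?format=csv&stationID=12345&Year=2018&Month=1&Day=1"
)


def query_parameters(url: str) -> dict:
    """Return the query-string parameters of a URL as a plain dict."""
    # parse_qs returns a list of values per key; keep the first value of each
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


params = query_parameters(copied_url)
for name, value in params.items():
    print(f"{name} = {value}")
```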
Notice that, in addition to other unique identifiers, the URL contains the download year, month, and day. So let’s write a short script to download one year of data and combine it into a single Pandas DataFrame.
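A short script along those lines might look like the sketch below. The URL template is hypothetical: replace it with the URL from the Console, putting the {year} and {month} placeholders where the real year and month values appear. The helper names and the reader parameter are also my own additions for illustration.

```python
from typing import Callable, List

import pandas as pd

# Hypothetical template (an assumption): substitute the URL copied from the
# Console, with {year} and {month} in place of the actual values
URL_TEMPLATE = (
    "https://example.com/download"
    "?format=csv&stationID=12345&Year={year}&Month={month}&Day=1"
)


def build_monthly_urls(year: int, template: str = URL_TEMPLATE) -> List[str]:
    """Return one download URL per calendar month of the given year."""
    return [template.format(year=year, month=month) for month in range(1, 13)]


def download_year(
    year: int,
    template: str = URL_TEMPLATE,
    reader: Callable[[str], pd.DataFrame] = pd.read_csv,
) -> pd.DataFrame:
    """Download all 12 monthly files and combine them into one DataFrame."""
    frames = [reader(url) for url in build_monthly_urls(year, template)]
    # ignore_index=True renumbers the rows 0..n-1 across the combined months
    return pd.concat(frames, ignore_index=True)


# Example call (needs network access and a real URL template):
# weather = download_year(2018)
# print(weather.shape)
```

Building the list of URLs separately from downloading them makes it easy to print and check the URLs before sending 12 requests to the server.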
Conclusion
Pandas is a must for any data scientist and Chrome DevTools is a great addition to the toolbox. Following the above steps, it was easy to download data directly to a Pandas DataFrame and little effort would be needed to extend this to other file types.
Happy coding!