Direct to Pandas DataFrame

3-minute guide to using Chrome DevTools and Python to download and read data directly from a remote URL to a Pandas DataFrame

David Hurley
Towards Data Science


Photo by Patrick Fore on Unsplash

The Challenge

For new data scientists and software developers, downloading data from a remote URL is often one of the first tasks to automate. For example, in the image below I want to download and parse the Hourly Precipitation CSV Sample data from Data.gov directly to a Pandas DataFrame.

The problem is that it’s not clear which URL path the download button redirects to.

Find the Remote URL Path

Using Chrome DevTools (or similar) and following the steps below, it’s easy to find the remote URL path.

  1. Navigate to the page where you want to download the data.
  2. Open Chrome DevTools with right-click → Inspect, or navigate to Chrome menu → More tools → Developer tools.
  3. Navigate to Console in the Chrome DevTools. You may see text in the console; it can be cleared with right-click → Clear console.
  4. With the Console open, click on the Download button or similar.
  5. The URL path that the Download button redirects to is now displayed in the Console.
Finding URL path with Chrome DevTools Console

Download Data Directly to Pandas DataFrame

Once you have found the remote URL path, it’s simple to read the data into a Pandas DataFrame, because Pandas readers such as read_csv accept a URL in place of a local file path. The code below demonstrates how to parse a CSV file, but the same approach works for JSON, Excel, and other file types with the matching reader.
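
A minimal sketch of this step, assuming pandas is installed. `CSV_URL` is a hypothetical placeholder, not the actual Data.gov path, and `read_remote_csv` is a helper name introduced here for illustration. The only pandas call used is `pd.read_csv`, which accepts a URL as well as a local path.

```python
import pandas as pd

# Hypothetical placeholder: substitute the URL path shown in the DevTools Console.
CSV_URL = "https://example.com/path/to/hourly_precipitation_sample.csv"


def read_remote_csv(url: str) -> pd.DataFrame:
    """Fetch a CSV from `url` and parse it into a DataFrame."""
    # pd.read_csv downloads the file itself when given a URL,
    # so there is no separate "save to disk" step.
    return pd.read_csv(url)


# Usage (needs network access and a real URL):
# df = read_remote_csv(CSV_URL)
# print(df.head())
```

Swapping `pd.read_csv` for `pd.read_json` or `pd.read_excel` should cover the other file types in much the same way.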

Read data from remote URL directly to Pandas DataFrame
Output of above code

Batch Download Directly to Pandas DataFrame

Typically, you wouldn’t automate downloading a single file; you would instead download a batch of files from a remote URL. For example, the below image shows the download portal for hourly weather data at Vancouver International Airport. The issue is that each download covers a one-month period, so getting one year of data would take 12 separate downloads.

Let’s use the Chrome DevTools Console to inspect the remote URL that the Download Data button redirects to.

https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&stationID=51442&Year=2020&Month=6&Day=7&timeframe=1&submit=Download+Data

Notice that, in addition to other unique identifiers, the URL contains the download year, month, and day. So let’s write a short script to download one year of data and combine it into a single Pandas DataFrame.
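
One way to sketch such a script, assuming the query parameters seen in the URL above (`stationID`, `Year`, `Month`, `Day`, `timeframe`) can simply be varied and that each monthly file parses cleanly with `pd.read_csv`. The helper names `build_url` and `download_year` are hypothetical, and fixing `Day` at 1 is an assumption that the monthly download does not depend on the day.

```python
from urllib.parse import urlencode

import pandas as pd

BASE_URL = "https://climate.weather.gc.ca/climate_data/bulk_data_e.html"


def build_url(station_id: int, year: int, month: int) -> str:
    """Rebuild the download URL for one station and one month."""
    params = {
        "format": "csv",
        "stationID": station_id,
        "Year": year,
        "Month": month,
        "Day": 1,  # assumption: the monthly file is the same for any day
        "timeframe": 1,  # value copied from the captured URL
        "submit": "Download Data",  # urlencode turns the space into "+"
    }
    return f"{BASE_URL}?{urlencode(params)}"


def download_year(station_id: int, year: int, reader=pd.read_csv) -> pd.DataFrame:
    """Read all 12 monthly files for `year` and stack them into one DataFrame."""
    frames = [reader(build_url(station_id, year, month)) for month in range(1, 13)]
    # ignore_index=True renumbers the rows 0..n-1 across the combined months
    return pd.concat(frames, ignore_index=True)


# Usage (needs network access):
# df = download_year(station_id=51442, year=2020)
# print(df.shape)
```

The `reader` argument is only there so the function can be tried with a stand-in reader when no network is available; in normal use the default `pd.read_csv` does the download.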

Read data from remote URL for multiple files and combine
Output of above code

Conclusion

Pandas is a must for any data scientist and Chrome DevTools is a great addition to the toolbox. Following the above steps, it was easy to download data directly to a Pandas DataFrame and little effort would be needed to extend this to other file types.

Happy coding!
