
Christmas Crisis!

Saving Santa with Pandas and GeoPandas

Photo by Irena Carpaccio on Unsplash

Finally! I know! A Data Science Project as Festive as the Season!

Data Science has many applications; so many, in fact, that it is nigh impossible to even be considered proficient in half of them! But that shouldn’t dissuade us from trying to become as equipped for the future as possible. I challenge you to take on this project and have a once-in-a-lifetime opportunity to serve as Santa’s "pocket" Data Scientist!

_Before you begin, note that this is designed to be a challenge project. You are given the tools and supplies, but the method is left up to you! While the ‘answer key’ is provided on GitHub, you should have everything you need in the data to figure it out. This project also serves to introduce you to some tools you may not be familiar with yet._


What you will need (or better yet, what you should be prepared to teach yourself 😉 ):

  • A good understanding of Data Science oriented Python libraries
  • A basic understanding of GeoPandas
  • A love of Christmas!

What you will learn:

  • How to merge non-standardized data with a Universal Standard Identifier (Placekey)
  • How to find points within polygons using GeoPandas
  • How to filter the data to determine the best possible outcome (min/max)
  • How to merge data and determine foot traffic to given POI (SafeGraph)
  • How to **save Santa and Christmas**
Photo by https://knowyourmeme.com/photos/1229977-its-always-sunny-in-philadelphia

The Plot:

Congratulations! Welcome to the North Pole Ranks! You have just been promoted to Senior Data Scientist at CANDi (Central Association of Northern Delivery Inc). Santa has just exchanged Wickr Me usernames with you to make sure he can maintain constant encrypted contact with his number one Data Scientist, so on the off chance something goes awry (_foreshadowing intensifies_) you will be prepared.

Gif by Giphy.com

As 2am approaches, you settle into your desk with a hot cup of CoCo (I highly recommend you get into character by getting a cup IRL), when all of a sudden you get a Code Red Alert! Santa’s sleigh has gone down somewhere in Manhattan! Luckily, your years of training at the CCA (Candy Cane Academy) have prepared you for this exact scenario. Santa needs to get to a WiFi hotspot ASAP to upload his delivery itinerary to the North Pole so the B team can pick up where he left off and finish the deliveries! Luckily, due to Santa’s affinity for the Grande, Iced, Sugar-Free, Toasted White Chocolate Mocha at Starbucks, he requires all of his employees to keep a list of Starbucks locations on hand at all times.


Christmas Crisis Github

Steps:

  1. Find Santa’s coordinates in the ReadMe and save them
  2. Get the list of Starbucks addresses from the GitHub repo and make sure the data is clean
  3. Get your data upload/download speed GIS file, provided by Ookla (Santa needs a hub with an upload speed over 80 Mbps)
  4. Merge the Starbucks Address list to the SafeGraph location data using Placekey (in the Repo)
  5. Cross reference the merged file with Ookla’s DL/UL file to find out which Starbucks are an option
  6. Send Santa the address
  7. (BONUS): Cross reference the Ookla/SafeGraph POI Dataframe with your foot-traffic data to find the location with the fastest upload speed and lowest foot-traffic to keep Santa COVID-safe.

Tips:

  • The Ookla data is given in kbps; we need **over 80 Mbps** (80,000 kbps)
  • While you can write the Placekey code out in Python, you can also use a Placekey add-on in Google Sheets, which saves a lot of time for smaller datasets like these
  • GIS data is computationally heavy. To save yourself some headaches, work in a Jupyter-style notebook so you don’t have to filter the shapefile over and over again, wasting time.

— – – – – – [STOP HERE IF YOU WANT A CHALLENGE] – – – – – – –

Photo by Juli Kosolapova on Unsplash

— – – – – – –[HOW TO COMPLETE THE CHALLENGE] – – – – – –


First and foremost, I must make it abundantly clear that there are a million ways to open a present, just like there are a million ways to solve this project. If you come up with a better / more efficient way to solve these steps (especially the GeoPandas section) please leave a comment below to help out your fellow scientists!

With that out of the way, let’s dive in!

You can view the entire GITHUB REPO with code

Step 1: This step is pretty straightforward: open the ReadMe file and you will find Santa’s coordinates.

Photo by Author

Step 2: You will find the list of Starbucks addresses in the file labeled ‘Santas Starbucks List’. To check the data, simply read it into Pandas and look for NaN or null values. If any rows contain NaN or null values, look them over and then decide whether to drop them or keep them.
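
A minimal sketch of that check, assuming a small in-memory table in place of the real file. The column names and addresses below are hypothetical stand-ins, not the actual contents of ‘Santas Starbucks List’; in practice you would load the file with something like `pd.read_csv`.

```python
import pandas as pd

# Stand-in for the real list; columns and values are hypothetical.
starbucks = pd.DataFrame({
    "street_address": ["1 Example Ave", None, "3 Sample St"],
    "city": ["New York", "New York", "New York"],
    "postal_code": ["10001", "10002", None],
})

# Count missing values in each column.
missing_per_column = starbucks.isna().sum()
print(missing_per_column)

# Pull out the affected rows so you can look them over before deciding.
rows_with_missing = starbucks[starbucks.isna().any(axis=1)]
print(rows_with_missing)

# One possible decision: a row without a street address cannot be matched,
# so drop those rows and keep rows that only lack a postal code.
cleaned = starbucks.dropna(subset=["street_address"]).reset_index(drop=True)
print(cleaned)
```

Whether to drop or keep a row depends on which field is missing, which is why the sketch inspects the rows first instead of calling `dropna()` on everything.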

Step 3: For step 3, there are quite a few ways to go about it. You can just read the data from the zipped file, you can extract the data and read in the shapefile, or you can use the parquet file and convert it as needed. I chose to use the parquet because it read in and processed much quicker than just the shapefile.
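
Whichever format you load, it can help to filter the speed data early so later steps work on fewer rows. Here is a rough sketch of the kbps-to-Mbps filter, using a tiny in-memory table instead of the real parquet file. The column name `avg_u_kbps` and the `quadkey` values are assumptions for illustration; check the columns in your own copy of the Ookla data.

```python
import pandas as pd

# Stand-in for the loaded speed data; column names and values are assumed.
tiles = pd.DataFrame({
    "quadkey": ["a", "b", "c"],
    "avg_u_kbps": [45_000, 80_000, 120_500],
})

MIN_UPLOAD_MBPS = 80
KBPS_PER_MBPS = 1_000

# Convert once so the threshold can be written in the unit Santa asked for.
tiles["avg_u_mbps"] = tiles["avg_u_kbps"] / KBPS_PER_MBPS

# "Over 80 Mbps" is a strict comparison, so a tile at exactly 80 is excluded.
fast_tiles = tiles[tiles["avg_u_mbps"] > MIN_UPLOAD_MBPS]
print(fast_tiles)
```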

Photo by Author

Step 4: You will need to get an API key from Placekey.io (don’t worry it is free and possibly one of the most useful tools you will invest in learning). Once you have your key, you can plug your address in to assign a Placekey to each address that will allow you to merge to other datasets without having to worry about correcting for things like Street, St, st, street, str, etc. You will want to get a Placekey for both your Starbucks data and SafeGraph POI data to then merge on Placekey. This will allow you to get the Lat, Lng of each Starbucks.
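
To show why the shared key matters, here is a small sketch of the merge itself, assuming both tables already carry a `placekey` column. The lookup against the Placekey service is not shown, and the Placekey strings, addresses, coordinates, and column names are all made up for illustration.

```python
import pandas as pd

# Starbucks list after Placekeys have been assigned (values are made up).
starbucks = pd.DataFrame({
    "street_address": ["1 Example Ave", "3 Sample Street"],
    "placekey": ["aaa-aaa@111-111-111", "bbb-bbb@222-222-222"],
})

# POI rows: the addresses are written differently, but the key is the same.
poi = pd.DataFrame({
    "placekey": [
        "bbb-bbb@222-222-222",
        "aaa-aaa@111-111-111",
        "ccc-ccc@333-333-333",
    ],
    "street_address": ["3 SAMPLE ST", "1 EXAMPLE AVENUE", "9 OTHER RD"],
    "latitude": [40.75, 40.71, 40.80],
    "longitude": [-73.99, -74.00, -73.95],
})

# Join on the key instead of the address text; only matching places survive.
merged = starbucks.merge(
    poi[["placekey", "latitude", "longitude"]],
    on="placekey",
    how="inner",
)
print(merged)
```

A join on the raw address strings would have matched nothing here, since "3 Sample Street" and "3 SAMPLE ST" are different text.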

Photo by Author

Step 5: This step will probably be the most challenging for most users. You will need to convert your newly created POI merge file to a shapefile in order to check which Ookla polygon each Lat, Lng falls within. You can find the full code HERE. Once you have the master DataFrame with Starbucks addresses and their respective upload and download speeds, you can choose the one with the fastest upload speed for Santa to go to.
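
This is not the code from the repo, just one possible way to do the point-in-polygon check: a spatial join with `geopandas.sjoin` on in-memory GeoDataFrames. The two square tiles, the coordinates, and the column names are invented for illustration, and the `predicate` keyword assumes a reasonably recent GeoPandas (older releases called it `op`).

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Merged Starbucks/POI rows with coordinates (made-up values).
stores = pd.DataFrame({
    "street_address": ["1 Example Ave", "3 Sample Street"],
    "latitude": [40.71, 40.75],
    "longitude": [-74.00, -73.99],
})

# Build point geometries; note the order is x=longitude, y=latitude.
stores_gdf = gpd.GeoDataFrame(
    stores,
    geometry=gpd.points_from_xy(stores["longitude"], stores["latitude"]),
    crs="EPSG:4326",
)

# Two square tiles standing in for the speed polygons.
tiles_gdf = gpd.GeoDataFrame(
    {"avg_u_kbps": [60_000, 95_000]},
    geometry=[
        box(-74.01, 40.70, -73.995, 40.72),
        box(-73.995, 40.74, -73.98, 40.76),
    ],
    crs="EPSG:4326",
)

# Attach each tile's attributes to the store point that falls inside it.
joined = gpd.sjoin(stores_gdf, tiles_gdf, how="inner", predicate="within")

# Pick the store sitting in the tile with the fastest upload speed.
best = joined.sort_values("avg_u_kbps", ascending=False).iloc[0]
print(best["street_address"], best["avg_u_kbps"])
```

Both layers need to share a coordinate reference system before joining, which is why the sketch sets `crs` explicitly on each.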

Photo by Author

Step 6: Comment below with the optimal Starbucks location for Santa to go to.

Photo by Author

Step 7 (BONUS): For this one, you will need to go into the bonus folder and find the patterns file that will give you all of NYC’s foot-traffic data for the past week, courtesy of SafeGraph. Merge the data to your master DF on SafeGraph Place ID to find the location with the lowest foot traffic that still meets Santa’s requirement of an upload speed over 80 Mbps.
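
One way to sketch the bonus step, assuming the master DataFrame and the patterns file share an ID column. The names `safegraph_place_id`, `raw_visit_counts`, and `avg_u_kbps`, along with every value below, are assumptions for illustration; the threshold follows the over-80 Mbps upload requirement from the steps list.

```python
import pandas as pd

# Master table from the earlier steps (made-up values).
master = pd.DataFrame({
    "safegraph_place_id": ["sg:001", "sg:002", "sg:003"],
    "street_address": ["1 Example Ave", "3 Sample Street", "5 Demo Blvd"],
    "avg_u_kbps": [95_000, 110_000, 60_000],
})

# Weekly foot-traffic counts per place (made-up values).
patterns = pd.DataFrame({
    "safegraph_place_id": ["sg:001", "sg:002", "sg:003"],
    "raw_visit_counts": [40, 250, 10],
})

# Add the visit counts to each Starbucks row.
combined = master.merge(patterns, on="safegraph_place_id", how="left")

# Keep only locations that clear the upload requirement (80 Mbps = 80,000 kbps).
eligible = combined[combined["avg_u_kbps"] > 80_000]

# Fewest visits first; break ties with the faster upload speed.
ranked = eligible.sort_values(
    ["raw_visit_counts", "avg_u_kbps"], ascending=[True, False]
)
safest = ranked.iloc[0]
print(safest["street_address"], safest["raw_visit_counts"])
```

Filtering before ranking matters: the quietest location in this toy data (`sg:003`) is too slow, so it never reaches the ranking.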

Photo by Author

That’s it! You did it! Congratulations, you saved Christmas and possibly learned something while doing it!

I hope this has been both a festive and educational exercise, and I wish you all a Merry Christmas! If you have any questions about the project, feel free to connect via Twitter or LinkedIn.

A special thanks to the teams behind the products used in this project

Placekey:

"Placekey is a free, universal standard identifier for any physical place, so that the data pertaining to those places can be shared across organizations easily. … It’s a movement of organizations and individuals that prize access to data. Solve problems related to address and POI matching and entity resolution with ease with a free Placekey."

SafeGraph:

"SafeGraph, a data company that aggregates anonymized location data from numerous applications in order to provide insights about physical places. To enhance privacy, SafeGraph excludes census block group information if fewer than five devices visited an establishment in a month from a given census block group."

If you would like access to Foot Traffic Data or Social Distancing Metrics etc, contact SafeGraph.

Ookla:

"Speedtest.net, also known as Speedtest by Ookla, is a web service that provides free analysis of Internet access performance metrics, such as connection data rate and latency." – LINK TO DATA

