_This article offers a high-level overview of estimating political affiliation at the neighborhood level. For a more in-depth discussion, check out the accompanying Google Colab notebook or recent Meetup presentation._
Disclaimer: I am a data science contractor with SafeGraph, the company whose data is used in this project.
Introduction
Political affiliation is a powerful variable that’s been linked to things like religious extremity, vaccine refusal, stance on corporal punishment, and much more. Researchers use it to understand various phenomena. Businesses use it to understand their market. Political campaigns use it to win elections.
Unfortunately, political affiliation data is typically only available at the county or district level. While other demographic variables like income, race, population, and education are available at the Census Block Group (CBG) level through the US Census Bureau, political affiliation is not. Note: On average, each county is composed of over 67 CBGs.
As a result, political affiliation variables may overlook important variation within a county or district. For example, check out the graphic below. When estimated at the county level, Cobb County (Georgia) is slightly Democrat-leaning (blue). When estimated at the CBG-level, it is clear that many parts of Cobb County are Republican-leaning (red).

With the right data and tools, we can easily make these estimations.
Materials
foot traffic patterns + polling locations/results = political affiliation
We use three datasets and one API to make these estimates. For this tutorial, code and sample data have been made available on my GitHub.
Data
- Polling locations (Cobb and Forsyth County, Georgia): addresses of polling locations for the 2020 US General Election
- Election results (Georgia): results by polling location for the 2020 US General Election
- SafeGraph Patterns: foot-traffic data indicating the home CBG of visitors to each point of interest (POI) for a given time period

When we join SafeGraph Patterns with election results, we can associate election results with CBGs.
One problem: joining POI data is hard because of inconsistent address formatting. Fortunately, we can use Placekey to solve this problem.

Joining Data
- Placekey: a free tool for joining POI data at scale. It is extremely useful because most POI datasets do not have standardized address columns ("AVE" vs "Ave." vs "Avenue", "NW" vs "N.W." vs "Northwest" vs "Nw", etc.).

Placekey is an open, unique, standard identifier for every POI in the US (and Canada, and more countries soon). When given location parameters (address data or coordinates/location name), the Placekey API returns a Placekey.
We want to align the Patterns `visitor_home_cbgs` column with Election Results by Precinct. Then we can estimate Political Affiliation for each CBG.
Check out the image below with the following in mind:
- Red: Precinct ID columns used to merge Election Results with Precinct Locations.
- Yellow: Placekey columns used to merge Precinct Locations with SafeGraph Patterns.
- Blue: Election results and visitor home Census Block Group columns. The goal of all this merging is to match `visitor_home_cbgs` with the election results.

"Election Results" will be joined with "Precinct Locations" on the `Precinct` column (red). "Precinct Locations" will be joined with "SafeGraph Weekly Patterns" on the `placekey` column (yellow). Our goal is to match the Patterns `visitor_home_cbgs` column with the election results (blue) using the corresponding locations. Note: the `placekey` column must be added to the "Precinct Locations" dataset using the Placekey API.
For example, given the following table’s precinct vote totals, suppose CBG `123456789012` was home to 50% of the visitors to Precinct A, 20% of the visitors to Precinct B, and 10% of the visitors to Precinct C.

Then we estimate CBG `123456789012` to have vote totals of:
Trump = .50*100 + .20*25 + .10*60 = 50 + 5 + 6 = 61
Biden = .50*50 + .20*25 + .10*150 = 25 + 5 + 15 = 45
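The same arithmetic can be sketched in a few lines of Python. This is only an illustration: the precinct totals are the ones that appear in the two sums above, and the function and variable names are made up for this example.

```python
# Precinct vote totals taken from the worked example above.
precinct_votes = {
    "A": {"trump": 100, "biden": 50},
    "B": {"trump": 25, "biden": 25},
    "C": {"trump": 60, "biden": 150},
}
# Share of each precinct's visitors whose home CBG is 123456789012.
visitor_share = {"A": 0.50, "B": 0.20, "C": 0.10}


def estimate_votes(votes, shares):
    """Weight each precinct's vote totals by the CBG's share of its visitors."""
    totals = {}
    for precinct, share in shares.items():
        for candidate, count in votes[precinct].items():
            totals[candidate] = totals.get(candidate, 0.0) + share * count
    return totals


estimate = estimate_votes(precinct_votes, visitor_share)
print({name: round(value, 1) for name, value in estimate.items()})
```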
Steps
Reminder: Detailed discussion and code are available on GitHub. It’s super easy to recreate with the accompanying Google Colab notebook!
1) Read in data
Fairly straightforward – read the CSV data into a Pandas DataFrame. We saw examples of the three datasets (election results, polling locations, and SafeGraph Patterns) above.
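As a rough sketch of this step, the snippet below reads two tiny in-memory CSVs in place of the real files. The sample rows, the precinct label `P1`, and the placekey value are hypothetical; with real data you would pass a file path to `pd.read_csv` instead.

```python
import io
import json

import pandas as pd

# Hypothetical in-memory stand-ins for the CSV files on disk.
election_csv = io.StringIO(
    "Precinct,in_person_trump,in_person_biden,in_person_jorgensen\n"
    "P1,321,283,20\n"
)
patterns_csv = io.StringIO(
    "placekey,visitor_home_cbgs\n"
    'zzz-222@abc-def-ghi,"{""131171301011"": 4}"\n'
)

# Read identifier columns as strings so they are never coerced to numbers.
election_results = pd.read_csv(election_csv, dtype={"Precinct": str})
patterns = pd.read_csv(patterns_csv, dtype={"placekey": str})

# The JSON column arrives as plain text; it is parsed in a later step.
first_cbgs = json.loads(patterns.loc[0, "visitor_home_cbgs"])
print(first_cbgs)
```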
2) Join voting data with Patterns data
a. Join election results with polling locations. As shown in the image above, join the election results data with the polling locations data on the `Precinct` column.
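A minimal sketch of this join, assuming both tables share a `Precinct` column with one row per precinct. The precinct labels and addresses below are placeholders. A left join with `indicator=True` is used here (rather than an inner join) so that precincts without a known address are easy to spot.

```python
import pandas as pd

election_results = pd.DataFrame({
    "Precinct": ["P1", "P2", "P3"],
    "in_person_trump": [321, 779, 513],
    "in_person_biden": [283, 174, 94],
})
polling_locations = pd.DataFrame({
    "Precinct": ["P1", "P2", "P4"],
    "street_address": ["1 Example St", "2 Example Ave", "4 Example Blvd"],
})

# validate= raises if either side has duplicate precincts.
voting = election_results.merge(
    polling_locations,
    on="Precinct",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Precincts with results but no polling-location row.
unmatched = voting.loc[voting["_merge"] == "left_only", "Precinct"].tolist()
print(unmatched)
```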
b. Add Placekey column to voting data. Using the Placekey API, request Placekeys for each address in the voting data from step (2a). This step is made easy with the Python library for working with Placekeys.
Here’s an example query:
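```python
from placekey.api import PlacekeyAPI

# Replace with your own Placekey API key.
pk_api = PlacekeyAPI("YOUR_API_KEY")

# One query: a location name plus its address fields.
place = {
    "query_id": "0",
    "location_name": "Twin Peaks Petroleum",
    "street_address": "598 Portola Dr",
    "city": "San Francisco",
    "region": "CA",
    "postal_code": "94131",
    "iso_country_code": "US",
}
print(pk_api.lookup_placekey(**place, strict_address_match=True))
```
Output: `{'query_id': '0', 'placekey': '227-222@5vg-82n-pgk'}`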
Upon successful query return, we add the Placekey to the corresponding row in the voting data.
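One way to organize this bookkeeping is sketched below. It makes no API calls: the helper names, the placeholder addresses, and the stubbed responses (including the shape of the failed lookup) are all assumptions for illustration. The only thing taken from the example above is that each response carries a `query_id` and, on success, a `placekey`.

```python
import pandas as pd

voting = pd.DataFrame({
    "Precinct": ["P1", "P2"],
    "street_address": ["1 Example St", "2 Example Ave"],
    "city": ["Exampleville", "Exampleville"],
    "region": ["GA", "GA"],
    "postal_code": ["00000", "00000"],
})

ADDRESS_FIELDS = ["street_address", "city", "region", "postal_code"]


def build_queries(df):
    """One query dict per row; query_id is the row's index label as a string."""
    return [
        {"query_id": str(idx), **row[ADDRESS_FIELDS].to_dict(), "iso_country_code": "US"}
        for idx, row in df.iterrows()
    ]


def attach_placekeys(df, responses):
    """Map responses back to rows via query_id; failed lookups stay missing."""
    by_id = {r["query_id"]: r.get("placekey") for r in responses}
    out = df.copy()
    out["placekey"] = [by_id.get(str(idx)) for idx in out.index]
    return out


queries = build_queries(voting)

# Hypothetical responses standing in for what the API would return.
stub_responses = [
    {"query_id": "0", "placekey": "fake-key-0"},
    {"query_id": "1", "error": "hypothetical failure"},
]
voting_with_keys = attach_placekeys(voting, stub_responses)
print(voting_with_keys[["Precinct", "placekey"]])
```

Keying on `query_id` rather than on list position means a dropped or reordered response cannot silently shift Placekeys onto the wrong precinct.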
c. Join Patterns and voting data on Placekey. SafeGraph Patterns comes with Placekey already built in, so we can simply join the Patterns DataFrame with the voting DataFrame on `placekey`. This should be an "inner" join because only a small fraction of the Patterns rows correspond to polling locations. Additionally, a few of the polling locations may not be covered by SafeGraph Patterns.
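A small sketch of the inner join, using made-up Placekeys and precinct labels. Dropping rows with a missing `placekey` first is a precaution added here, since pandas treats missing keys as matching each other during a merge.

```python
import pandas as pd

patterns = pd.DataFrame({
    "placekey": ["pk-a", "pk-b", "pk-c"],
    "visitor_home_cbgs": ['{"131171301011": 4}', '{"131171301011": 11}', "{}"],
})
voting = pd.DataFrame({
    "Precinct": ["P1", "P2", "P3"],
    "placekey": ["pk-b", "pk-z", None],  # P2 is not in Patterns; P3 has no Placekey
})

# Keep only polling locations that received a Placekey, then inner-join.
has_key = voting.dropna(subset=["placekey"])
joined = has_key.merge(patterns, on="placekey", how="inner")

# Fraction of polling locations that survive the join.
coverage = len(joined) / len(voting)
print(joined)
print(f"coverage: {coverage:.0%}")
```

Checking the coverage figure is a quick way to see how many polling locations were lost to failed lookups or gaps in Patterns.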
A sample of the result:

3) Explode by visitor home CBG
You’ll notice the `visitor_home_cbgs` column is in JSON format, where the key is a CBG and the value is the number of visitors from that CBG at the row’s Precinct. We need to vertically explode the `visitor_home_cbgs` column to extract this data. Fortunately, SafeGraph has a Python library to make this process painless.
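For readers who want to see what the explode does, here is a pandas-only sketch. It is not the SafeGraph library's implementation, just an equivalent under the assumption that each cell holds a JSON object mapping CBG to visitor count. The function name and sample rows are hypothetical.

```python
import json

import pandas as pd

joined = pd.DataFrame({
    "Precinct": ["P1", "P2"],
    "in_person_trump": [321, 779],
    "visitor_home_cbgs": [
        '{"131171301011": 4, "131171301012": 9}',
        '{"131171301011": 11}',
    ],
})


def explode_home_cbgs(df, json_col="visitor_home_cbgs"):
    """Return one row per (original row, home CBG) with cbg and visitors columns."""
    rows = []
    for record in df.to_dict("records"):
        cbg_counts = json.loads(record.pop(json_col))
        for cbg, visitors in cbg_counts.items():
            # Every other column is repeated on each exploded row.
            rows.append({**record, "cbg": cbg, "visitors": int(visitors)})
    return pd.DataFrame(rows)


exploded = explode_home_cbgs(joined)
print(exploded)
```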
Below is the resulting DataFrame filtered to just Precinct 3. Notice that the `in_person_trump`, `in_person_biden`, and `in_person_jorgensen` columns are all the same in each row, because each row corresponds to Precinct 3. The `cbg` column includes each key in Precinct 3’s `visitor_home_cbgs` JSON object. The `visitors` column is the key’s corresponding value from the JSON object.

Let’s look at this same DataFrame again, but this time filtered to all rows where `cbg` is ‘131171301011’.

4) Calculate estimated CBG political affiliation
Next, we calculate the portion of each CBG’s visitors that were seen at each precinct. For example, the total number of visitors seen from ‘131171301011’ was 36 (the sum of the `visitors` column). Then the `portion` is `visitors / total_visitors`.
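In pandas, this is a grouped sum followed by a division. The sketch below uses visitor counts of 4, 11, 10, 5, and 6, which are reconstructed from the total of 36 and the portions used in the sums that follow; the precinct labels are placeholders.

```python
import pandas as pd

exploded = pd.DataFrame({
    "Precinct": ["P1", "P2", "P3", "P4", "P5"],
    "cbg": ["131171301011"] * 5,
    "visitors": [4, 11, 10, 5, 6],
})

# Total visitors seen from each home CBG, repeated on every row of that CBG.
exploded["total_visitors"] = exploded.groupby("cbg")["visitors"].transform("sum")

# Share of the CBG's visitors that were seen at this row's precinct.
exploded["portion"] = exploded["visitors"] / exploded["total_visitors"]
print(exploded)
```

Using `transform` keeps the result aligned with the original rows, so no second merge is needed.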

Then we have the following for CBG ‘131171301011’:
in_person_trump = sum( in_person_trump * portion ) = 321*.11111 + 779*.305556 + 513*.277778 + 409*.138889 + 294*.166667 = 522
in_person_biden = sum( in_person_biden * portion ) = 283*.11111 + 174*.305556 + 94*.277778 + 120*.138889 + 117*.166667 = 146.9
in_person_jorgensen = sum( in_person_jorgensen * portion ) = 20*.11111 + 37*.305556 + 23*.277778 + 18*.138889 + 13*.166667 = 24.6
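The three sums above can be reproduced with a weighted group-by. This is a sketch under the assumption that the per-precinct visitor counts are 4, 11, 10, 5, and 6 (inferred from the portions and the total of 36); the vote totals are the ones in the sums above.

```python
import pandas as pd

candidates = ["in_person_trump", "in_person_biden", "in_person_jorgensen"]

rows = pd.DataFrame({
    "cbg": ["131171301011"] * 5,
    "visitors": [4, 11, 10, 5, 6],
    "in_person_trump": [321, 779, 513, 409, 294],
    "in_person_biden": [283, 174, 94, 120, 117],
    "in_person_jorgensen": [20, 37, 23, 18, 13],
})

# Portion of the CBG's visitors seen at each precinct.
rows["portion"] = rows["visitors"] / rows.groupby("cbg")["visitors"].transform("sum")

# Weight each precinct's vote totals by that portion, then sum per CBG.
weighted = rows[candidates].mul(rows["portion"], axis=0)
estimated_votes = weighted.groupby(rows["cbg"]).sum()
print(estimated_votes.round(1))
```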
We’re interested in the portion affiliated with each candidate (not the actual number of votes – if we wanted to calculate the number of votes we’d have to normalize SafeGraph’s visitor counts, because the counts are based on a sample). To calculate affiliation for CBG ‘131171301011’ we have:
trump = in_person_trump / (in_person_trump + in_person_biden + in_person_jorgensen) = **.753**
biden = in_person_biden / (in_person_trump + in_person_biden + in_person_jorgensen) = **.212**
jorgensen = in_person_jorgensen / (in_person_trump + in_person_biden + in_person_jorgensen) = **.035**
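As a quick check of these shares, the normalization can be written as a small helper. The function name is hypothetical and the inputs are the estimated vote totals computed for this CBG.

```python
def affiliation_shares(votes):
    """Convert estimated vote totals into shares that sum to 1."""
    total = sum(votes.values())
    if total == 0:
        raise ValueError("no estimated votes to normalize")
    return {name: count / total for name, count in votes.items()}


shares = affiliation_shares({"trump": 522.0, "biden": 146.9, "jorgensen": 24.6})
print({name: round(share, 3) for name, share in shares.items()})
```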
Here are a few of the other CBGs from this dataset:

Note: The actual estimates in the accompanying Google Colab notebook may vary slightly due to an added normalization step.
5) Map estimates
As a final step, we can visualize our estimates by mapping the affiliations. Blue indicates Biden-leaning, red indicates Trump-leaning, and white indicates a 50/50 split. It is clear that some CBGs are missing, which is briefly addressed below. The upper right grouping of CBGs corresponds to Forsyth County, while the lower left grouping corresponds to Cobb County. Forsyth County is fairly homogeneous, but Cobb County has some variation. In general, Cobb County seems to get more blue as it gets closer to Atlanta, but there are some exceptions.
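The color scale described above could be sketched as follows. This is not the notebook's plotting code. It assumes the color is driven by the two-party margin (ignoring third-party share), which is a simplification chosen for this example; the function name is hypothetical.

```python
def lean_color(trump_share, biden_share):
    """Return an (r, g, b) tuple in [0, 1]: red for Trump, blue for Biden, white at 50/50."""
    two_party = trump_share + biden_share
    if two_party == 0:
        return (1.0, 1.0, 1.0)  # no information: treat as neutral
    # Margin runs from -1 (all Biden) to +1 (all Trump).
    margin = (trump_share - biden_share) / two_party
    fade = 1.0 - abs(margin)
    if margin >= 0:
        return (1.0, fade, fade)
    return (fade, fade, 1.0)


print(lean_color(0.753, 0.212))
```

The resulting tuples can be passed as face colors to most plotting libraries when drawing CBG polygons.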

Limitations
- Foot traffic data corresponds to the entire week of Election Day. One could attempt to correct for this by removing "expected traffic" based on previous or subsequent weeks.
- We only estimate using in-person votes, which is problematic given (a) many votes were mailed in because the election took place during a pandemic, and (b) mail-in votes tended to be Democrat-leaning. One could attempt to correct for this at the county level or – with an understanding of the county’s election rules and voter registration data – at the CBG level.
- Some CBG estimates are missing entirely. Depending on the needs of the analysis, missing estimates could be ignored, filled in with median/mean imputation, estimated with nearest-neighbors imputation, or estimated from the nearest polling locations.
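As one illustration of the last point, the sketch below fills missing CBG estimates with the median of the other CBGs in the same county. It relies on the fact that the first five digits of a CBG identifier are the state and county FIPS codes. The sample values and all CBG identifiers other than 131171301011 are made up.

```python
import pandas as pd

estimates = pd.DataFrame({
    "cbg": ["131171301011", "131171301012", "131171301013", "130670301011"],
    "trump": [0.75, 0.65, None, None],
})

# State (2 digits) + county (3 digits) prefix of the CBG identifier.
estimates["county_fips"] = estimates["cbg"].str[:5]

# Median of the known estimates within each county, aligned to every row.
county_median = estimates.groupby("county_fips")["trump"].transform("median")

# Fill gaps; a county with no known estimates at all stays missing.
estimates["trump_filled"] = estimates["trump"].fillna(county_median)
print(estimates)
```

Keeping the filled values in a separate column makes it easy to distinguish imputed estimates from observed ones later.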
Conclusion
While this method has some limitations, the resulting estimates offer researchers, businesses, and political campaigns much finer detail than county- or district-level data. With CBG-level estimates, political affiliation can be analyzed in relation to US Census variables such as income, race, education, and age. Placekey enables these CBG-level political affiliation estimates at scale, which in turn enables powerful insight.
A future direction of this work could analyze things like voter turnout, voter suppression, and much, much more. SafeGraph Patterns includes values for dwell time and distance traveled. Before analyzing these things, you’ll want to normalize the raw Patterns data because SafeGraph’s data is a sample of the entire US population, and the underlying panel changes over time. The scaling step was outside the scope of this tutorial, but it’s covered on GitHub.
The data and notebook are set up for you to use! What insights can you uncover?!