Youth Voter Fatigue — Fact or Fiction

Published in Towards Data Science · 9 min read · Nov 26, 2018

I’m sure you’ve heard this before: newly registered, energetic, young voters turn out for their first election and then proceed to disengage from the process when the elections they participate in don’t happen to go the way they hoped.

Turnout trend by age, for those who were registered to vote in North Carolina and voted in the 2016 general election.

Judging by turnout among registered voters, this theory seems to hold true. In the 2016 general election, 67.2% of 18-year-olds who were registered to vote in North Carolina turned out. But turnout was not consistently this high among young people: among 23-year-olds registered to vote, only 47.2% voted in 2016. Naturally, this leads to an important question.

Do 23-year-olds objectively turn out to vote less often than 18-year-olds?

Does turnout really drop after age 18?

The answer has interesting implications. If the dip in turnout genuinely reflects youth behavior, there is huge potential for an overall increase in turnout: if we could keep these young voters engaged throughout their lives, instead of losing them in their 20s, our country’s electorate could look considerably different. And most people would agree that more political participation from young people can only benefit our democracy.

So, to test whether this drop in turnout is real, I calculated turnout as a proportion of the citizen voting-age population and compared those values to turnout among those already registered to vote.

Turnout trends in North Carolina for the 2016 general election by age among registered voters and citizens.

Once both turnout measures are calculated, the finding is remarkable. Although citizen turnout appears to dip slightly, from 45.8% at age 18 to 41.3% at age 24, the decrease is not nearly as pronounced as the comparatively sharp drop among those registered. Among citizens, the general trend is that turnout increases with age, while turnout among registered voters tells a different story entirely.

Since 2016 may have been an anomaly (given how many quirks we experienced that election cycle), let’s take a look at other election years.

By looking at the results for the years 2014, 2012, and 2010, we can conclude that this phenomenon is not confined to any particular election cycle, nor to any type of election. A decrease in turnout after age 18 is ubiquitous among registered voters, while citizen turnout generally continues to increase with age.

While some elections do show a slight dip in citizen turnout, such as 2012 and 2016 (interestingly, both presidential elections), the percentage decrease for registered voters is always larger than any percentage decrease for citizens, as shown in the table below.

I suspect that this phenomenon can be attributed to “deadwood” in the voter registration records: remnants of people who have changed addresses but still exist in the voter registration rolls, because the National Voter Registration Act allows election officials to remove voters only after they have failed to vote in a number of federal elections. The decline in turnout we are seeing could be a symptom of an inflated denominator in the registered voter turnout calculation, caused by an inflated number of registrations on the books for young people. An inflated number of registrations for young people would make sense, given their propensity to move around frequently.
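To see how stale registrations alone can pull the measured rate down, here is a small worked example in Python. The counts are hypothetical and chosen only to illustrate the mechanism; they are not taken from the North Carolina files.

```python
def turnout(votes, registrations):
    """Turnout as a percentage of registrations on the rolls."""
    return 100 * votes / registrations

# Hypothetical numbers, chosen only to illustrate the mechanism.
votes = 600           # ballots cast by one age group
current_regs = 1000   # registrations for people who still live at that address
stale_regs = 250      # "deadwood": old records for people who have moved

true_rate = turnout(votes, current_regs)
observed_rate = turnout(votes, current_regs + stale_regs)

print(f"without deadwood: {true_rate:.1f}%")     # 60.0%
print(f"with deadwood:    {observed_rate:.1f}%")  # 48.0%
```

The number of ballots is identical in both cases; only the count of registrations on the books changes.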

As the table shows, the years of presidential elections seem to have greater drops in registered voter turnout. The first column shows the percentage decrease in registered voter turnout from age 18 to the age of the lowest turnout for that year. The second shows the percentage change in citizen turnout over that same age range. The third column is the sum of the first two columns and can be interpreted as how much of the drop in registered voter turnout is a result of “deadwood.” Again, the sharpest drop in turnout among registered voters occurred during the two presidential elections.

If the decline in turnout is caused by changes of address not being updated in the voter registration rolls, we might suspect that “deadwood” levels vary by race.

Does race affect the magnitude of the turnout drop among registered voters?
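The column arithmetic can be sketched as follows. This is one reading of the description, not a confirmed formula: it assumes the columns are relative changes (not percentage points), that the first column is reported as a positive drop, and that the second is a signed change, so that adding them leaves the part of the drop not matched among citizens. The inputs are the 2016 figures quoted earlier; the citizen figure for age 24 stands in for the matching age, which the text does not give.

```python
def pct_change(start, end):
    """Relative change from start to end, in percent (negative means a drop)."""
    return 100 * (end - start) / start

# Turnout percentages quoted in the article for 2016.
reg_at_18, reg_at_low = 67.2, 47.2  # registered voters, ages 18 and 23
cit_at_18, cit_at_low = 45.8, 41.3  # citizens, ages 18 and 24 (stand-in)

registered_decrease = -pct_change(reg_at_18, reg_at_low)  # column 1: positive drop
citizen_change = pct_change(cit_at_18, cit_at_low)        # column 2: signed change
deadwood_share = registered_decrease + citizen_change     # column 3: their sum

print(round(registered_decrease, 1),
      round(citizen_change, 1),
      round(deadwood_share, 1))
```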

The graphs above depict the registered voter and citizen turnout in the 2016 general election for voters of different races and ethnicities. At first glance, registered turnout appears to decrease by a similar amount for every group. Citizen turnout, on the other hand, varies wildly across races and ethnicities.

From the table on the left, we can see that each race and ethnicity group saw a decrease in citizen turnout. But, in each case, the decline in registered turnout is larger than the decline in citizen turnout, continuing to suggest that registered turnout is displaying a false trend as a result of an inflated number of registrations. The third column again displays the percentage error between the registered turnout and citizen turnout. This analysis shows that Native American voters are most heavily affected by “deadwood” in the registration records, followed by Black voters.

Further study should be done to include other states and expand the number of election cycles observed. It may also be interesting to look through the voter registration records to identify who these inflated registrations belong to and why they are in the system in the first place. We must walk a fine line, however, so as not to encourage further efforts to purge voters from the rolls.

In the meantime, we need to change the narrative around young people’s engagement in the election process. Our results suggest that young people are not jaded from losing elections; they have yet to be mobilized in the first place. Let’s stop dissuading young people from turning out by spreading false accounts of their voting patterns.

Methodology:

The Data

The data used comes from North Carolina’s FTP site, which can be found here. I used one file containing the statewide voter history and four different voter registration files for the different elections. Luckily, North Carolina uploads “snapshots” of the voter registration file before every major election (except for 2010), so this was the voter registration file I always used when looking at a particular election. To calculate citizen estimates, I used a combination of three American Community Survey data files: citizen estimates, yearly population estimates, and population counts by age.

Algorithm

For every election, my code was essentially broken up into two major parts: finding registered voter turnout and citizen voter turnout. Both begin by filtering the vote history file for the appropriate election.

Then, I merge the vote history file and the voter registration file to create a data frame that contains only people who voted in the election I am interested in, along with useful information from the voter registration file, such as age. Next, to calculate registered voter turnout, I count the number of people in the voter registration file and the number of people in the merged file, and save the results to a new data frame. Registered voter turnout is the number of people in the merged file divided by the number of people in the voter registration file.
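A minimal sketch of these steps, in Python with toy records. The field layout, voter IDs, and election labels are hypothetical and do not reflect the actual structure of the North Carolina files.

```python
from collections import Counter

# Toy registration file: one (voter_id, age) row per registered voter.
registration = [(1, 18), (2, 18), (3, 18), (4, 23), (5, 23), (6, 23), (7, 23)]

# Toy vote history: (voter_id, election) rows for every ballot on record.
vote_history = [(1, "2016-11-08"), (2, "2016-11-08"), (4, "2016-11-08"),
                (4, "2014-11-04"), (6, "2012-11-06")]

ELECTION = "2016-11-08"

# Step 1: filter the vote history to the election of interest.
voted_ids = {vid for vid, election in vote_history if election == ELECTION}

# Step 2: "merge" with registration, keeping only registrants who voted.
merged = [(vid, age) for vid, age in registration if vid in voted_ids]

# Step 3: count registrants and voters per age, then divide.
registered_by_age = Counter(age for _, age in registration)
voted_by_age = Counter(age for _, age in merged)
registered_turnout = {
    age: 100 * voted_by_age[age] / registered_by_age[age]
    for age in registered_by_age
}

print(registered_turnout)
```

Here the merge is an inner join on the voter ID, so anyone without a ballot in the chosen election drops out of the numerator but stays in the denominator.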

To calculate citizen turnout, I begin by estimating the number of people of each age. The available data has yearly estimates released in July. To better account for population growth between the July estimate and the November election, I take the difference between the following year’s estimate and the election-year estimate, multiply it by 0.3, and add the result to the election-year estimate.
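One reading of this adjustment is a linear interpolation between two July estimates, roughly a third of the way into the year that follows the election-year estimate. The sketch below assumes the second figure is the following July's estimate; the function name and the population numbers are hypothetical.

```python
def november_population(july_election_year, july_next_year, fraction=0.3):
    """Interpolate a November population from two consecutive July estimates.

    fraction=0.3 is the weight used in the article (about four of the
    twelve months between the two July estimates).
    """
    growth = july_next_year - july_election_year
    return july_election_year + fraction * growth

# Hypothetical estimates for one age.
estimate = november_population(100_000, 102_000)
print(round(estimate))
```

A shrinking cohort works the same way: the difference is negative, so the November figure lands below the July one.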

Then, I need to deflate this population estimate by the percentage of the population who are citizens. I have data on the number of citizens per age group, which I use to find the percentage in each group with citizenship. Taking these group estimates, I deflate the population estimates.

For the citizen turnout, the numerator is the same as from the registration turnout, so I simply use the same data here and calculate new turnout numbers.
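The two steps above can be sketched together: deflate the population by the citizen share of the matching age group, then divide the same vote count by the result. All names and numbers below are hypothetical.

```python
def citizen_population(population_estimate, group_citizens, group_population):
    """Deflate a population estimate by its age group's citizen share."""
    citizen_share = group_citizens / group_population
    return population_estimate * citizen_share

def citizen_turnout(voters, population_estimate, group_citizens, group_population):
    """Turnout as a percentage of the estimated citizen population."""
    citizens = citizen_population(population_estimate, group_citizens, group_population)
    return 100 * voters / citizens

# Hypothetical inputs for a single age.
turnout_pct = citizen_turnout(
    voters=43_700,               # same numerator as registered turnout
    population_estimate=100_000,  # people of this age in November
    group_citizens=902_500,       # citizens in the matching age group
    group_population=950_000,     # everyone in the matching age group
)
print(round(turnout_pct, 1))
```

Because the citizen share comes from an age group rather than a single age, every age inside a group is deflated by the same factor.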

To split up these trends by race, I simply filtered the voter registration file once more, keeping only the race or ethnicity code I am analyzing.

Citizen turnout by race must also be adjusted. I use the same file from before to get population estimates by age, this time specifying in my count which race or ethnicity I want the data for. Just as before, I use the age-group citizen percentages found earlier to deflate the population estimate. In the future, I would prefer to deflate the population estimates using citizen percentages for each race or ethnic category instead of the citizen percentages for the total population. Along the same lines, if I could find a dataset that contains citizen percentages by age instead of age group, that would make my citizen turnout calculations more accurate.

All my graphs were created using ggplot. I used a custom theme, which you can see the details of down below.

Challenges

One of the biggest challenges of this project was, of course, analyzing large data files. I had to import four different voter registration files, each between four and sixteen gigabytes, and another five gigabyte file containing the state’s vote history. This meant my computer ran my code very, very slowly. If I was lucky the entire script would take 1.5 hours to run. More often than I care to admit, however, my coding session ended thanks to my computer crashing completely. Thankfully, this issue required nothing other than patience to remedy.

A challenge which occupied more of my time happened to be the details of the voter registration file. North Carolina includes a column in their registration file which indicates a voter’s status, which can be marked as either “Active,” “Inactive,” or “Removed.” When I began this project, I included the “removed” voters in my calculations, which I quickly realized made my data lose meaning.

2016 North Carolina registered voter turnout with “removed” voters included

This graph looks completely different from the registered voter turnout I have been using throughout this paper because it includes people who have been “removed” from the voter registration roll in the numerator and the denominator, making both larger than they should be. Then, I tried to calculate registered voter turnout by editing the denominator to only include voters marked as “inactive” or “active” in the registration file.

This change brought my curve closer to what I expected, except that citizen turnout was above registered voter turnout, which defies logic. The problem was that I had filtered “removed” voters out of the registration count but not out of the vote history count. Many of the people marked as “removed” have multiple lines in the voter registration file due to changes of address, with only one of those lines marked “active.” If I neglect to filter out removed voters, and one of those voters with multiple removed records happens to vote, their other records are also counted as votes for 2016. Obviously, this inflated the numerator, driving the percentage above plausible levels.
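A toy illustration of the double counting, in Python. The status labels come from the registration file described above; the rows and voter IDs are hypothetical.

```python
# Toy registration rows: (voter_id, status). Voter 1 moved twice, leaving
# two "Removed" rows plus one "Active" row.
registration = [(1, "Removed"), (1, "Removed"), (1, "Active"),
                (2, "Active"), (3, "Inactive"), (4, "Removed")]
voted_ids = {1, 3}  # voters with a 2016 ballot in the vote history

def count_votes(rows):
    """Count registration rows that match a ballot (a row-wise merge)."""
    return sum(1 for vid, _ in rows if vid in voted_ids)

# Keep only rows that are still on the rolls.
kept = [row for row in registration if row[1] in ("Active", "Inactive")]

naive_numerator = count_votes(registration)  # voter 1 is counted three times
clean_numerator = count_votes(kept)          # one row per voter who cast a ballot
clean_denominator = len(kept)

print(naive_numerator, clean_numerator, clean_denominator)
```

Applying the same status filter before both counts keeps the numerator and denominator drawn from the same set of rows.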

Written by Anna Baringer