The world’s leading publication for data science, AI, and ML professionals.

What in the World Happened in Virginia Tuesday Night?!

Statistically speaking, Your Vote Matters

Statistics breaks when your sample is biased

Source: Google Macro on 11/3/2020 screenshot by author. [1]
Source: Google Macro on 11/3/2020 screenshot by author. [1]

Like most American families, Tuesday night my mom and I closed up work around 5pm, heated up some leftovers, and huddled around a TV to commence what would turn into a long night of bingeing/"playing along at home" with the election results. Most of our analytics fix was generously provided for by CNN (oop, just called ourselves out with that one). But we’re not your ordinary group of sheep-le over here. No no, we’re some of the more savvy consumers of click-bait information. We use the Google too!! And what we (and many others probably) noticed early on, was that there was a discrepancy between Google’s little macro (powered by AP) and CNN’s classic US map dashboard (although it definitely is getting fancier). The numbers being reported were the same but a difference in information delivery paradigm caused an interesting bit of confusion in our household. Allow me to explain…

If a state had more Republicans votes at the time, CNN would shade it red (or blue for the opposing case) which gave the impression that the candidate had won or at least was "winning" the state. Fine, makes sense. However, Google took it one step further. They provided two "grades" of shading! A light color meant the candidate was leading the popular vote of that state whereas a darker won meant that the candidate had "won" the state. Tip of the hat for the addition of this clever, intuitive visualization feature.

So what’s the problem? Shouldn’t the candidate that is winning the state’s popular vote be the one whom the analytics-gods inevitably deem as the "winner" of that state? Not necessarily. Take Virginia for example. Sometime around 8:30pm EST Biden was losing by ~20%. As the CNN anchors were stewing about how Biden needed to make some sort of heroic effort to save this "blue" state from flipping (as if that’s how election night worked), Google’s map filled in dark blue signifying that not only was Biden winning Virginia… he’d won!! Well now as you can imagine this caused a bit of confusion around here and I decided to investigate.

Setting the scene

At around 8:30pm EST, state officials had counted 33% of votes and Biden was losing by 20%. However, the Google macro had shaded this state dark blue to signify an unquestionable Biden victory. Ok, what the heck? My mom felt that way and it wasn’t helped by the television pundits who kept the game-show illusion going with the rhetoric that "hopefully Biden pulls back in Virginia." The statisticians knew something that we as viewers were not being told.

Marbles again?

Mathematical modeling is all about simplifying the real world. In another post, we showed how quickly you can call an election if you simplify the model into hypergeometric equivalent (a bag of marbles). We pretended that instead of going to various polling offices, mail boxes, etc, the people cast their votes by bringing a marble of a specific color (red or blue) to one big-ol marble bag located in the state’s capital on election day. We showed that if the bag was well mixed, we would only need to draw a few thousand to make a really good guess as to what the composition of the bag was.

However, you can see how this is unrealistic because, in real life, there is more than one bag of marbles. In fact, there are 133 counties and over 2500 precincts in Virginia. If we were to use the one bag model, we’d make the egregious error of assuming that we could draw from any of these bags and make a general statement about the entire state.

Well why not? That’s how the information is displayed to us through the media? 33% of votes have been counted people! That’s well over our 10,000 marble threshold that we mathematically posited would be sufficient. Trump won Virginia folks…close the books!

So in lies the real work of a professional statistician: choosing representative samples is hard.

Marble Bags within Marble Bags…great

But let’s say that we needed to say SOMETHING even though we knew that our sample was biased. Well, what if instead of assuming a single bag of votes, we assumed that each county was a smaller bag within that bag. Yes, the state’s marble bag is actually a bag of county marble bags (the rabbit hole continues!).

Image by Author
Image by Author

But does this explain what happened Tuesday? Well, let’s look at the data. I took some numbers directly off the Virginia state’s official election reporting website sometime on 11/5 (votes should be in at that time). The final count at that time said that Biden had been awarded 53% of the counted votes. Recalling Tuesday night, CNN would’ve had us believing that this was some sort of heroic push but I think the statisticians would argue that it was inevitable. Let’s see.

Asking the data

Ok, let’s start by doing a quick thought experiment to see if we can recreate the situation we saw at 8:30pm on Tues when Biden was losing by 20% but was declared the winner. Let’s pretend that 10,000 votes are counted from each county (i.e. ~1.5 million of Virginia’s ~4.5 million total votes). The following is a plot of 30 randomly selected counties in Virginia and how they might’ve looked after counting their first 10,000 votes. (I’ve put a box around Fairfax County because it becomes important later – stay tuned.)

Image by Author
Image by Author

Tale of two Virginias

If I add up the votes from all the counties, we can see that Trump would have a commanding lead over Biden at this early stage (shown on the left below). However, the statisticians look at the early estimate of the individual counties above and realize that Trump has already lost. How?

What they might’ve done is they take the votes that they’ve already counted and guess the proportion of Biden votes contained within the uncounted, future votes. In fact, if we think we know how many people will vote we can use Bayesian Inference to infer a range of potential votes for each county (by inferring a range on p like we did with Illinois). Then just repeat the process for each county and add up the totals for the low and high estimate cases (right figure). And with that, you can see that even the unimaginably worst case scenario for Biden (labeled "low" below), he still is well above the red "line to win" drawn at 50% of the votes.

Image by Author
Image by Author

Ok, for those of you who followed all the way up until now you might still be asking: how could this be? Well, let’s just look at the raw results from the 30 most populous counties and I feel like the answer should be obvious. Check out how many votes Biden picked up in Fairfax County. In fact, he won the top 12 most populous counties in Virginia (in some cases by a landslide).

Image by Author
Image by Author

Of course, votes aren’t actually counted at the same speed and my little model I presented here isn’t actually the numbers from Tuesday night (8:30pm). But… it shows that while the public is being fed one story about Trump leading by 20%, the statisticians have already realized he doesn’t have a prayer of winning Virginia.

Well, this is all very interesting, but what about swing states? And, this is not doing a whole lot to convince me whether or not my vote counts. I’d argue that it does but this post is getting a bit long. I try to answer those questions in greater detail in other posts.

Thanks for reading! Would appreciate any feedback!

If you liked this article

Consider giving a clap (or 10?) so TDS will share it more readily with others

Check out my other case studies on the election:

  • Illinois (how many votes needed to call an election)
  • Pennsylvania (swing state drama statistical interpretation)
  • Texas (how swing-able is it really?)

Images:

[1] Google, US election results (2020), https://www.google.com/search?channel=fs&client=ubuntu&q=election+results


Related Articles