
Introduction
This is an election year, and the polling scene around the elections (both the presidential race and the House/Senate contests) is heating up. It will only become more frenetic in the coming days, with tweets, counter-tweets, social media fights, and endless punditry on television.
We know that not all polls are of the same quality. So, how do we make sense of it all? How do we identify trustworthy pollsters using data and analytics?

In the world of predictive analysis for politics (and other domains such as sports, social phenomena, and economics), FiveThirtyEight is a formidable name.
Since early 2008, the site has published articles – typically creating or analyzing statistical information – on a wide variety of topics in current politics and political news. The website, run by the rockstar data scientist and statistician Nate Silver, achieved particular prominence and widespread fame around the 2012 Presidential election when its model correctly predicted the winner of all 50 states and the District of Columbia.

And before you scoff and say "But what about the 2016 election?", you may be well-advised to read this piece on how the election of Donald Trump was within the normal error margin of statistical modeling.
For the more politically curious readers, they have a whole collection of articles about the 2016 election here.
Data science practitioners should take a liking to FiveThirtyEight because it does not shy away from explaining its predictive models in fairly technical terms (certainly complex enough for the layperson).

Here, for example, they discuss adopting the famous t-distribution, while most other poll aggregators may be content with the ubiquitous normal distribution.
Beyond the use of sophisticated statistical modeling techniques, however, the team under Silver prides itself on a unique methodology – pollster ratings – that helps its models remain highly accurate and trustworthy.
In this article, we analyze their data on these rating methods.
Pollster rating and ranking
There is a multitude of pollsters operating in this country, and gauging their quality can be highly taxing. As per the website, "Reading polls can be hazardous to your health. Symptoms include cherry-picking, overconfidence, falling for junky numbers, and rushing to judgment. Thankfully, we have a cure." (source)
There are polls. Then, there are polls of polls. Then, there are weighted polls of polls. Above all, there is a poll of polls whose weights are statistically modeled and dynamically updated.
(For background on the 2020 polling landscape, see Pew Research Center's "Election 2020 Polling Field Guide.")
Does this sound similar to other famous ranking methodologies you have encountered as a data scientist, such as Amazon’s product ranking or Netflix’s movie ranking? Probably, yes.
Essentially, FiveThirtyEight uses this rating/ranking system to weight the poll results (highly ranked pollsters’ results are given higher importance, and so on). They also actively track the accuracy and methodologies behind each pollster’s results and adjust the rankings throughout the year.
(Their methodology is described in detail in "How FiveThirtyEight Calculates Pollster Ratings.")
It is interesting to note that their ranking methodology does not necessarily rate a pollster with a bigger sample size as a better one. The following screenshot from their website demonstrates this clearly. While pollsters like Rasmussen Reports and HarrisX have bigger sample sizes, it is Marist College that gets an A+ rating with a modest sample size.

Fortunately, they also open-source their pollster rating data (along with almost all of their other datasets) here on GitHub. And if you are only interested in a nice-looking table, here it is.
Naturally, as a data scientist, you may want to look deeper into the raw data and understand things like:
- how their numerical ranking correlates with the accuracy of the pollsters
- whether there is a partisan bias in the selection of pollsters (in most cases, pollsters can be categorized as either Democratic-leaning or Republican-leaning)
- who are the top-rated pollsters? Do they conduct many polls or are they selective?
We analyzed the dataset to extract such insights. Let’s dig into the code and the findings, shall we?
The analysis
You can find the Jupyter Notebook here on my GitHub repo.
The source
To start off, you can pull the data directly from their GitHub into a Pandas DataFrame.
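A minimal sketch, assuming the CSV still lives under the pollster-ratings folder of FiveThirtyEight's public data repository (verify the current path in the repo before running):

```python
import pandas as pd

# Assumed raw-file URL; check github.com/fivethirtyeight/data
# for the current folder layout and file name
url = ("https://raw.githubusercontent.com/fivethirtyeight/data/"
       "master/pollster-ratings/pollster-ratings.csv")
df = pd.read_csv(url)
```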

There are 23 columns in this dataset. Here is how they look:
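A quick inspection along these lines confirms the shape and the column names:

```python
print(df.shape)               # (number of pollsters, 23)
print(df.columns.tolist())    # the 23 column names
df.head()                     # first few rows for a visual check
```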

Some transformation and clean-up
We notice that a column name has some extra space, and a few other columns need some extraction and data type conversion.
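A sketch of the kind of clean-up involved; the exact raw formats are assumptions for illustration (a bias column storing strings such as 'D +1.2', and a percentage stored as text):

```python
# Remove stray whitespace from the column names
df.columns = df.columns.str.strip()

# Illustrative assumption: a raw 'Bias' column holds strings like
# 'D +1.2' or 'R +0.8'; split into a direction and a numeric degree
bias = df['Bias'].astype(str).str.strip()
df['Partisan Bias Direction'] = bias.str[0]    # 'D' or 'R'
df['Partisan Bias Degree'] = (
    bias.str.extract(r'([\d.]+)', expand=False).astype(float))

# Illustrative assumption: 'Races Called Correctly' arrives as a
# percentage string such as '78%'; convert it to a float
df['Races Called Correctly'] = (
    df['Races Called Correctly'].astype(str).str.rstrip('%').astype(float))
```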



After applying these extractions, the new DataFrame has additional columns, which make it more suitable for filtering and statistical modeling.

Examining and quantizing the "538 Grade" column
The columns "538 Grades" contains the crux of the dataset – the letter grade for the pollster. Just like a regular exam, A+ is better than A, and A is better than B+. If we plot the counts of the letter grades, we observe 15 gradations, in total, from A+ to F.

Instead of working with so many categorical gradations, we may want to combine them into a small number of numerical grades – 4 for A+/A/A-, 3 for the B’s, etc.
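A minimal mapping along these lines, treating the leading letter of each grade as its tier (combined grades such as 'A/B', if present, fall into the tier of their first letter):

```python
def numeric_grade(letter_grade):
    """Map a 538 letter grade to a coarse numeric grade: A=4 ... F=0."""
    first = str(letter_grade).strip()[0]    # 'A+' -> 'A', 'B-' -> 'B'
    return {'A': 4, 'B': 3, 'C': 2, 'D': 1}.get(first, 0)

df['Numeric Grade'] = df['538 Grade'].apply(numeric_grade)
```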

Boxplots
Going into visual analytics, we can start off with boxplots.
Let’s suppose we want to check which polling method performs better in terms of prediction error. The dataset has a column called "Simple Average Error", which is defined as "The firm’s average error, calculated as the difference between the polled result and the actual result for the margin separating the top two finishers in the race."
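A boxplot sketch; the 'Methodology' column name is an assumption based on the dataset's description:

```python
plt.figure(figsize=(10, 5))
# Assumes a 'Methodology' column (e.g. 'Live Phone', 'Online', 'IVR')
sns.boxplot(x='Methodology', y='Simple Average Error', data=df)
plt.xticks(rotation=45, ha='right')
plt.title('Simple Average Error by polling methodology')
plt.tight_layout()
plt.show()
```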

Then, we may be interested in checking if pollsters with a certain partisan bias are more successful in calling elections correctly than others.
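Reusing the columns derived during the clean-up step above (an illustrative assumption):

```python
# Boxplot of prediction success, split by the partisan lean
# ('D' or 'R') extracted during clean-up
sns.boxplot(x='Partisan Bias Direction', y='Races Called Correctly', data=df)
plt.title('Races called correctly, by partisan lean')
plt.show()
```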

Notice something interesting above? If you are a progressive, liberal thinker, in all likelihood you lean toward the Democratic party. But, on average, the Republican-leaning pollsters call elections more accurately and with less variability. Better watch out for those polls!
Another interesting column in the dataset is called "NCPP/AAPOR/Roper". It "indicates whether the polling firm was a member of the National Council on Public Polls, a signatory to the American Association for Public Opinion Research’s transparency initiative, or a contributor to the Roper Center for Public Opinion Research’s data archive. Effectively, a membership indicates adherence to a more robust polling methodology" (source).
How can we judge the validity of this assertion? The dataset has a column called "Advanced Plus-Minus", which is "a score that compares a pollster’s result against other polling firms surveying the same races and that weights recent results more heavily. Negative scores are favorable and indicate above-average quality" (source).
Here is a boxplot of these two parameters. Not only do the pollsters associated with NCPP/AAPOR/Roper exhibit a lower error score, but they also display considerably lower variability. Their predictions seem to be steady and robust.
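The corresponding boxplot sketch:

```python
# Negative Advanced Plus-Minus scores indicate above-average quality
sns.boxplot(x='NCPP/AAPOR/Roper', y='Advanced Plus-Minus', data=df)
plt.title('Advanced Plus-Minus by NCPP/AAPOR/Roper membership')
plt.show()
```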

Scatter and regression plots
To understand the correlation between parameters, we can look at scatter plots with regression fits. We use the Seaborn and SciPy Python libraries and a customized function for generating these plots.
For example, we can relate the "Races Called Correctly" to the "Predictive Plus-Minus". As per Five-Thirty-Eight, "Predictive Plus-Minus" is "a projection of how accurate the pollster will be in future elections. It is calculated by reverting a pollster’s Advanced Plus-Minus score to a mean based on our proxies for methodological quality." (source)
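A sketch of such a helper, combining Seaborn's regplot with SciPy's Pearson correlation (the function name and styling here are ours, not FiveThirtyEight's):

```python
from scipy import stats

def custom_regplot(data, x, y, figsize=(8, 5)):
    """Scatter plot with a linear fit, annotated with Pearson's r."""
    plt.figure(figsize=figsize)
    sns.regplot(x=x, y=y, data=data)
    sub = data[[x, y]].dropna()            # pearsonr rejects NaNs
    r, p = stats.pearsonr(sub[x], sub[y])
    plt.title(f'{y} vs. {x} (r = {r:.2f}, p = {p:.3g})')
    plt.show()

custom_regplot(df, 'Races Called Correctly', 'Predictive Plus-Minus')
```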

Or, we can check how the "Numeric Grade" we defined correlates with the average polling error. A negative trend indicates that a higher numeric grade is associated with a lower polling error.
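With the helper above, this is a one-liner:

```python
# A downward-sloping fit: better grades go with smaller errors
custom_regplot(df, 'Numeric Grade', 'Simple Average Error')
```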

We can also check if the "# of Polls for Bias Analysis" helps in reducing the "Partisan Bias Degree" assigned to each pollster. We observe a downward relationship, indicating that the availability of a high number of polls does help to reduce the degree of partisan bias. However, the relationship looks highly nonlinear, and a logarithmic scaling of the poll counts would fit the curve better.
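A sketch of the log-scaled version, using the derived 'Partisan Bias Degree' column from the clean-up step:

```python
import numpy as np

# Log-scale the poll counts before fitting; clip avoids log10(0)
df['log10(# of Polls for Bias Analysis)'] = np.log10(
    df['# of Polls for Bias Analysis'].clip(lower=1))
custom_regplot(df, 'log10(# of Polls for Bias Analysis)',
               'Partisan Bias Degree')
```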

Are more active pollsters to be trusted more? We plot the histogram of the number of polls and see that it follows a negative power law. We can filter out the pollsters with both very low and very high numbers of polls and create a custom scatter plot. However, we observe an almost non-existent correlation between the # of Polls and the Predictive Plus-Minus score. Therefore, a great number of polls does not necessarily lead to high poll quality and predictive power.
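A sketch of the histogram and the filtered scatter plot; the cutoff values below are illustrative choices, not the notebook's exact numbers:

```python
# Heavy-tailed distribution of poll counts per pollster
df['# of Polls'].plot(kind='hist', bins=50)
plt.xlabel('# of Polls')
plt.show()

# Keep pollsters with a moderate number of polls, then test the
# (near-absent) relationship between volume and predictive quality
mask = df['# of Polls'].between(10, 300)    # illustrative cutoffs
custom_regplot(df[mask], '# of Polls', 'Predictive Plus-Minus')
```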


Filtering and sorting top pollsters
Finally, we can use simple DataFrame operations to extract a list of top-rated pollsters with our custom filtering logic. For example, we can ask: "Who are the top 10 pollsters, among those who have conducted more than 50 polls, with the best Advanced Plus-Minus scores?"
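A sketch of that query; the 'Pollster' column name is an assumption:

```python
top10 = (
    df[df['# of Polls'] > 50]                # active pollsters only
      .sort_values('Advanced Plus-Minus')    # negative scores are better
      .head(10)
)
print(top10[['Pollster', '538 Grade', '# of Polls', 'Advanced Plus-Minus']])
```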

And here is the result. Note that we did not sort on the ‘538 Grade’ or ‘Numeric Grade’, but because they are correlated with the ‘Advanced Plus-Minus’ score, most pollsters in this extracted list have an A+ or A rating.

Other factors
The dataset contains other parameters, such as ‘House Effect‘ and ‘Mean-Reverted Bias‘, which also contain partisan bias information. They are surely used in FiveThirtyEight’s internal prediction models and can be explored further.
Summary
In this article, we showed how to pull the raw pollster-ratings data from the venerable FiveThirtyEight portal and write simple Python scripts to apply suitable transformations and visually analyze the data.
Again, you can find the Jupyter Notebook here on my GitHub repo.
Also, you can check the author’s GitHub repositories for code, ideas, and resources in machine learning and data science. If you are, like me, passionate about AI/machine learning/data science, please feel free to add me on LinkedIn or follow me on Twitter.