Clustering FEC Quarterly Campaign Contributions

Cluster job roles in FEC data to create animated bar races visualizing contributions by occupation

Justin Herman
Towards Data Science


Looking Into FEC Individual-Level Fundraising for the Democratic Party

Polls get plenty of attention from week to week, but the news generally lacks deep analysis of the FEC filings. The filings are rich with self-identified donor information, including job position, industry, and location. In the 2020 primary, almost all candidates have sworn off contributions from lobbyists. On the surface, this is a noble attempt by Democratic candidates to prove they aren’t influenced by corporations. However, digging below the surface, plenty of corporate influence can be exposed with the FEC data. I will look exclusively at individual donations and attempt to unmask corporate influence.

Limitations

The FEC requires campaigns to report individual donations only when they exceed $200, so the data is largely incomplete. The aggregated Q1/Q2 stats are printed below. In terms of actual funds raised, Bernie Sanders should lead the candidates with nearly forty million in contributions. Unfortunately, this data is not only incomplete; because of the filing requirements, it is also biased towards larger contributions. Since I am attempting to identify corporate influence, this bias can actually be beneficial: corporate donations are likely to be larger and are therefore captured well in the data. For an up-to-date look at the aggregate numbers, see Politico. I may refer to these aggregated numbers below.

Looking at the above data, Bernie and Warren have a significant disadvantage in the average amount raised from individual contributions. Biden and Gillibrand get boosted by large contributions. Buttigieg has been touted as having a strong individual donor base. While it’s extremely speculative without all the data, it seems these individual donors are donating large amounts to Buttigieg (an average of $330), amounting to nearly 17 million of his total 31 million raised. Such a high average donation casts doubt on the idea of a strong grassroots campaign. A campaign driven by small donors should show a large gap between Politico’s total donations received and the amount that appears in the FEC individual reports. For comparison, Sanders has raised 46 million according to Politico and only 8 million via FEC data. It’s not quite that simple, though, because the Politico numbers include PACs that have donated several million to the campaigns. Any analysis of the gap between Politico and FEC data would need to exclude such donations.
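
The gap described here can be made concrete with a little arithmetic. The sketch below uses only the rounded figures quoted in this section, in millions of dollars. The function name `itemized_share` is my own label, and because the Politico totals include PAC money, the output is a rough indicator at best.

```python
# Rounded figures quoted in the text, in millions of dollars.
# "itemized": amount visible in the FEC individual-contribution data.
# "total":    total raised as reported by Politico (includes PAC money).
figures = {
    "Buttigieg": {"itemized": 17.0, "total": 31.0},
    "Sanders": {"itemized": 8.0, "total": 46.0},
}


def itemized_share(itemized: float, total: float) -> float:
    """Fraction of a campaign's total that appears as itemized FEC donations."""
    return itemized / total


for name, f in figures.items():
    share = itemized_share(f["itemized"], f["total"])
    print(f"{name}: {share:.0%} of the total shows up in the itemized data")
```

A lower share means more of the money sits below the reporting threshold (or comes from PACs), which is the pattern one would expect from a small-donor campaign.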

Alternative Approach

ActBlue is a large contributor to political fundraising, and one could use their FEC data exclusively to compare the candidates. Doing so would likely paint a different picture of how Americans as a whole donate to political campaigns. However, the purpose of this write-up is to explore corporate donations. The ways in which more connected individuals donate differ from the simplicity of making an ActBlue donation from your computer.

Methodology

Within the FEC reports, self-reported data loses information because it is not standardized. Looking at the FEC data, you would find many slightly diverging variants of the title CEO (Chief Executive Officer, Chief Officer, Chief Executive, etc.). Using regular expressions, I bundle all these roles into a single role of CEO, capturing a significant amount of “lost” data. From here, I go deeper, looking at other self-reported titles (Founder, Executive Director, President, etc.). These titles also represent those in charge of their organization and are thus treated as equivalent to CEO.
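
To illustrate the idea, here is a minimal Python sketch of regex-based bundling. It is not the code behind this article (that analysis was done in R); the pattern, the function name `normalize_title`, and the sample titles are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern covering the CEO variants mentioned in the text.
CEO_PATTERN = re.compile(
    r"\b(?:ceo|c\.e\.o\.?|chief\s+executive(?:\s+officer)?|chief\s+officer)\b",
    re.IGNORECASE,
)


def normalize_title(title: str) -> str:
    """Return "CEO" if the title contains a CEO variant, else the cleaned title."""
    cleaned = " ".join(title.split()).upper()  # collapse whitespace, unify case
    return "CEO" if CEO_PATTERN.search(cleaned) else cleaned


# Made-up sample titles, not rows from the FEC data.
titles = [
    "CEO",
    "Chief Executive Officer",
    "chief  executive",
    "Chief Officer",
    "C.E.O.",
    "Software Engineer",
]
counts = Counter(normalize_title(t) for t in titles)
print(counts)
```

Because the pattern uses `search`, a title such as "Assistant to the CEO" would also be labeled CEO. That is the kind of false capture this brute-force approach accepts in exchange for coverage.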

Example of Clustering

On the left, the exact term CEO shows up 2,902 times in the current FEC reports; however, variants of the term CEO show up 3,533 times throughout the data. By labeling all entries in the table on the left as CEO, we increase our capture of CEOs by over 20%.

The cool thing is we can expand from here. I add in similar terms like President, Executive Director, etc., and we have 4,459 additional respondents that represent the actual role of CEO. We now have over 7,500 respondents identified as CEO, more than 2.5 times our original count of 2,902.
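
The counts quoted in this section can be checked with a few lines of arithmetic. This sketch assumes the 4,459 additional respondents do not overlap with the 3,533 CEO variants, which the text does not state explicitly; the variable and function names are mine.

```python
exact_ceo = 2902      # rows whose title is exactly "CEO"
contains_ceo = 3533   # rows matching any CEO variant
similar_roles = 4459  # additional rows (President, Executive Director, ...)


def pct_increase(new: float, base: float) -> float:
    """Percentage increase of `new` relative to `base`."""
    return (new - base) / base * 100


variant_gain = pct_increase(contains_ceo, exact_ceo)
expanded_total = contains_ceo + similar_roles  # assumes no overlap
expanded_ratio = expanded_total / exact_ceo

print(f"Gain from bundling CEO variants: {variant_gain:.1f}%")
print(f"Expanded cluster size: {expanded_total}")
print(f"Expanded cluster vs. exact matches: {expanded_ratio:.2f}x")
```

Under the no-overlap assumption, the expanded cluster comes out at about 2.75 times the exact-match count.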

Expanded CEO Cluster

I bundled executive positions and C-level executive positions as seen below.

Clustered executives on the left and clustered C-level executives on the right

When we cluster all of these executive-level positions into one category, we have a group of nearly 20k executives (roughly seven times the original CEO count) from different occupations that we can examine further for deeper relationships in donations.
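
One way to organize several clusters at once is an ordered list of rules where the first matching pattern wins. The sketch below is a Python illustration of that design, not the R code used for this article; the cluster labels, the patterns, and the name `cluster_title` are hypothetical.

```python
import re

# Hypothetical rules, checked in order; the first matching pattern wins.
CLUSTER_RULES = [
    (
        "CEO",
        re.compile(
            r"\b(?:ceo|chief\s+executive(?:\s+officer)?|(?<!vice )president"
            r"|founder|executive\s+director)\b",
            re.IGNORECASE,
        ),
    ),
    (
        "C-LEVEL",
        re.compile(r"\b(?:c[fotim]o|chief\s+\w+\s+officer)\b", re.IGNORECASE),
    ),
    (
        "EXECUTIVE",
        re.compile(
            r"\b(?:executive|vice\s+president|vp|managing\s+director)\b",
            re.IGNORECASE,
        ),
    ),
]


def cluster_title(title: str) -> str:
    """Assign a title to the first cluster whose pattern matches it."""
    cleaned = " ".join(title.split())  # collapse repeated whitespace
    for label, pattern in CLUSTER_RULES:
        if pattern.search(cleaned):
            return label
    return "OTHER"


def is_executive(title: str) -> bool:
    """True if the title falls into any executive-level cluster."""
    return cluster_title(title) != "OTHER"


for sample in ["Founder & President", "Vice President", "CFO", "Teacher"]:
    print(sample, "->", cluster_title(sample))
```

Rule order matters: the negative lookbehind keeps "Vice President" out of the CEO cluster so that it falls through to the broader executive rule. A title like "Account Executive" would still be captured, which shows why the original titles need to be kept for review.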

Quick Validation

In the table below, the bottom row represents the distribution of donations in the full data. The first two rows represent the CEO donation distributions in our original data and in the clustered data.

Distribution of Donations

Eyeballing the above distributions, there is no need to run any statistical tests. The clustering is clearly capturing the CEO population. It’s also clear that the CEO population has a significantly different donation distribution than the overall dataset. CEOs have a mean and median donation of $705 and $250, versus a mean and median of $201 and $38 for the average donor.
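
The comparison amounts to computing the same summary statistics for the cluster and for the full dataset. Here is a small Python sketch of that check using toy rows; the amounts are invented for illustration and are not FEC data, and `summarize` is a hypothetical helper.

```python
from statistics import mean, median

# Toy rows: (clustered title, donation amount in dollars). Not FEC data.
rows = [
    ("CEO", 2800.0),
    ("CEO", 250.0),
    ("CEO", 500.0),
    ("OTHER", 25.0),
    ("OTHER", 38.0),
    ("OTHER", 100.0),
    ("OTHER", 15.0),
]


def summarize(amounts):
    """Count, mean, and median for a list of donation amounts."""
    return {
        "n": len(amounts),
        "mean": round(mean(amounts), 2),
        "median": median(amounts),
    }


ceo_amounts = [amount for title, amount in rows if title == "CEO"]
all_amounts = [amount for _, amount in rows]

print("CEO cluster:", summarize(ceo_amounts))
print("All donors: ", summarize(all_amounts))
```

If the cluster were capturing unrelated job titles, its summary would drift towards the overall numbers instead of standing apart from them.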

Some readers may be confused that the min is a negative number. Individuals have a cap of $2,800 per election that can be donated to a single candidate. A min of -$2,800 is a refund from the campaign to an individual who accidentally exceeded the legal limit.
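
Negative rows matter when totals are computed, since a refund should cancel the donation it reverses. The Python sketch below nets out refunds per donor and candidate; the rows, donor names, and the helper `net_by_donor` are hypothetical, and the single `LIMIT` value simply reuses the cap mentioned above.

```python
from collections import defaultdict

LIMIT = 2800  # per-candidate cap cited in the text, in dollars

# Hypothetical rows: (donor, candidate, amount). Negative amounts are refunds.
rows = [
    ("Donor A", "Candidate X", 2800),
    ("Donor A", "Candidate X", 1000),
    ("Donor A", "Candidate X", -1000),  # refund of the excess donation
    ("Donor B", "Candidate X", 500),
]


def net_by_donor(rows):
    """Sum donations and refunds for each (donor, candidate) pair."""
    totals = defaultdict(int)
    for donor, candidate, amount in rows:
        totals[(donor, candidate)] += amount
    return dict(totals)


net = net_by_donor(rows)
over_limit = {pair: total for pair, total in net.items() if total > LIMIT}

print(net)
print("Pairs still over the limit:", over_limit)
```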

Animated Races

Build bar chart races that track donations to candidates

Donation races by CEO (left) and all executive positions (right), measured in millions. Bar color ranges from blue to red to represent left and right ideology within the Democratic Party.
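
Behind a bar race is a simple data structure: for each time step, a ranked list of cumulative totals. The Python sketch below builds those frames from toy rows. It covers only the data preparation, not the animation itself; the candidate labels, amounts, and the name `race_frames` are all made up for illustration.

```python
from collections import defaultdict

# Toy rows: (period, candidate, amount in dollars). Not FEC data.
rows = [
    ("2019-Q1", "Candidate A", 300.0),
    ("2019-Q1", "Candidate B", 200.0),
    ("2019-Q2", "Candidate C", 600.0),
    ("2019-Q2", "Candidate A", 100.0),
    ("2019-Q2", "Candidate B", 250.0),
]


def race_frames(rows):
    """Return one (period, ranked cumulative totals) frame per period."""
    per_period = defaultdict(lambda: defaultdict(float))
    for period, candidate, amount in rows:
        per_period[period][candidate] += amount

    running = defaultdict(float)  # cumulative total per candidate
    frames = []
    for period in sorted(per_period):
        for candidate, amount in per_period[period].items():
            running[candidate] += amount
        ranked = sorted(running.items(), key=lambda kv: kv[1], reverse=True)
        frames.append((period, ranked))
    return frames


frames = race_frames(rows)
for period, ranked in frames:
    print(period, ranked)
```

Each frame can then be drawn as a horizontal bar chart, and the sequence of frames becomes the animation. A candidate who enters late simply appears in the first frame where they have a total.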

Cumulative Results

Biden doesn’t enter the race until Q2, and he quickly makes up his donations gap. Biden, Harris, and Buttigieg are the clear favorites for corporate-level donors. Booker, Klobuchar, and Gillibrand make up the next wave of candidates, with the rest of the field (Inslee, Warren, Tulsi, Bernie, Castro, and Yang) all seeming to draw little attention from corporate money.

Sanders and Warren are currently among the top three candidates and are receiving very little corporate funding. It will be interesting to see, come the Q3 reports, whether this trend holds. As a casual observer of the election, I have a suspicion that despite Warren’s progressive agenda, she is going to see a significant rise in corporate donations in Q3. I believe corporate interests may want to diversify away from Biden and may soon lose faith in Harris/Buttigieg/Booker.

Looking at other Clusters

I went deeper and clustered other job fields. This area of analysis would be more apples to apples if we only used the ActBlue FEC data, as that would likely better reflect overall donation figures. However, here are some of the tentative results.

Cumulative donations by occupation, measured in millions (please note the scale differences)

Note that the scales are all different for these graphs. Harris does extremely well with retired donors. Biden does extremely well with lawyers and those in the legal field. Purely based on assumptions of identity, I would have expected Kamala, as a lawyer, to have raised more from the legal field and Biden, as a senior, to have performed better with the retired.

Bernie has a commanding lead with blue-collar workers; however, in this portion of the data, Bernie has only received $150k in donations from the blue-collar cluster. This highlights the danger of using this incomplete FEC data. Blue-collar workers are likely driving Bernie’s fundraising lead, yet they aren’t donating over $200 and therefore aren’t showing up in the FEC data. Overall, our data is missing millions of individual donations.

Assumptions

I make several categorizing assumptions in my data manipulation operations. The regex approach I took to clustering these groups will include some incorrect matches. Regular expressions are a brute-force attempt to categorize the occupations algorithmically. However, I have kept the original job titles alongside the clusters, and the overwhelming majority of them are correctly identified. My aggregated dataframe is hosted on my website, and every job title that has been changed is marked with a binary field so it can be tracked. Please feel free to look at the assumptions I made and the false captures in the clusters. If you would like to take this on in your own analysis, or perhaps develop your own clusters, please feel free to do so as well. The original unedited job titles are present in the dataframe.
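
For anyone who wants to audit the clusters, the pattern is to keep the original title, mark which rows were changed, and then list the changed titles inside each cluster. The Python sketch below shows that pattern on toy records. The field names `original`, `cluster`, and `changed` are assumptions and may not match the column names in the hosted dataframe.

```python
from collections import Counter

# Toy records with assumed field names; not rows from the hosted dataframe.
records = [
    {"original": "CEO", "cluster": "CEO"},
    {"original": "Chief Executive Officer", "cluster": "CEO"},
    {"original": "Assistant to the CEO", "cluster": "CEO"},
    {"original": "Teacher", "cluster": "Teacher"},
]


def add_changed_flag(records):
    """Add a boolean `changed` field: True when clustering altered the title."""
    return [
        {**r, "changed": r["original"].strip().upper() != r["cluster"].strip().upper()}
        for r in records
    ]


def changed_titles(records, cluster):
    """Count the original titles that were relabeled into the given cluster."""
    flagged = add_changed_flag(records)
    return Counter(
        r["original"] for r in flagged if r["cluster"] == cluster and r["changed"]
    )


flagged = add_changed_flag(records)
print(changed_titles(records, "CEO"))
```

Scanning the output for a cluster makes false captures such as "Assistant to the CEO" easy to spot.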

Conclusion

The FEC data is a powerful tool for identifying corporate power structures and how they are influencing the election. Banning lobbyists forced corporate influence to discover new nefarious relationships with the Democratic Party. Many corporations have turned to Super PACs to exert influence on our election process post Citizens United. However, it’s clear that corporate-level executives are still donating large amounts of money to their preferred candidates. While this story was more about visualizing corporate-level donations, at some point I will attempt to use this data to develop models that determine what characteristics lead donors to donate to each candidate.

To see the code for the above article, please visit my GitHub repository. The repository includes a notebook file and an RPubs link, which serves as a walkthrough of all the behind-the-scenes analysis in R.

To see my other projects, please visit my website. If you would like to contact me, please reach out to me on LinkedIn.
