More Money, More Problems: Analyzing the Effects of PAC Money on U.S. Congress

Edmund Chitwood
Towards Data Science
Nov 4, 2018

Summary

In 2010, the U.S. Supreme Court ruled that corporations and unions cannot constitutionally be prohibited from promoting the election of one candidate over another, in effect deregulating campaign finance. This decision in Citizens United v. Federal Election Commission was controversial and unpopular; 80 percent of Americans polled opposed the ruling. Citizens United and the 2014 ruling in McCutcheon v. Federal Election Commission contributed to the current state of affairs in the U.S., in which more and more money is pouring into congressional elections. The projected total cost of the 2018 midterm elections is nearly $5.2 billion,¹ making them the most expensive ever.

This situation inspired a project in which I used natural language processing and machine learning techniques to model the topics of Tweets by current members of the U.S. Congress, and then analyzed how the amounts of PAC (Political Action Committee) money they receive to finance their campaigns affect what they Tweet about. What I found is that the more money members of Congress take from PACs to finance their campaigns, the less likely they are to Tweet about contentious political issues.² The following blog post puts this effort in context, provides an overview of the project and points to resources voters can use to inform themselves about incumbent members of Congress before voting day on November 6th.

Fast Facts

  • The projected total cost of the upcoming midterm elections is $5.2 billion.
  • A tiny group (0.42 percent) of Americans contributed the majority (70.8 percent) of campaign contributions in 2018.
  • In 2015, 76 percent of Americans believed that money had a greater influence on politics than before.
  • Congress members spend 50 percent of their time fundraising.
  • Gallup polls show that Americans currently believe that the most important problem facing the United States is dissatisfaction with government and poor leadership.

Key Project Findings

  • The average U.S. Senator took over $2.5 million from PACs during his or her last electoral campaign.³
  • The average member of the U.S. House of Representatives took over $600,000⁴ from PACs during the 2018 midterm elections.
  • The current members of Congress⁵ have Tweeted over 1.8 million times⁶ dating back to 2007.
  • Members of Congress whose Tweets are not politically contentious took an average of $80,000 more from PACs during their last electoral campaigns.

Project Links

Congress on Twitter

Given the increasing amounts of money contributed to and spent on political campaigns, and the concern Americans have about the influence of money in politics, it should come as no surprise that Congress is unpopular; only 21 percent of Americans approve of the job Congress is doing. What do members of Congress have to say for themselves? Increasingly, they take to Twitter to communicate directly with their bases. Senators Kamala Harris, Marco Rubio and Elizabeth Warren each have millions of followers on Twitter and have Tweeted thousands of times. Like many other members of Congress, they use their Twitter accounts as mediums to project their personalities and policy positions.

The role social media played in the Arab Spring and in the Black Lives Matter and Me Too movements presents abundant evidence that political discourse on social media has a real-world, tangible effect. Furthermore, there is a growing body of academic research devoted to the Tweets of U.S. politicians. Such efforts include analyses of the presence of policy agendas in Tweets by members of state governments; what Congress is doing on Twitter; and the relationship between topics discussed by politicians and topics discussed by their followers. These efforts partially informed and inspired my analysis.

Analyzing the Tweets of Congress

To begin this effort, I used Jefferson Henrique’s GitHub repository ‘Get Old Tweets’ to collect all the Tweets from the official Twitter accounts⁷ of all active members of the 115th Congress. Henrique wrote the ‘Get Old Tweets’ project in Python, and it enabled me to bypass some of the restrictions of the Twitter API. In total I collected over 1.8 million Tweets from 522 members of the 115th Congress.⁸ I then used pandas to perform exploratory data analysis and clean the data, taking steps like investigating the numbers of Tweets by Congress members as well as engagement (e.g. Retweets, Likes) with those Tweets, and dropping Tweets which were too short to contain much meaning.

(The Tweet in the Dataset with the Most Likes and Retweets)

The following map presents an overview of the data by state and by the party of the Tweeter. The sizes of the circles reflect the number of Tweets and the colors represent the political party of the Tweeter. The sizes of the circles loosely correlate with the population sizes in each state. For example, there are a relatively large number of Tweets by members of Congress in California, New York and Texas. However, there are exceptions to that correlation. Vermont Senator Bernie Sanders has sent a relatively high number of Tweets, and his home state is therefore overrepresented.

After cleaning and exploring the data, I preprocessed it by using a Python library called Natural Language Toolkit. The steps I took to preprocess the text included removing punctuation and formatting (e.g. initial capitalization of words), tokenizing each Tweet (i.e. splitting the text of each Tweet into lists of individual words), removing stop words (i.e. words like myself, these and only which occur frequently and have little semantic meaning), and stemming the words. In addition, I used custom functions to remove links, hashtags and mentions.

(Example Tweets Before and After Preprocessing)
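The preprocessing steps described above can be sketched roughly as follows. This is a minimal, standard-library approximation rather than the code used in the project: the short stop-word set and the `crude_stem` helper are hypothetical stand-ins for NLTK's stop-word corpus (`nltk.corpus.stopwords`) and a real stemmer such as `nltk.stem.PorterStemmer`, and the example Tweet and its handle are invented.

```python
import re
import string

# Tiny illustrative stop-word set; NLTK ships a much longer list,
# which this set merely stands in for.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "only",
              "our", "myself", "the", "these", "this", "to", "we", "with"}

URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"[@#]\w+")  # mentions and hashtags


def crude_stem(word):
    """Very rough suffix stripper, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tweet):
    text = URL_RE.sub(" ", tweet)   # drop links first (URLs may contain '#')
    text = TAG_RE.sub(" ", text)    # drop hashtags and mentions
    text = text.lower()             # remove capitalization
    # Strip ASCII punctuation only; curly quotes would need extra handling.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()           # simple whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]


# Invented Tweet with a hypothetical handle and hashtag.
example = ("Meeting with @SenExample on protecting health care today! "
           "#Hypothetical https://example.com/x")
print(preprocess(example))
# ['meet', 'protect', 'health', 'care', 'today']
```

Links are removed before hashtags and mentions so that a `#` fragment inside a URL is not mistaken for a hashtag.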

The final step before performing topic modeling on the Tweets was to transform the shape of the preprocessed corpus of text. To do this, I used scikit-learn’s count vectorizer, which outputs a matrix whose columns correspond to the unique words in the corpus of text and whose rows correspond to each individual Tweet. This process enabled comparison of each Tweet against the entire set of Tweets. I then transposed the matrix so that it was in term-document format and ready for topic modeling.
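To make the shape of that matrix concrete, here is a small standard-library sketch of what a count vectorizer produces and what transposing it does. It is an illustration of the idea only, not scikit-learn's implementation; the function names and the two toy documents are made up for this example.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix): one row per document and one
    column per unique token, holding raw counts."""
    vocabulary = sorted({token for doc in docs for token in doc})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for term, count in Counter(doc).items():
            row[column[term]] = count
        matrix.append(row)
    return vocabulary, matrix


def transpose(matrix):
    """Flip document-term (rows = documents) into term-document."""
    return [list(col) for col in zip(*matrix)]


# Two toy, already-preprocessed "Tweets".
docs = [
    ["health", "care", "vote"],
    ["vote", "vote", "today"],
]
vocab, doc_term = document_term_matrix(docs)
term_doc = transpose(doc_term)
print(vocab)     # ['care', 'health', 'today', 'vote']
print(doc_term)  # [[1, 1, 0, 1], [0, 0, 1, 2]]
print(term_doc)  # [[1, 0], [1, 0], [0, 1], [1, 2]]
```

With 1.8 million Tweets a dense list of lists like this would be far too large, which is one reason real vectorizers return sparse matrices.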

Topic modeling is a machine learning/natural language processing method that relies on statistical means to derive meaning from text on the basis of the frequency of word occurrences at both the document and corpus level. There are a variety of methods to model the topics of text, including both supervised and unsupervised methods of machine learning.

To model the topics of the congressional Tweets, I used a Python package called gensim to perform Latent Dirichlet Allocation, or LDA. LDA is an unsupervised learning method created by David Blei, Andrew Ng (formerly of Google and Baidu, where he was Chief Scientist) and Michael Jordan. LDA is a generative probabilistic model that assumes that a corpus of text can be represented by a given number of topics, and that each document therein can be represented by a subset of the overall topics. The topics that LDA identifies are represented by a number of words that frequently occur together in a document (in this case a Tweet).
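For readers who want the model spelled out, the generative story behind (smoothed) LDA can be written as below. The notation is the common textbook one and is an assumption of this sketch, not something taken from the project: K is the number of topics, η and α are Dirichlet priors, β_k is the word distribution of topic k, and θ_d is the topic mixture of Tweet d.

```latex
\begin{align*}
\beta_k  &\sim \mathrm{Dirichlet}(\eta)   && \text{word distribution of topic } k = 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha) && \text{topic mixture of document (Tweet) } d \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d) && \text{topic of the } n\text{-th word in } d \\
w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Categorical}(\beta_{z_{d,n}}) && \text{the observed word}
\end{align*}
```

Fitting the model means inferring the hidden θ, z and β from the observed words w alone.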

LDA was the method of choice to model the topics of Tweets by U.S. political officials in the efforts I mentioned above. Gensim’s LDA allows for easy tuning of model hyperparameters, including the number of topics it should extract from the text and the number of passes the model should make over the text to learn it. The most sensible representation of the Tweets of Congress included 50 topics, and I tuned the model to take 20 passes over the text; training took roughly eight hours. I then abbreviated the raw component words of each topic for ease of interpretation.

(View the full list of topics in this GitHub repository)

Introducing Campaign Finance Information

I next gathered campaign finance data from the Center for Responsive Politics’ website opensecrets.org, which it describes as “the most comprehensive resource for federal campaign contributions, lobbying data and analysis available anywhere.” The Center for Responsive Politics sources its data from the Federal Election Commission. I used the Open Secrets API to gather campaign finance information about all current members of Congress, and then looked at how the topics they Tweet about vary depending on how much PAC money they took during their last electoral campaigns.

Initially, I focused my analysis on the relationship between related topics and sectors. My hypothesis was that the more a given member of Congress took from a sector, the more (or perhaps less) they would Tweet about it. However, this type of analysis proved inconclusive, as no clear trends emerged.

But when I looked at the members of Congress who took the most and least money from PACs, I noticed something interesting. For example, Senators Pat Toomey, Rob Portman and Tim Scott each took nearly $6 million from PACs during their last campaigns, and the most common topics of their Tweets are not at all politically contentious.

(Common Topics Include Meetings, Receiving Honors, and Congress itself)

On the other hand, the topics of the Tweets of Congress members who didn’t take any money from PACs during their last elections indicate clear differences. While there is overlap between the most common topics among the two groups, Representatives Jared Polis, John Sarbanes and Francis Rooney Tweet more about issues that are politically contentious. Among the topics they Tweet about most often are health care, Robert Mueller’s special investigation and the economy. Topics of such substance are lacking among the previous group.

I explored this relationship further and found that, unfortunately, these members of Congress are indicative of a pattern. To arrive at this conclusion, I divided the authors of Tweets based on whether or not each Tweet was about a politically contentious topic and calculated the average amount of PAC money received by members of each group. What I found is that the more money a member of Congress takes, the less they Tweet about politically contentious issues. On average, Congress members who Tweet about topics that are not politically contentious take $80,000 more from PACs per campaign. I presume this is due to a conscious or unconscious effort to make themselves appear more palatable to their donor bases.
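The group comparison described above can be sketched in a few lines. Everything in this example is invented for illustration: the member names, the PAC totals, the topic labels and the choice to average over Tweets (which weights prolific Tweeters more heavily) are assumptions, and the numbers it prints are not the project's results.

```python
from collections import defaultdict
from statistics import mean

# Invented PAC totals (in dollars) for three hypothetical members.
pac_money = {"member_a": 1_200_000, "member_b": 300_000, "member_c": 0}

# Hypothetical labels saying which topics count as contentious.
contentious_topics = {"health care", "human rights"}

# (author, dominant topic) pairs, one per Tweet; also invented.
tweets = [
    ("member_a", "visiting places"),
    ("member_a", "campaign trail"),
    ("member_b", "health care"),
    ("member_b", "visiting places"),
    ("member_c", "human rights"),
    ("member_c", "health care"),
]


def average_pac_by_group(tweets, pac_money, contentious_topics):
    """Average the author's PAC total over the Tweets in each group."""
    groups = defaultdict(list)
    for author, topic in tweets:
        key = "contentious" if topic in contentious_topics else "not contentious"
        groups[key].append(pac_money[author])
    return {key: mean(values) for key, values in groups.items()}


averages = average_pac_by_group(tweets, pac_money, contentious_topics)
print(averages)
```

A difference in group means like this shows an association only; it does not by itself establish that PAC money causes members to avoid contentious topics.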

Conclusion

This effort was an attempt to investigate the effect of PAC money on U.S. Congress. Unfortunately, it appears that PAC money negatively correlates with the tendency of members of Congress to talk (or at least Tweet) about important political issues, those that matter most to their constituents.

This conclusion joins a chorus of voices claiming that our elected officials are too beholden to campaign contributions. For example, an analysis published by Cambridge University Press “indicates that economic elites and organized groups representing business interests have substantial independent impacts on U.S. government policy, while average citizens and mass-based interest groups have little or no independent influence.” Furthermore, recent media coverage has been devoted to Congressional candidates who prominently and publicly refuse to take PAC money.

Midterm elections are on November 6, just two days away, and one sure way Americans can affect what Congress is saying and doing is by thinking critically about the issues that matter to them, then getting out and voting for candidates who are not beholden to PAC donors but rather have the concerns of the American people at heart.

Epilogue

A huge thank you to Jefferson Henrique, who wrote such an excellent, robust repository to get old Tweets. Without him this effort would not have been possible.

Though I focused on PAC contributions in this analysis, I do not mean to suggest that individual campaign contributions are unimportant. Individuals have contributed over $1.6 billion during this midterm election cycle.

As an addendum to this effort, I created an interactive map which shows the amount of PAC money each incumbent in the House of Representatives received during his or her campaign in the lead-up to the 2018 midterm elections.

(Link to Interactive Map)

Footnotes

  1. This total includes money PACs and individuals contributed directly to Congressional campaigns as well as expenditures made in support of but independent of those candidates.
  2. A politically contentious topic is defined here as one about which there is common, perhaps partisan, political disagreement. Contentious topics include “Human Rights” and “Republicans Repealing Obamacare.” Topics that are not politically contentious include “Stops on the Campaign Trail” and “Visiting Places.”
  3. Campaign cycles considered include those ending in 2014, 2016 and 2018, depending on Senate class. U.S. Senators serve six-year terms, and the terms of one third of them begin every two years.
  4. Campaign finance totals are as of October 2018; the final figures may be higher.
  5. Due to resignations, deaths, special elections and other extenuating circumstances, the complete list of incumbents in the 115th Congress was something of a moving target. Where possible, I used the most up-to-date list of incumbents.
  6. Tweet total is as of September 2018.
  7. Where possible, I used the official Twitter accounts of members of Congress. Some members of Congress (e.g. Bernie Sanders, Cory Booker) have multiple Twitter accounts.
  8. Not all members of Congress have Tweeted.

Author Contact Info

If you’d like to talk data science or politics, please add me on LinkedIn or send me an email at edmunchitwood@gmail.com.
