Analyzing #WhenTrumpIsOutOfOffice tweets

A step-by-step guide to cleaning and analyzing tweets in R

fylim
Towards Data Science
9 min read · May 30, 2020


As we get closer to the next U.S. presidential election, I wanted to know what people think of the nominees. Will the current President continue his stay at the White House, or will we see a new U.S. President with fewer angry Twitter rants?

Getting the dataset

I used the R package rtweet to download tweets containing the hashtag #WhenTrumpIsOutOfOffice that were posted in March 2020. The search returned more than 6,000 tweets.

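library(rtweet)

# create token named "twitter_token"
# (appname, consumer_key, consumer_secret, access_token and access_secret
# are your own Twitter app credentials, defined beforehand)
twitter_token <- create_token(
  app = appname,
  consumer_key = consumer_key,
  consumer_secret = consumer_secret,
  access_token = access_token,
  access_secret = access_secret)

# download tweets (excluding retweets) and save them to a csv file
tweets <- search_tweets(
  "#WhenTrumpIsOutOfOffice", n = 18000, include_rts = FALSE)
# convert every column to character so list-columns can be written to csv
df <- apply(tweets, 2, as.character)
write.csv(df, "csv file path")

# read the csv file
text <- read.csv("csv file path", stringsAsFactors = FALSE)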
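
If you want to double-check that the saved tweets really fall within March 2020, one option is to parse the timestamp column and filter on it. The snippet below is only a minimal sketch in base R: it runs on a small made-up data frame (sample_tweets, with invented rows), and it assumes the timestamps are stored in a created_at column as "YYYY-MM-DD HH:MM:SS" strings in UTC, which may not match your csv file exactly.

```r
# Toy stand-in for the data frame read back from the csv file (rows are invented)
sample_tweets <- data.frame(
  created_at = c("2020-02-28 23:59:10", "2020-03-01 08:15:00",
                 "2020-03-31 21:40:05", "2020-04-02 10:00:00"),
  text = c("tweet a", "tweet b", "tweet c", "tweet d"),
  stringsAsFactors = FALSE
)

# Parse the timestamp strings as UTC date-times (assumed format)
sample_tweets$created_at <- as.POSIXct(
  sample_tweets$created_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Keep only tweets created on or after March 1 and before April 1, 2020
march_start <- as.POSIXct("2020-03-01", tz = "UTC")
april_start <- as.POSIXct("2020-04-01", tz = "UTC")
in_march <- sample_tweets$created_at >= march_start &
  sample_tweets$created_at < april_start
march_tweets <- sample_tweets[in_march, ]

# Number of tweets left after filtering
nrow(march_tweets)
```

On the real data, the same filter could be applied to the text data frame once its timestamp column has been parsed in the same way.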

Cleaning the dataset
