Pulling Tweets into R

Tutorial for linking Twitter’s API

Jason
Towards Data Science


Photo by Kon Karampelas on Unsplash

In this tutorial, I will walk you through setting up a Twitter Developer account so you can use the Twitter API, and then pulling tweets directly into your RStudio environment.

1. Apply for a Twitter Developer Account

You’ll need a Twitter Developer account since it gives you access to a personalized API key, which we will need later on. This key is unique to every account, so you mustn’t share it.

Visit Twitter Developer and create an account. You can use the same login credentials as your personal Twitter account if you’d like.

You will need to apply for an account, so make sure to follow the steps. It may take some time to set up the account, as there are a few questions you’ll need to answer. Most questions require a decent amount of input, so answer them to the best of your ability.

If you are stuck, this is an excellent video on how to apply.

Setting up a Twitter API developer account

Apply for a Developer account

2. Create a Twitter API App

It could take anywhere from a couple of hours to a few days to get approved for a Developer account, depending on how thorough your application answers were. Twitter may also reach out to you for more information.

After approval, head to the top right of the navigation bar and select Apps. Here you will want to create an app, which is a required step in generating an API key.

If you’re lost, this is a great followup video for the API key.

Creating a Twitter API App

After creating an app, you should see your first app on your dashboard.

Now that you’re all set up, let’s get started in RStudio!

3. Installing and Loading the twitteR Package

twitteR is an R package that provides access to the Twitter API. Most functionality of the API is supported, with a bias towards API calls that are more useful in data analysis as opposed to daily interaction.

install.packages("twitteR") #install package
library(twitteR) #load package
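If you rerun your script often, you may not want to reinstall the package every time. Here is a minimal sketch of one way to check first, using only base R. The helper name missing_packages is my own for illustration and is not part of twitteR.

```r
# Return the names of the requested packages that are not installed yet.
# missing_packages() is a hypothetical helper, not part of any package.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages("twitteR")
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
}
```

When to_install is not empty, pass it to install.packages() before calling library(twitteR).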

4. Linking the API to RStudio

For this step, you will need to access your Twitter Developer account and navigate to the app you created. If you haven’t done so already, generate new API keys and Access tokens.

Twitter App Keys (Mine are masked for privacy)
consumer_key <- 'XXXXXXXXXXXXXX'
consumer_secret <- 'XXXXXXXXXXXXXX'
access_token <- 'XXXXXXXXXXXXXX'
access_secret <- 'XXXXXXXXXXXXXX'

Copy the code above into R and paste your keys into their respective variables, keeping the quotation marks.
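Since these keys should not be shared, you may prefer to keep them out of your script altogether. The sketch below reads them from environment variables instead. The variable names (TWITTER_CONSUMER_KEY and so on) and the helper functions are assumptions I made up for illustration. You would define the variables yourself, for example in your ~/.Renviron file.

```r
# Read one credential from an environment variable and fail loudly if it is unset.
# read_credential() and load_twitter_keys() are hypothetical helpers.
read_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# The variable names below are assumptions; use whatever names you defined.
load_twitter_keys <- function() {
  list(
    consumer_key    = read_credential("TWITTER_CONSUMER_KEY"),
    consumer_secret = read_credential("TWITTER_CONSUMER_SECRET"),
    access_token    = read_credential("TWITTER_ACCESS_TOKEN"),
    access_secret   = read_credential("TWITTER_ACCESS_SECRET")
  )
}
```

After calling keys <- load_twitter_keys(), you can use keys$consumer_key, keys$consumer_secret, and so on wherever the four variables above are expected.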

5. Setting Up Authentication

Use the setup_twitter_oauth() function to link your keys to Twitter’s API.

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

After running the code above, you will be given a prompt. It is your preference whether you use a local file to cache the access, but I usually do not. If you do not want to use a local file, enter the number 2 and press enter.
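If you would rather not answer the prompt each time, you can state your preference up front. As far as I know, the prompt comes from the httr package that twitteR uses under the hood, and it is controlled by the httr_oauth_cache option. Treat that as an assumption and check it against your installed versions.

```r
# Assumption: the caching prompt is governed by httr's "httr_oauth_cache" option.
# FALSE means never cache credentials in a local file; TRUE means always cache.
options(httr_oauth_cache = FALSE)

# Then authenticate as before (needs the four key variables from step 4):
# setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
```

Set the option before calling setup_twitter_oauth(), since it is read at the time of authentication.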

6. Pull Tweets

Now that we have finished all the steps for setting up the API, we can pull tweets. Let’s break down the code below.

virus <- searchTwitter('#China + #Coronavirus', n = 1000, since = '2020-01-01', retryOnRateLimit = 1e3)
virus_df <- twListToDF(virus)

The first line of code uses the searchTwitter function. We pass in as many hashtags as are relevant to our search, separated by a plus sign. In this case, they are #China and #Coronavirus.

As I write this on February 4th, 2020, the coronavirus has caused worldwide panic as it spreads to more countries outside of China.

  • n: the maximum number of tweets we would like to pull.
  • since: the earliest date, in YYYY-MM-DD format, from which to pull tweets.
  • retryOnRateLimit: how many times to wait and retry when the request is rate limited (here, up to 1,000 times). This argument is only relevant if the desired return (n) exceeds the remaining limit of available requests.

The last line of code uses the twListToDF function, which saves the tweets pulled into a data frame.
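If you plan to run several searches, the two steps above can be wrapped into one function. This is only a sketch. The function names build_hashtag_query and pull_hashtag_tweets are hypothetical, and the wrapper assumes twitteR is installed and that you have already authenticated in step 5.

```r
# Build a search string such as "#China + #Coronavirus" from plain tag names.
# Both functions here are hypothetical helpers, not part of twitteR.
build_hashtag_query <- function(tags) {
  paste(paste0("#", tags), collapse = " + ")
}

# Search for the given hashtags and return the results as a data frame.
# Requires an authenticated twitteR session, so it is only defined here.
pull_hashtag_tweets <- function(tags, n = 1000, since = NULL) {
  tweets <- twitteR::searchTwitter(
    build_hashtag_query(tags),
    n = n,
    since = since,
    retryOnRateLimit = 1e3
  )
  twitteR::twListToDF(tweets)
}
```

For example, pull_hashtag_tweets(c("China", "Coronavirus"), since = "2020-01-01") would issue the same search as the code above.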

The data frame of all the tweets

Looking at all the tweets, we can see that there are unique columns that can help identify each tweet’s activity.
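To give a feel for working with those columns, here is a small sketch. The rows are made up and only stand in for virus_df, but the column names (text, screenName, retweetCount, isRetweet) match what twListToDF returns, to the best of my knowledge.

```r
# Made-up rows standing in for virus_df; only a few of its columns are mimicked.
tweets_df <- data.frame(
  text         = c("first tweet", "RT second tweet", "third tweet"),
  screenName   = c("user_a", "user_b", "user_a"),
  retweetCount = c(5, 12, 0),
  isRetweet    = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep original tweets only, then count tweets per account.
originals <- tweets_df[!tweets_df$isRetweet, ]
tweets_per_user <- sort(table(originals$screenName), decreasing = TRUE)

# Put the most retweeted original tweets first.
top_tweets <- originals[order(originals$retweetCount, decreasing = TRUE), ]
```

Swapping tweets_df for your own virus_df should work the same way, as long as those columns are present.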

7. Next Steps

Now that you have your tweets saved in a data frame, you can start analyzing them and looking for patterns or trends. An excellent way to start is to tokenize the tweets and extract insights from the words. Check out my tutorial for tokenization if you’re interested in learning more.
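As a quick taste of what tokenization looks like, here is a rough sketch in base R. It is not the approach from my tokenization tutorial, just a simple stand-in. The function name tokenize_tweets and the sample texts are made up for illustration.

```r
# Split tweet text into lowercase tokens, dropping URLs and most punctuation.
# tokenize_tweets() is a hypothetical helper written for this example.
tokenize_tweets <- function(texts) {
  cleaned <- tolower(texts)
  cleaned <- gsub("https?://[^[:space:]]+", " ", cleaned)  # drop URLs
  cleaned <- gsub("[^a-z0-9#@' ]", " ", cleaned)           # keep letters, digits, #, @, '
  tokens <- unlist(strsplit(cleaned, "[[:space:]]+"))
  tokens[nzchar(tokens)]                                   # remove empty strings
}

# Made-up sample texts; with real data you would pass virus_df$text instead.
sample_texts <- c("Stay safe! #Coronavirus",
                  "News on #coronavirus: https://example.com")
tokens <- tokenize_tweets(sample_texts)
word_counts <- sort(table(tokens), decreasing = TRUE)
```

Here word_counts holds how often each token appears, with the most frequent tokens first.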
