My Experience with the Twitter Premium Full Archive API Using rtweet

Several aspects of accessing data from the premium API are not common knowledge

Shreya Agarwal
Towards Data Science



Twitter’s Premium full-archive API gives access to tweets older than 30 days, going back as far as 2006. It’s an important data resource, and it is expensive. Every request is paid for and counts against a limited quota, so there are a few things you should know before you start using it.

The process of accessing data from the premium API is well documented on Twitter’s developer page and in the documentation of the R and Python libraries, but several aspects are poorly explained or left out completely.

For my project, I needed tweets that were two to three months old, so I bought Twitter’s premium access to the full archive. I used the ‘rtweet’ package, an R library for accessing Twitter’s API. It’s well documented and is probably the only R package that supports the premium API, but it has some painful flaws.

Here are some pointers worth knowing before you go down this route.

1. Cost + Tax: The cost table given here does not include tax. I purchased 250 requests per month for $224 and paid an extra $44 in tax. It may be obvious to many that the listed price excludes tax, but it should be stated somewhere, especially for people applying for a grant or funding to cover this cost.
2. Recent to old tweets: When framing the query, the rtweet function search_fullarchive lets you specify a date range. The query looks like this:
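tweet <- search_fullarchive(
  q = "(#beyonce OR #katyperry) (lang:en OR lang:hi)",  # OR must be upper-case
  n = 3000,
  fromDate = "201912120000",  # YYYYMMDDHHMM, UTC
  toDate = "201912130000",    # must be later than fromDate; one-day window assumed here
  env_name = "curate",
  safedir = NULL,
  parse = TRUE,
  token = token
)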

Twitter’s premium API starts from the most recent time in the query’s date range (toDate) and works its way back towards the starting time (fromDate). As a result, a request capped at n tweets does not give you tweets spread throughout the day; it returns only those from a narrow span of time just before toDate.
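One possible workaround, shown here only as a rough sketch, is to split the day into several shorter windows and send one query per window, so that each query starts its backwards walk from a different time of day. The helper `make_windows` and its `hours_per_window` argument are hypothetical names of my own, not part of rtweet; the code only builds the fromDate/toDate strings and does not call the API.

```r
# Hypothetical helper: split one calendar day (UTC) into equal windows and
# format the boundaries as YYYYMMDDHHMM strings for fromDate / toDate.
make_windows <- function(day, hours_per_window = 6) {
  stopifnot(24 %% hours_per_window == 0)
  start <- as.POSIXct(day, format = "%Y-%m-%d", tz = "UTC")
  # Offsets (in seconds) of each window start from midnight
  offsets <- seq(0, 24 - hours_per_window, by = hours_per_window) * 3600
  data.frame(
    fromDate = format(start + offsets, "%Y%m%d%H%M", tz = "UTC"),
    toDate   = format(start + offsets + hours_per_window * 3600,
                      "%Y%m%d%H%M", tz = "UTC"),
    stringsAsFactors = FALSE
  )
}

windows <- make_windows("2019-12-12", hours_per_window = 6)
print(windows)

# Assumed usage with rtweet (not run here; q and token as in the query above):
# for (i in seq_len(nrow(windows))) {
#   search_fullarchive(q, n = 500,
#                      fromDate = windows$fromDate[i],
#                      toDate   = windows$toDate[i],
#                      env_name = "curate", token = token)
# }
```

Keep in mind that every window costs at least one request, so a finer split gives better coverage of the day but uses up the request quota faster.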

3. Tweets quota: The Twitter Premium API returns up to 500 tweets per request, and you can ask for tweets in multiples of 500. So, as I understand it, the query above, which asks for 3000 tweets in one call, is counted as 3000/500 = 6 requests. You need to keep track of this arithmetic because the subscription allows a certain number of requests and a certain number of tweets: in my case, 250 requests and 125,000 tweets per month. If you run out of requests, you won’t be able to use up your tweets quota, and there’s a fair chance that a few requests will be spent on experimentation and query building.
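The arithmetic above can be sketched in a few lines of R. This is only my understanding of how requests are counted, and the variable and function names are illustrative. It assumes that a call asking for n tweets is billed as ceiling(n / 500) requests, and that a request returning fewer than 500 tweets still counts as one request.

```r
# Figures from my subscription, as described above
tweets_per_request <- 500     # maximum tweets returned per request
monthly_requests   <- 250     # request quota per month
monthly_tweets     <- 125000  # tweet quota per month
plan_cost_usd      <- 224     # listed price
tax_usd            <- 44      # tax charged on top

# Assumed billing rule: a call for n tweets is counted in blocks of 500
requests_needed <- function(n) ceiling(n / tweets_per_request)

requests_needed(3000)  # 6
requests_needed(3001)  # 7, one extra tweet costs a whole extra request

# Requests needed to use the full tweet quota: exactly the request quota,
# so there is no slack for test queries or half-empty responses
requests_needed(monthly_tweets) == monthly_requests  # TRUE

# Effective price per request once tax is included
cost_per_request <- (plan_cost_usd + tax_usd) / monthly_requests
cost_per_request  # 1.072
```

Since 125,000 / 500 is exactly 250, the two quotas only line up if every single request comes back with a full 500 tweets. Any request spent on a test query or on a narrow window with few results eats into the tweets you can actually collect.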

4. rtweet consumes requests faster than expected: I recently learned that the rtweet package has a bug that makes it consume more requests than required. I am not sure whether it has been fixed yet, but I lost a considerable number of tweets to this bug. You can read about the bug here.

I couldn’t find these points anywhere when I started using the Twitter Premium API, so I learned them the hard way.

I hope this is helpful.
