Getting Started with Data Collection Using Twitter API v2 in Less than an Hour

An introduction to a search query using the Twitter API v2 and a demonstration using Python.

Laura O’Mahony
Towards Data Science



Table of Contents:

  1. Introduction
  2. What is an API?
  3. Twitter API
  4. Getting Access to the Twitter API
  5. Making a Basic Request with the Twitter API
  6. Altering a Request with the Twitter API
  7. Conclusion

Introduction

Social media’s ubiquity has made its platforms increasingly popular as a source of data, and data collection using APIs is becoming a sought-after skill in many data science roles. Today we will use the Twitter API v2 to collect social media posts from the microblogging and social networking service Twitter.

Twitter now has almost 400 million monthly active users [1], meaning a huge volume of data is available to collect, most of which is public. In addition to this, the Twitter developer team recently rebuilt the Twitter API from the ground up, releasing the Twitter API v2 in the second half of 2020. This API is well documented and easy to use, making it easier than ever to utilize this rich data source.

This article introduces what an API is and documents the process of using the Twitter API v2, from gaining access to the API, to connecting to a search endpoint and collecting data relating to some keywords of interest. No familiarity with the Twitter API or any knowledge of APIs at all is required to follow this piece.

What is an API?

An Application Programming Interface (API) is a software intermediary that allows two applications to communicate with each other to access data. APIs are involved in almost every action you take on your phone: sending a private message or checking the score of a football game, for example, both use an API to access and deliver information to your phone. An API is essentially a messenger that takes your requests, translates them, and returns the response; developers plug into APIs to access certain assets for end users. Of course, to ensure data security, an API only provides select data that the application programmers have made public. APIs generally require an API key to authenticate a request, and the API documentation generally contains the access instructions and requirements. Many APIs are even free to use. Oftentimes, developers can follow existing API documentation to build a URL and pull the data within a browser.

A web-based API that takes in a client’s request and returns data in response. Source: Author

Twitter API

The Twitter API is a well-documented API that enables programmers to access Twitter in advanced ways. It can be used to analyze, learn from, and even interact with Tweets. It also allows interactions with direct messages, users, and other Twitter resources. Twitter’s API also gives developers access to all kinds of user profile information, like user searches, block lists, real-time tweets, and more. Information on API products, use cases, and docs is available on the Developer Platform. Details on the Twitter developer policy are available here.

There are various APIs developed by Twitter that developers can use. These APIs help researchers and companies draw insights from Twitter data, but they are also suited to smaller-scale projects such as small-scale data analysis, creating bots, and even building fully automated systems that interact with Twitter. In this post, we will use the Twitter API to pull some recent public Tweets matching a specified search query. The API product track we will use is freely available to anyone and allows up to 100 thousand Tweets a month to be pulled.

Getting Access to the Twitter API

Before using the Twitter API, one must have a Twitter account. It is then necessary to apply for access to the Twitter API in order to obtain credentials. The API endpoint we will look at is GET /2/tweets/search/recent. This returns public Tweets from the last seven days that match a search query, and it is available to users approved for the standard product track or any other product track.

For the purposes of this article, we will use the recent search endpoint, meaning we will only need access to the standard product track. The standard product track is the default product track for those just getting started, building something for fun, etc. The alternative product tracks, academic and business, have greater capabilities for those who qualify or are willing to pay, respectively.

To apply for the standard product track and obtain the necessary credentials, one must apply for a developer account by filling out the forms on this page. Note that the recent search endpoint returns Tweets matching the search criteria from only the last 7 days. A similar full-archive search endpoint, GET /2/tweets/search/all, has greater capabilities: it allows the user to return public Tweets matching a search query from as far back as Twitter’s first post in 2006. However, this endpoint is only available to users approved for the Academic Research product track, an exciting addition to the Twitter API v2 that grants the widest range of endpoints and allows up to 10 million Tweets to be pulled each month! A full list of the Twitter API v2 endpoints is available here. More details about features that can be used for academic research can be found here, and it is possible to check eligibility for this product track here. The process for applying for an academic product track license is similar, but the application form is more detailed and must be approved by the Twitter developer team.

Once the application has been approved, one can use the various API endpoints available on the standard product track. To do this, set up an app by opening the developer portal, choosing ‘create a new project’, filling out the required details, and lastly giving the new app a name.

Source: Screenshot of author’s application on the Twitter Developer Platform

Once this is done, you will be taken to a keys and tokens page. Here you will receive your API Keys and Bearer Token (hidden in the screenshot below), which are necessary to connect to the endpoints in the Twitter API v2.

Source: Screenshot of author’s application on the Twitter Developer Platform

Make a note of your API Key, API Secret Key, and Bearer Token; once you leave this page you will not be able to see them again. If you lose these keys and tokens, it is still possible to regenerate them. They are used to authenticate the app that is using the Twitter API and, depending on the endpoint, to authenticate the user. Note that all of these keys should be treated like passwords, and should not be shared or written into your code in plain text.

Making a Basic Request with the Twitter API

Now that the API access keys are sorted, there is nothing left to do but test out the API! The first step is to load your credentials. One way of doing this is to use the command prompt (or another similar tool) to pass the Bearer Token through as an environment variable, and a Jupyter Notebook (available by installing Anaconda) for making requests and showing responses. Begin by opening up a command prompt and changing the directory to wherever you wish to save your script. In the command prompt, pass through the Bearer Token generated during the app setup by typing:

set BEARER_TOKEN=YOUR_BEARER_TOKEN
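(The set command above is for the Windows command prompt; on macOS or Linux the equivalent uses export:)

export BEARER_TOKEN=YOUR_BEARER_TOKEN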

Next, open Jupyter Notebook with the command:

jupyter notebook

Create a new Python 3 notebook file. The first thing we need to do is import some libraries.
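A minimal set of imports covering this walkthrough might look like the following (requests for the HTTP calls and pandas for tabulating results are my own library choices, plus a few standard-library helpers):

import os        # read the bearer token from the environment
import json      # pretty-print API responses
import time      # pause between paginated requests
import requests  # make HTTP requests to the API
import pandas as pd  # tabulate and save the collected Tweets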

Following this, we need to set up a connection to the Twitter API and access our credentials. For this, we will create a connect_to_twitter() function which retrieves the bearer token from the environment variable. In this step, we also pass the bearer token for authorization and return headers which will be used to access the API.
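A minimal sketch of what connect_to_twitter() might look like, assuming only that the Bearer Token was set as the BEARER_TOKEN environment variable above:

def connect_to_twitter():
    # Retrieve the bearer token from the environment variable set earlier.
    bearer_token = os.environ.get("BEARER_TOKEN")
    # The Twitter API v2 authenticates requests via this Authorization header.
    return {"Authorization": f"Bearer {bearer_token}"}

headers = connect_to_twitter()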

Now we are all set up to use the API! Twitter recently launched a new #ExtremeWeather Mini-Site [2], which gives a great idea of the data insights available on Twitter. Therefore, in this example, we will pull some recent Tweets relating to the #ExtremeWeather conversation on Twitter. To do this, we must build the appropriate request URL for the endpoint and the parameters we want to pass.
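For instance, a single URL combining the recent search endpoint with our keyword could be requested like this (the variable names are my own):

url = "https://api.twitter.com/2/tweets/search/recent?query=ExtremeWeather"
response = requests.get(url, headers=headers)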

The ‘recent’ part of the URL here is the endpoint and the query ‘ExtremeWeather’ is the keyword searched for. Equivalently, one can write the URL and separate out the query parameters like so:
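With the requests library, this might look as follows; storing the parsed response in response_data (an illustrative name) lets us reuse it later:

endpoint = "https://api.twitter.com/2/tweets/search/recent"
query_params = {"query": "ExtremeWeather"}
response = requests.get(endpoint, headers=headers, params=query_params)

# Parse the JSON body and pretty-print it.
response_data = response.json()
print(json.dumps(response_data, indent=4))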

The response from the Twitter API is returned in JavaScript Object Notation (JSON) format. It is effectively read as a Python dictionary where the keys either contain data or contain more dictionaries. The top two keys in this response are ‘data’ and ‘meta’. The Tweets are contained within ‘data’ as a list of dictionaries. ‘meta’ is a dictionary of attributes about the corresponding request: it gives the oldest and newest Tweet IDs, the result count, and the ‘next_token’ field, which I will discuss at a later stage.

{
    "data": [
        {
            "id": "144557134867XXXXXXX",
            "text": "Tweet text 1"
        },
        {
            "id": "144557110192XXXXXXX",
            "text": "Tweet text 2"
        },
        ...
        {
            "id": "144555630795XXXXXXX",
            "text": "Tweet text 10"
        }
    ],
    "meta": {
        "newest_id": "144557134867XXXXXXX",
        "next_token": "b26v89c19zqg8o3fpds84gm0395pkbbjlh482vwacby4d",
        "oldest_id": "144555630795XXXXXXX",
        "result_count": 10
    }
}

The Tweets can be made into a data frame using the following code:
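Assuming response_data holds the parsed response from above, one way to do this is:

# Each Tweet dictionary in 'data' becomes one row of the data frame.
df = pd.DataFrame(response_data["data"])
df.head()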

We can easily save it to CSV format if we wish:
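For example (the file name here is just a placeholder):

df.to_csv("extreme_weather_tweets.csv", index=False)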

Altering a Request with the Twitter API

Altering the query parameters the endpoint offers allows us to customize the request we wish to send; the endpoint’s API reference document details these in the ‘Query parameters’ section. A basic set of operators can also be used to alter queries. We can amend the query, the start and end times for the window of time we are interested in, and the maximum number of results, and we can pull many additional fields to give further information about the Tweet, author, place, etc. The following pulls 15 Tweets containing the keyword “ExtremeWeather”, which aren’t retweets and were created on October 12th, 2021 (within a week of the date the request was made).
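One way to express that request is sketched below; -is:retweet is the operator that excludes retweets, start_time and end_time take RFC 3339 timestamps, and the extra tweet.fields are my own illustrative choice:

query_params = {
    "query": "ExtremeWeather -is:retweet",   # keyword, excluding retweets
    "max_results": 15,
    "start_time": "2021-10-12T00:00:00Z",    # start of October 12th, 2021 (UTC)
    "end_time": "2021-10-13T00:00:00Z",      # end of October 12th, 2021 (UTC)
    "tweet.fields": "created_at,author_id",  # extra per-Tweet information
}
response = requests.get(endpoint, headers=headers, params=query_params)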

The recent search endpoint can deliver up to max_results=100 Tweets per request, in reverse-chronological order. Pagination tokens are used if more than max_results Tweets match the query. The next page of results can be retrieved by copying the ‘next_token’ value given in the previous response into the request’s ‘next_token’ parameter instead of leaving it blank as above. A loop could be created to keep making requests until all matching Tweets have been collected, as sketched below. There is a limit to the number of requests that can be made, which is detailed here.
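A minimal sketch of such a loop, reusing the endpoint and headers from earlier (the pause between pages is a simple way to stay inside the rate limit):

all_tweets = []
query_params = {"query": "ExtremeWeather -is:retweet", "max_results": 100}

while True:
    response = requests.get(endpoint, headers=headers, params=query_params)
    page = response.json()
    all_tweets.extend(page.get("data", []))

    # Stop once the API no longer returns a pagination token.
    next_token = page.get("meta", {}).get("next_token")
    if next_token is None:
        break
    query_params["next_token"] = next_token
    time.sleep(1)  # pause briefly between requests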

Conclusion

This article details a step-by-step process for collecting Tweets with the Twitter API v2’s recent search endpoint using Python. The steps discussed run from getting access to the Twitter API, through making a basic request and formatting and saving the response, to amending query parameters. This should enable you to get up and running with making search requests against the Twitter API v2.

If you found this article helpful, feel free to share it with friends and colleagues who may be interested. I am on LinkedIn and Twitter if you wish to connect.

Happy data collection!

Image Sources

All images are my own created on diagrams.net or screenshots of my own application on the Twitter Developer Platform.

References

[1] Statista: Most popular social networks worldwide as of July 2021, ranked by number of active users (Accessed 04–10–21).

[2] Visualizing the global #ExtremeWeather conversation on Twitter (Accessed 04–10–21).

