Mining Twitter on IBM Cloud Platform

Saving Tweets as JSON on IBM cloud storage

Dina Bavli
Towards Data Science

--

Image composed from a photo in this Twitter post and the IBM Watson Studio logo

Last year I started my multi-disciplinary research on Online Persuasion. I couldn't find an adequate dataset of tweets, so I needed to create my own: one that would allow me to fulfill the research objectives.

Twitter's free API is limited: in my experience, it pauses for about 15 minutes after retrieving roughly 4,000 tweets. To overcome this, I used time loops. I also needed a cloud platform so I could retrieve tweets for 50 queries with those time loops running.
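The time-loop idea can be sketched as a generic pause-and-retry wrapper. This is a minimal sketch, with a stand-in exception instead of tweepy's real rate-limit error; with tweepy itself you can get the same behavior simply by passing `wait_on_rate_limit=True` to `tweepy.API`.

```python
import time

def fetch_with_pauses(fetch_page, pages, pause_seconds=15 * 60):
    """Call fetch_page for each page, sleeping out rate-limit pauses.

    fetch_page is any callable that returns a list of tweets for a page
    and raises RuntimeError (a stand-in for tweepy's rate-limit error)
    when the API cuts us off.
    """
    collected = []
    for page in range(pages):
        while True:
            try:
                collected.extend(fetch_page(page))
                break  # page retrieved, move on to the next one
            except RuntimeError:
                time.sleep(pause_seconds)  # wait out the ~15-minute window
    return collected
```

The helper name and the page-based shape are illustrative; the point is only that retrieval resumes automatically after each pause.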

My first attempt at retrieving tweets from the Twitter stream was on Watson, IBM's cloud platform. A Tweet object renders as JSON, so it only makes sense to save tweets as such. However, saving as JSON on IBM cloud storage is not the same as doing it locally. Moreover, no one had asked how to do it before, let alone answered it, so here is my answer.

A few good words about IBM Watson Studio: it's an extremely friendly platform for cloud computing, especially for first-timers. You can sign up here. A few clicks later, you can work in a Jupyter Notebook. You can read here about how to get started with Watson Studio.

Connecting

When working on any cloud project, one of the necessary steps is connecting your project to its resources. As I said, Watson Studio is friendly, and it takes just two clicks: the first on the three vertical dots on the toolbar, the second on Insert project token, as seen in the following image:

This will generate code as follows:
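The inserted cell typically looks something like the following; the ID and token strings here are placeholders, and Watson Studio fills in your real values:

```python
# @hidden_cell
# The project token is an authorization token used to access project
# resources like data assets and connections; Watson Studio inserts it.
from project_lib import Project
project = Project(project_id='YOUR-PROJECT-ID',
                  project_access_token='YOUR-PROJECT-ACCESS-TOKEN')
pc = project.project_context
```

This `project` object is what we'll use later to save the tweets to cloud storage.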

You may need to approve this as the project admin, but Watson will guide and redirect you.

When working on Watson, you may need to install tweepy each time you reconnect. To retrieve tweets, these are the minimum necessary imports and installations:
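In a Watson notebook, that usually amounts to something like this (the pip line runs in a notebook cell, once per kernel session):

```python
# Run once per kernel session on Watson Studio:
# !pip install tweepy

import json    # tweets are saved and loaded as JSON
import tweepy  # Twitter API client
```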

To connect to the Twitter API, you'll need a Twitter developer account and your account credentials. You can access your keys and tokens through Apps and App Details, where you can also regenerate them if needed, as seen here:

Later insert your credentials as below:
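With the four credentials from the App Details page, the connection code (tweepy 3.x style, which was current when this was written) looks roughly like this; the placeholder strings are yours to fill in:

```python
import tweepy

# Placeholders: paste your own values from the Twitter App Details page.
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# wait_on_rate_limit makes tweepy sleep through the ~15-minute pauses itself.
api = tweepy.API(auth, wait_on_rate_limit=True)
```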

After connecting your project and resources, and connecting to your twitter API, the next step is to retrieve tweets.

Retrieving Tweets

In the following code gist, there is an example of how to use tweepy with filters, queries, language, and the number of items to retrieve. You can filter tweets and use any queries you like.
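As a rough sketch of that gist (tweepy 3.x, where the search endpoint is `api.search`): a Cursor takes the query, a language filter, and an item cap, and each returned Status object carries the raw JSON on `._json`. The query string and the small helper below are illustrative, not from the original post:

```python
def collect_tweet_json(statuses):
    """Turn an iterable of tweepy Status objects into plain JSON dicts."""
    return [status._json for status in statuses]

# With a connected `api` object, this would drive the actual retrieval:
# cursor = tweepy.Cursor(api.search,
#                        q='data science -filter:retweets',  # query + filter
#                        lang='en',                          # language
#                        tweet_mode='extended').items(100)   # number of items
# all_data = collect_tweet_json(cursor)
```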

Saving to JSON

The final step in mining Twitter data is saving it.
On Watson storage, the syntax is:

project.save_data(data=json.dumps(all_data), file_name='data.json', overwrite=True)

This is unlike the syntax for saving a JSON file locally:

with open('data.json', 'w') as f:
    json.dump(all_data, f)
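Side by side, with a toy list of dicts standing in for real tweets, the two paths look like this; `project` is the object from the inserted project-token cell:

```python
import json

# Toy stand-in for the list of tweet dicts collected above.
all_data = [{"id": 1, "text": "hello twitter"},
            {"id": 2, "text": "mining on Watson"}]

# On Watson: serialize to a string yourself and hand it to project.save_data.
payload = json.dumps(all_data)
# project.save_data(data=payload, file_name='data.json', overwrite=True)

# Locally: open a file and let json.dump write into it.
with open('data.json', 'w') as f:
    json.dump(all_data, f)
```

The difference is only who touches the file: on Watson, `save_data` writes to cloud storage for you, so you pass it an already-serialized string.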

Loading data on Watson is also two clicks away:

It will insert code that connects to your cloud storage, adds the required imports, and reads the data as a pandas data frame. It looks something like this:
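The reading end is ordinary pandas once the bytes are in hand. As a toy illustration, with an inline JSON string standing in for the stored data.json:

```python
import json
import pandas as pd

# Inline stand-in for the contents of data.json on cloud storage.
raw = '[{"id": 1, "text": "hello twitter"}, {"id": 2, "text": "mining on Watson"}]'

# One row per tweet, one column per field.
df = pd.DataFrame(json.loads(raw))
```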

Working with Watson Studio is as easy as one-two-three:

1. Connecting to cloud sources and the Twitter API.

2. Retrieving tweets.

3. Saving.

Photo by freestocks.org from Pexels

Now, as for understanding Twitter metadata? Well, that's another blog post for another time.

--


Data Scientist | NLP | ASR | SNA @ Israel. ❤ Data, sharing knowledge and contributing to the community.