Having fun with your Twitter using Python!

Akshay Balakrishnan
Towards Data Science
8 min readJan 4, 2019

Have I grabbed your attention with this clickbaity title?

I hope so, anyway.

Today, I am going to show two things. First, how much information you can crunch out of your own Twitter profile, and how you can use it to improve or boost your online presence. Second, some powerful Python libraries that can get a lot of this work done for you.

So let’s just jump into it!

Why Twitter?

Recently I have been on a tweeting spree, because it’s vacation time for me and I have nothing better to do. I noticed that Twitter, in addition to letting me know how many likes and retweets each tweet gets, also reports something called impressions and engagements.

Engagements: the total number of times a user interacted with the tweets you sent during the selected date range. Interactions include retweets, replies, follows, likes and clicks on links, cards, hashtags, embedded media, username, profile photo or tweet expansion.

Impressions: the number of impressions on a tweet sent in the selected date range. An impression is the number of times a tweet appears to users in either their timeline or search results.
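From these two numbers Twitter also derives an engagement rate, which is simply engagements divided by impressions. A quick sketch of that arithmetic (the figures below are invented for illustration):

```python
# Engagement rate = engagements / impressions, as Twitter Analytics defines it.
# The numbers below are made up for illustration.
impressions = 1200   # times the tweet appeared in timelines or search results
engagements = 54     # likes, retweets, replies, clicks, etc.

engagement_rate = engagements / impressions
print(f"Engagement rate: {engagement_rate:.1%}")  # → Engagement rate: 4.5%
```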

Unlike Facebook (which, to be fair, does offer great analytics for Facebook pages), Instagram and other social media, Twitter makes it easy for me to get a treasure trove of data from my own profile. This doesn’t even take into account the powerful Twitter API, which you can use to get insight into other people’s tweets!

You may ask, where is all of this data you speak of?

The powerful Twitter Analytics: how I got my data!

As anyone who has tried to use tweets for tasks like sentiment analysis knows, tweets are a really useful source of data to manipulate and extract information from. The most obvious task, as I mentioned before, is sentiment analysis: trying to determine whether a tweet is positive, negative or simply neutral in nature. The large traffic on the site, the number of people who actively tweet daily, and the fact that most tweets are public and can be pulled (you can make your profile private, but very few do) all enable people to use these tweets to understand many things. Twitter’s API lets you run queries like pulling every tweet about a certain topic within a time frame, or pulling a certain user’s non-retweeted tweets.

There are lots of ways of getting tweets in this data-extraction phase. You can use the tweet corpora bundled with the NLTK library, or extract tweets with the Twitter API, but I wanted to keep this phase as hassle-free as possible. So I decided to just analyze my own tweets, which I can extract very quickly because Twitter is very nice to us.

So first let’s see where Twitter has stored this information nicely for you!

First go to the Analytics section under the menu as shown:

It redirects you to a special page which already shows a lot of information. Right at the top, it summarizes what you have been doing over the last 28 days and how that activity compares with the preceding 28-day period:

As you can see they also provide information for each month, and your top tweets, top mentions as well as a summary of information for the month.

You can also check how individual tweets have fared, along with graphs of your overall tweet performance:

Now, how do we extract this information in a way that we can process it with the usual Python tools? Thankfully, Twitter has an ‘export data’ feature which conveniently packages all of this data into a nice CSV file and parcels it to your computer, where you are free to open the box and do what you wish! Click the Export data button; it will take all the tweets over the time frame you wish to analyze and send them to you as a CSV file.

Now it will arrive in such a manner:

Note that there are a lot of parameters, or columns, which are not visible here, like the number of likes, retweets, engagement rate, and so on. The point is that for many projects, getting a good dataset is a challenge, yet Twitter hands us such neat and organised data ready to use. So props to Twitter on that front!
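If you want a quick look at the export without loading pandas, the standard library’s csv module can read it too. A minimal sketch, assuming the export contains columns like "Tweet text", "impressions" and "engagements" (as mine did; double-check the headers in your own file):

```python
import csv
import io

# A tiny stand-in for the real export file, with invented rows.
sample = io.StringIO(
    '"Tweet text","impressions","engagements"\n'
    '"Hello Twitter!","150","12"\n'
    '"Python is fun","320","40"\n'
)

reader = csv.DictReader(sample)
for row in reader:
    # Engagement rate per tweet: engagements / impressions
    rate = int(row["engagements"]) / int(row["impressions"])
    print(f'{row["Tweet text"]}: {rate:.1%}')  # → 8.0% and 12.5% for these rows
```

To run this against the real file, replace the io.StringIO stand-in with open("your_export.csv", encoding="latin-1").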

Let’s bring Python into this!

I would be lying if I said I am some sort of expert (spoiler: I am not!). But I will do a couple of things here to show that anyone can actually work with the tweets we have from my profile (do try it with your own profile too, but make sure you have enough tweets to analyse!). One thing I will do is check the kind of tweets I am sending out, whether they are positive, negative or neutral, using a Python library called textblob. The other is to see which words I use most frequently and are prominent in my tweets. To add a bit of fun, I will visually represent the results using something called a ‘word cloud’. You shall see at the end why I am excited about this. So let’s get started!

Prerequisites:

I have used Spyder (Anaconda) to write my scripts here, but you can always use the Python shell itself, or write the code in a file and run it (I am using Python 3 in all examples here, because why not).

You will also need a few libraries to get started (don’t mind the definitions, they are mostly pointers taken from the documentation):

a. textblob: TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

b. matplotlib: Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.

c. pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

d. wordcloud: As the name suggests, you can create a cloud of words, with the more frequently used words highlighted more prominently.

For Ubuntu users, you can use pip or pip3 to install all these Python packages, which is what I did (with package-name substituted with the relevant package):

pip install package-name

Sentiment Analysis Time!

It’s not even that hard anymore to set up your own program to do this! With the tweets arranged neatly, the only thing I changed in the original file was the header of the tweet-text column: I added an underscore between the two words (Tweet_text) so the space would not cause errors in the program. (Alternatively, you could rename the column in pandas itself with df.rename(columns={'Tweet text': 'Tweet_text'}).)

Import the required libraries:

import pandas as pd 
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

Now, import the file (which should be in the same folder as the Python script you are writing) into a variable called df, using the read_csv function provided by pandas. You can specify different encoding schemes.

df = pd.read_csv(r"tweet_activity_metrics_TheCoolFanBoi_20181208_20190105_en.csv", encoding="latin-1")

comment_words = ' '  # We will be appending the words to this variable
stopwords = set(STOPWORDS)  # The built-in set of stop words to exclude

for val in df.Tweet_text:
    val = str(val)  # convert each tweet's content into a string
    tokens = val.split()  # split the string into individual words
    for i in range(len(tokens)):
        tokens[i] = tokens[i].lower()  # convert each word to lower case
    for words in tokens:
        comment_words = comment_words + words + ' '

Finally, this is where the WordCloud part comes into play:

wordcloud = WordCloud(width=1000, height=1000,
                      background_color='blue',
                      stopwords=stopwords,
                      min_font_size=10).generate(comment_words)

Feel free to explore all the parameters of the WordCloud function and tweak it as per your wish.
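If you are curious what drives the word sizes: a word cloud is essentially built on word frequencies. You can peek at those frequencies yourself with nothing but the standard library (the string below is just a short stand-in for the comment_words variable built above):

```python
from collections import Counter

# A stand-in for the big string of tweet text built earlier.
comment_words = "python twitter python data fun twitter python"

counts = Counter(comment_words.lower().split())
print(counts.most_common(3))  # → [('python', 3), ('twitter', 2), ('data', 1)]
```

WordCloud does roughly this counting internally (after removing stop words) and then scales each word’s font size by its count.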

Finally let’s display this as output, by using the matplotlib library:

plt.figure(figsize=(10,10),facecolor=None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()

This is what I got when I saved this file and ran it on Python:

Admittedly, not the best background color, but you can see a lot of words highlighted with different fonts. Feel free to analyse my personality from the words I use in my tweets. Although I will just say words like ‘joelvzach’, ‘picklehari’, ‘pallavibiriyani’ and ‘guyandtheworld’ are all Twitter handles :)

Playing with sentiment analysis

I am not going to take too much time here, because I am a noob at this, just like y’all. But it’s not that hard to get started, again thanks to the great work programmers have done in developing efficient libraries.

All you need to do is this:

import pandas as pd
from textblob import TextBlob

df = pd.read_csv(r"tweet_activity_metrics_TheCoolFanBoi_20181208_20190105_en.csv", encoding="latin-1")

print('Tweet | Polarity | Subjectivity')
for val in df.Tweet_text:
    sentiments = TextBlob(str(val))  # wrap each tweet in a TextBlob
    print('---------------')
    print(val, end='')
    print(' | ', end='')
    print(sentiments.polarity, end='')
    print(' | ', end='')
    print(sentiments.subjectivity)

And you get this output:

So these are my tweets, each rated from -1 (most negative) to 1 (most positive) for sentiment polarity. It also shows how subjective each statement is, on a scale of 0 to 1.
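If you would rather have coarse labels than raw numbers, a simple thresholding function does the trick. Note that the cutoffs below are my own arbitrary choice for illustration, not anything TextBlob prescribes:

```python
def label_sentiment(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a coarse label.

    The cutoffs at exactly 0 are an arbitrary choice for illustration.
    """
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

for p in (-0.6, 0.0, 0.8):
    print(p, label_sentiment(p))
```

You could call label_sentiment(sentiments.polarity) inside the loop above to print a label next to each tweet.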

You must be sitting there thinking: Natural Language Processing? Pfft. I could do that.

But here’s the catch: we only played with it at a very high, abstract level. Did we consider how the sentiment or subjectivity of a tweet is calculated? What algorithm goes through the text? And so on.

This is the marvel of the textblob library. If you want to go deeper, I won’t cover it in this article, but do check the textblob documentation, which explains how the sentiment function works: the default PatternAnalyzer scores text using a lexicon, and TextBlob also ships a NaiveBayesAnalyzer (trained on a movie-reviews corpus) as an alternative for sentiment analysis.
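One simple way sentiment scoring can work is lexicon-based: each word carries a pre-assigned polarity, and the text’s score is roughly the average of those. Here is a toy version of that idea; the three-word lexicon and its scores are entirely invented, and a real analyzer also handles intensifiers, negation and much more:

```python
# A toy, invented lexicon mapping words to polarity scores in [-1, 1].
LEXICON = {"love": 0.5, "great": 0.8, "awful": -1.0}

def toy_polarity(text: str) -> float:
    """Average the polarity of any known words; 0.0 if none match."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(toy_polarity("I love this great library"))  # (0.5 + 0.8) / 2 = 0.65
print(toy_polarity("what an awful day"))          # -1.0
print(toy_polarity("completely unknown words"))   # 0.0
```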

This concludes my short article, a sneak peek into what I have been up to lately (no good, as is clear). Do check out my Twitter (since I have shown a lot of it) and catch me there as I continue tweeting, because why not, right? Code here.

Also, thanks to the documentation of all the libraries used, as well as GeeksForGeeks, for providing good material to make this article happen.

Thank you for reading this to the end!
