Throughout my many years in math education as a tutor, TA, and teacher, one thing stood out to me: my students consistently had strong feelings about math. I wondered whether this phenomenon extended beyond my students, which led me to two questions. How do people talk about math? And can I use data to explore topic trends? Going into this project, I expected to see a lot of hate for math. While that was in fact the case, I also uncovered a few topics that I was not expecting.
Data Collection and Preprocessing
I started by scraping 9,000 tweets containing the keyword "math" through the Twitter API, using tweepy. For preprocessing, I used googletrans to translate any non-English tweets. Next, I removed any non-English characters that did not get translated, converted all the characters to lowercase, and dropped duplicate tweets. Finally, I removed numbers, special characters, and emojis from the tweets.
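As a rough illustration of the cleaning steps described above, here is a minimal sketch using only the Python standard library. It is not the project code: the function names `clean_tweet` and `preprocess` are hypothetical, the translation step is left out, and duplicates are dropped after cleaning rather than before, which is a simplifying assumption. Special characters are deleted rather than replaced with a space, so "he's" becomes "hes".

```python
import re


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and keep only the letters a-z and single spaces."""
    lowered = text.lower()
    # Deleting everything except a-z and whitespace removes numbers,
    # special characters, emojis, and non-English characters in one pass.
    letters_only = re.sub(r"[^a-z\s]", "", lowered)
    # Collapse the whitespace runs left behind by the deletions.
    return re.sub(r"\s+", " ", letters_only).strip()


def preprocess(tweets):
    """Clean every tweet and drop empty results and exact duplicates."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        text = clean_tweet(tweet)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    # Made-up example tweets, not taken from the dataset.
    sample = ["He's BAD at Math 101!! \U0001F62D", "I hate math!", "i hate math"]
    print(preprocess(sample))
```

Keeping the order of first appearance, instead of calling `set()` on the whole list, makes the result reproducible from run to run.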
Unsupervised Learning
For vectorization, I considered CountVectorizer and TF-IDF Vectorizer. For topic modeling, I considered Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Latent Dirichlet Allocation (LDA). I found that TF-IDF Vectorizer along with NMF gave me the most interpretable topics.
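To make the TF-IDF plus NMF combination more concrete, here is a small from-scratch sketch in plain Python. It is only meant to show the mechanics and is not the project code, which most likely relied on a library such as scikit-learn. The assumptions are: one common smoothed IDF variant with no row normalization, the classic multiplicative-update rule for the squared-error objective, and hypothetical helper names (`tfidf_matrix`, `nmf`, `top_terms`). The documents at the bottom are made up.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows) where rows[i][j] is the TF-IDF of term j in doc i."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents each term appears in.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumed variant; libraries differ in the details).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factor v (docs x terms) into non-negative w (docs x k) and h (k x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def top_terms(h, vocab, n_top=3):
    """List the highest-weighted terms for each topic (each row of h)."""
    result = []
    for row in h:
        ranked = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)
        result.append([vocab[j] for j in ranked[:n_top]])
    return result


docs = ["i hate math", "hate this math test", "i love math", "love my math teacher"]
vocab, v = tfidf_matrix(docs)
w, h = nmf(v, k=2)
for i, terms in enumerate(top_terms(h, vocab)):
    print(f"topic {i}: {terms}")
```

Each row of `h` is a topic expressed as term weights, and each row of `w` says how strongly a tweet belongs to each topic. Reading off the top terms per topic is what makes it possible to give a topic a name.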
Topics
I found roughly 14 clearly interpretable topics. As I mentioned, quite a few expressed discontent with math, including being bad at it, disliking the teacher, and just hating math in general. Here are those topics with their commonly recurring words, and some of the top tweets within each topic.
Bad at Math (common terms: ‘bad’, ‘grade’, ‘really’, ‘fumbles’)
Math Teacher (common terms: ‘teacher’, ‘grades’, ‘test’, ‘school’)
Hate Math (common terms: ‘hate’, ‘crying’, ‘stressed’, ‘omg’)
The topics I did not expect to occur were quite diverse. For example, I found a topic of love for math, a topic where people were advertising to complete others’ homework for payment, and a few meme formats.
Love Math (common terms: ‘love’, ‘test’, ‘got’, ‘super’, ‘enjoy’)
Pay for HW (common terms: ‘pay’, ‘calculus’, ‘assignment’, ‘statistics’)
Colors to Subject (common terms: ‘science’, ‘english’, ‘red’, ‘blue’, ‘green’)
Anime Character Meme (common terms: ‘hes’, ‘reason’, ‘think’, ‘wake’, ‘character’, ‘breathe’)
I was especially intrigued by this one: nearly all of the tweets were identical except for the anime character’s name, which changed from tweet to tweet. I actually looked for examples of the original tweets on Twitter, and here is what I found:
Among the topics I was not expecting to see were ones where people use math to insult others. In these topics, I saw phrases such as "you did the math wrong." This was especially prevalent in election tweets, where I saw many users implying that "the other side" doesn’t know "basic" math.
Sentiment Analysis
Additionally, I used VADER to check the sentiment within the topics. VADER scores range from -1 (most negative) through 0 (neutral) to +1 (most positive). Overall, most of the topics had a negative-leaning sentiment. For example, the Hate Math topic had a mean sentiment score of -0.41.
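The per-topic averaging described above can be sketched as follows. This is an illustration under stated assumptions, not the project code: each tweet is assumed to already have a topic label and a VADER compound score (for example from the `polarity_scores(text)["compound"]` value that the vaderSentiment package returns). The function names are hypothetical, the 0.05 cutoff is the conventional threshold suggested in VADER's documentation, and the scores below are made-up placeholders rather than results from the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_sentiment_by_topic(rows):
    """Average compound scores per topic.

    rows is an iterable of (topic_label, compound_score) pairs,
    with each score assumed to lie between -1 and +1.
    """
    by_topic = defaultdict(list)
    for topic, score in rows:
        by_topic[topic].append(score)
    return {topic: mean(scores) for topic, scores in by_topic.items()}


def sentiment_label(score, threshold=0.05):
    """Map a compound score to a coarse label using an assumed cutoff."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Placeholder scores for illustration only.
scored_tweets = [
    ("Hate Math", -0.6),
    ("Hate Math", -0.2),
    ("Love Math", 0.5),
]
for topic, avg in mean_sentiment_by_topic(scored_tweets).items():
    print(topic, round(avg, 2), sentiment_label(avg))
```

Averaging compound scores gives one number per topic, which makes topics easy to compare, although it hides how spread out the individual tweets are.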
Emojis
In a previous blog post, I mentioned that Metis, the data science bootcamp I attended, taught an iterative process for completing projects. For my first iteration of the project, I actually converted the emojis into their word descriptions in the hope of picking up extra semantic value. However, the emojis formed their own topic and made the other topics less interpretable. That’s why I opted to remove them in the last iteration.
I did, however, take a look at the most commonly used emojis in the tweets. Since my data comes from social media, I knew there was information to be gained by considering emoji usage. I created a word cloud based on the number of times each emoji appears.
Unsurprisingly, the most common emoji was "loudly crying face." However, I did not expect "face with tears of joy" and "rolling on the floor laughing" to be so close behind. Perhaps this can be attributed to the meme-y nature of social media.
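A word cloud like this needs a count for each emoji. Here is one way that counting could be sketched with the standard library; it is not the project code. The `is_emoji` check is a rough assumption that only looks at a couple of common Unicode emoji ranges, so it will miss some emojis and will count skin-tone modifiers as separate symbols. The tweets are made up.

```python
import unicodedata
from collections import Counter


def is_emoji(ch: str) -> bool:
    """Rough heuristic: treat characters in common emoji blocks as emojis."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def emoji_counts(tweets):
    """Count emojis across tweets, keyed by their Unicode description."""
    counts = Counter()
    for tweet in tweets:
        for ch in tweet:
            if is_emoji(ch):
                # unicodedata.name gives the official name of the character.
                counts[unicodedata.name(ch, "unknown").lower()] += 1
    return counts


# Made-up raw tweets (emojis written as escapes to avoid encoding issues).
raw_tweets = [
    "math final tomorrow \U0001F62D\U0001F62D",
    "did the math wrong again \U0001F602",
    "no emoji here",
]
for name, count in emoji_counts(raw_tweets).most_common():
    print(name, count)
```

The counting has to run on the raw tweets, before the cleaning step strips the emojis out. The same Unicode names can also serve as the word descriptions when converting emojis to text.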
Conclusion
In conclusion, I would say that my initial hunch was correct: the trends I saw with my students did generalize, at least to the Twitter population. I found topics where people expressed general dislike for math, which was reinforced by the below-neutral sentiment scores across the topics. However, I did find a few topics I was not anticipating, like loving math, paying someone to do homework, and a couple of meme formats. Additionally, I found that some of the most commonly used emojis are ones typically used to express laughter.
Full project code can be found here.