How To Create Your Own Hate Tweet Detector

A step-by-step tutorial on developing a machine learning classification algorithm for detecting hate tweets in Python

Steven Yan
Towards Data Science


Photo by Jon Tyson on Unsplash

What are the consequences of posting a hate tweet?

On the tweet level, the following actions can be taken by Twitter:

  1. Label the tweet as containing disputed or misleading information
  2. Require you to remove the tweet before you can post again
  3. Hide the violating tweet while waiting for its removal

On the account level, the following actions are taken against repeat offenders or anyone doing something especially egregious:

  1. Require that profile or media content be edited, making it unavailable until it is
  2. Place the account in read-only mode, preventing any tweeting, retweeting, or liking
  3. Ask for verification of account ownership to detect anonymous users with multiple accounts
  4. Permanently suspend the account

Twitter’s different “Circles of Hell” are intended to ensure no user is ever penalized too harshly for inadvertently tweeting something offensive. So unless you are a repeat offender, you are unlikely to have your account removed or put into read-only mode.

Is hate speech even that prevalent?

The hate tweet business has moved well beyond the lemonade stand in recent years. In the first quarter of 2018, Facebook took action on 2.5 million pieces of content for hate speech; by the last quarter of 2020, that number had grown to a whopping 26.9 million pieces of hate speech, roughly 9 million per month, give or take.

Having seen the pernicious and insidious nature of hate speech, especially on social media platforms, and its ability to incite violence, both Facebook and Twitter have already developed proprietary algorithms for detecting it. Being able to develop such a detection model would prove useful at any startup, especially a social media one, or at any company looking to monitor hate speech in an online forum like Reddit or in its intra-company communications.

Steps for Developing a Model for Deployment

Photo by Marco Bianchetti on Unsplash

Here is my rough outline of the progression from raw data to a model ready for deployment:

  1. Environment Setup
  2. Data Collection
  3. Data Preparation
  4. Model Training and Evaluation
  5. Hyperparameter Tuning
  6. Model Prediction
  7. Model Deployment

You will find the entire code of this project in the following notebook:

Environment Setup

Photo by Brett Jordan on Unsplash

To create the environment, put environment.yml in the folder where you want to create the project, and from that same folder run the following command in the terminal:

$ conda env create -f environment.yml
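
Then activate the environment before launching the notebook. The name is whatever the name: field inside environment.yml specifies, so substitute it below:

$ conda activate <environment-name>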

Data Collection

The first step in developing any supervised machine learning algorithm is data collection, i.e. procuring a labeled dataset. I started with the Davidson dataset because it already contained the actual tweet text, so I could start work on the project right away, whereas the majority of labeled datasets only provide the tweet ID.

Since the dataset has a huge class imbalance, I undertook a thorough and fervent search for additional labeled datasets to bolster my minority class, and stumbled upon this website: hatespeechdata.com. I ultimately decided on the datasets from the University of Copenhagen, Aristotle University, and the HASOC 2019 Shared Task dataset.

The original Davidson dataset consisted of 24,783 tweets in total, of which 23,353 (94.2%) were labeled as non-hate and 1,430 (5.8%) as hate. After incorporating the additional minority-class instances, the dataset grew to 30,378 tweets, with 7,025 (30.1%) hate tweets, easing the class imbalance.

Twitter API

Let’s take a look at how to download your own set of hate tweets from Twitter.

For datasets with just tweet IDs, you will need to apply for a developer account to gain access to the Twitter API and obtain the tweet text. Fair warning: it may not be an immediate process. I was contacted by Twitter to elaborate on the details of my project.

Once that is completed, create an application to obtain your API keys from Twitter and then place them inside a config.py file, whether they are assigned to variables, placed as values within a dictionary, etc.

Add this config.py file to .gitignore to keep your API keys private. To use the keys, import the module into your notebook:

from config import keys

If keys is a dictionary, we can easily retrieve each API key by its dictionary key: keys['key_value'].
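
For illustration, a hypothetical config.py could look like the snippet below; the dictionary keys are placeholders of my own choosing, so match them to whatever Twitter calls your credentials:

# config.py -- keep this file out of version control via .gitignore
keys = {
    'api_key': 'YOUR_API_KEY',
    'api_secret': 'YOUR_API_SECRET',
    'bearer_token': 'YOUR_BEARER_TOKEN',
}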

Twitter requires that the IDs be submitted in a specific format: in batches of no more than 100, as a string, with the IDs separated by commas and no spaces.

The first function creates a list of strings, where each string is a batch of 100 comma-separated tweet IDs:
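
My actual helper lives in the notebook, but a minimal sketch of the idea, with names of my own choosing, could look like this:

def batch_tweet_ids(tweet_ids, batch_size=100):
    """Split a list of tweet IDs into comma-separated strings of at most 100 IDs each."""
    batches = []
    for i in range(0, len(tweet_ids), batch_size):
        batch = tweet_ids[i:i + batch_size]
        batches.append(','.join(str(tweet_id) for tweet_id in batch))
    return batches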

This code was generated with the help of the Postman agent. The second function produces a dataframe of the requested fields when the list produced by the previous function is passed in as tweets_ids:
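
The full version is in the notebook; a rough sketch, assuming the Twitter API v2 tweet lookup endpoint and a bearer token stored in config.py (the field choices here are mine), might look like this:

import pandas as pd
import requests

from config import keys  # assumes keys['bearer_token'] holds your bearer token

def fetch_tweets(tweets_ids):
    """Request each batch of IDs from the Twitter API and collect the returned tweet fields into a dataframe."""
    headers = {'Authorization': f"Bearer {keys['bearer_token']}"}
    records = []
    for batch in tweets_ids:
        response = requests.get(
            'https://api.twitter.com/2/tweets',
            headers=headers,
            params={'ids': batch, 'tweet.fields': 'created_at,lang'},
        )
        records.extend(response.json().get('data', []))
    return pd.DataFrame(records)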

I have included my combined dataset combined.csv in the repo for your convenience in case you want to skip through these data collection steps.

Data Preparation

In order to use natural language in a machine learning algorithm, we have to convert the words into numbers, which is a form that the algorithms can recognize.

Tweet vs. Text

Let’s think about how a tweet is different from any piece of literary text:

  1. May not be grammatically correct
  2. May not be correctly spelled (e.g. abbreviations, joined words, or repeated letters, such as FML, F$ckMyLife, Fuuuuuu$$ my life)
  3. Use of special characters like # or @ and emojis

The goal here is to remove any special characters, usernames, URL links, retweet markers, and anything else that doesn’t add to the semantics of the sentence.

So here goes some shameless self-promotion, but in case you need a quick refresh of regular expressions, check out my posts: To RegEx or Not To RegEx, Part I and To RegEx or Not To RegEx, Part II.

Code for Preprocessing Tweets

I will briefly discuss my thought process in creating the function for preprocessing the tweets (a sketch of the full function follows the component list below):

The first function is for lemmatization, which returns the root form of each word, i.e. “dimension”, “dimensional”, and “dimensionality” are all returned as “dimension”. The second function creates a list of tokenized and lemmatized tweets, with stopwords and words shorter than 3 characters removed, using the gensim module for tokenization, which lowercases all the words automatically.
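
The exact helpers are in the notebook; as a sketch, assuming nltk for the root forms and gensim’s simple_preprocess for tokenization (I chain a Snowball stemmer after the WordNet lemmatizer, one common way to reduce “dimensionality” to “dimension”), the two functions could look roughly like this:

import nltk
from nltk.stem import SnowballStemmer, WordNetLemmatizer
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import STOPWORDS

nltk.download('wordnet', quiet=True)  # needed once for the WordNet lemmatizer

stemmer = SnowballStemmer('english')
lemmatizer = WordNetLemmatizer()

def lemmatize_stemming(word):
    """Return the root form of a word, e.g. 'dimensionality' -> 'dimension'."""
    return stemmer.stem(lemmatizer.lemmatize(word, pos='v'))

def tokenize(tweet):
    """Tokenize and lowercase a tweet with gensim, dropping stopwords and words under 3 characters."""
    return [lemmatize_stemming(token)
            for token in simple_preprocess(tweet, min_len=3)
            if token not in STOPWORDS]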

Components of Preprocessing Function

  1. Retweets are reposted messages on Twitter, which contain the RT tag, the retweeted tweet text, and sometimes the username. I decided to keep the retweeted text but remove ‘RT @username’, since the username adds no semantic value.
  2. Tweets can often contain URL links to other sites, tweets, online media, etc. I removed both http and bit.ly links, the latter being a URL shortening service.
  3. Unicode characters have an HTML code that begins with the two characters &# followed by a number, which can refer to emojis, symbols, punctuation, etc. I removed those in addition to regular punctuation, but removed the URL links before removing the punctuation.
  4. Hashtags are words or phrases preceded by a hash (#) sign to identify them as relating to a particular topic, and I decided to keep the phrase since it often carries some semantic value.
  5. I collapsed any run of one or more whitespace characters into a single space and removed leading and trailing whitespace.
  6. I separated joined words, such as AaaaBbbbCccc, as indicated by inner capitalization.
  7. I reduced any letter repeated more than twice down to two, since no English word contains the same letter three or more times in a row.
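
Putting those components together, here is a sketch of what the preprocessing function could look like. The regexes are my own reconstruction from the list above, so the exact patterns may differ from the notebook’s version:

import re

def preprocess(tweet):
    """Clean a raw tweet according to the components described above (patterns are approximate)."""
    # 1. Remove retweet markers and usernames, e.g. "RT @user:"
    tweet = re.sub(r'RT\s+@\w+:?', ' ', tweet)
    tweet = re.sub(r'@\w+', ' ', tweet)
    # 2. Remove http(s) and bit.ly links (before stripping punctuation)
    tweet = re.sub(r'https?://\S+|www\.\S+|bit\.ly/\S+', ' ', tweet)
    # 3. Remove HTML entity codes such as &#8217;
    tweet = re.sub(r'&#\d+;?', ' ', tweet)
    # 4. Keep hashtag phrases but drop the # sign
    tweet = re.sub(r'#(\w+)', r'\1', tweet)
    # 6. Split joined words on inner capitalization, e.g. AaaaBbbbCccc -> Aaaa Bbbb Cccc
    tweet = re.sub(r'([a-z])([A-Z])', r'\1 \2', tweet)
    # 7. Reduce letters repeated three or more times down to two, e.g. Fuuuuu -> Fuu
    tweet = re.sub(r'(\w)\1{2,}', r'\1\1', tweet)
    # 3. Remove remaining punctuation and special characters
    tweet = re.sub(r'[^\w\s]', ' ', tweet)
    # 5. Collapse whitespace and strip leading/trailing spaces
    tweet = re.sub(r'\s+', ' ', tweet).strip()
    return tweet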

What we end up with in each row of the dataframe is a string of lowercased, space-separated words, either lemmatized or stemmed, with stopwords, 1- and 2-letter words, special characters, usernames, punctuation, extra whitespace, and URL links removed:

“lemma space separate lowercase tweet”

Model Training and Predicting

Let’s create a pipeline whose first step is TfidfVectorizer(), which combines the functionality of CountVectorizer(), converting text into a matrix of token counts, and TfidfTransformer(), transforming that matrix into a normalized tf-idf representation.
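
A sketch of such a pipeline, using LogisticRegression purely as a stand-in classifier (the notebook compares several algorithms, and the final choice may differ), could look like this:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# LogisticRegression is a placeholder here; the repo compares six different classifiers
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', LogisticRegression(max_iter=1000)),
])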

We then fit the pipeline on the training set, where X_train here is the preprocessed tweet text. We do not need to call fit_transform ourselves, since the pipeline applies the TfidfVectorizer() transformation for us during fitting:
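
Assuming the combined dataset from the repo and column names of my own choosing (tweet for the raw text and label for the hate / non-hate class), fitting could look like this:

import pandas as pd
from sklearn.model_selection import train_test_split

# combined.csv is the dataset included in the repo; the column names below are assumptions
df = pd.read_csv('combined.csv')
df['clean_tweet'] = df['tweet'].apply(preprocess)

X_train, X_test, y_train, y_test = train_test_split(
    df['clean_tweet'], df['label'], test_size=0.2, stratify=df['label'], random_state=42
)
pipeline.fit(X_train, y_train)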

Once we fit the model, we can go ahead and make a prediction and classify any tweet as hate or not-hate:

We have to pass the tweet through the preprocess() function and use the trained pipeline to make a prediction. This tweet ultimately gets the distinction of being categorized as a hate tweet.
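
Continuing from the snippets above, prediction on a new tweet might look like this; the example text is hypothetical, and I assume the hate class is encoded as 1:

new_tweet = "I can't believe they let people like you post on here"  # hypothetical example
prediction = pipeline.predict([preprocess(new_tweet)])
print('hate' if prediction[0] == 1 else 'not hate')  # assumes hate is labeled 1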

Note: The final model was determined to be best by comparing 6 different algorithms and performing hyperparameter tuning with GridSearchCV. The results are located in the following repo:

What’s Next?

  1. You could include additional labeled datasets to improve on the class imbalance.
  2. You could make additional adjustments to the preprocessing function.
  3. You could try different text representations or classification algorithms, such as Doc2Vec embeddings or neural networks.

In the next blog, I will take a closer look at the final step, deployment, and at the different ways we as data scientists can deploy the model.

Connect with me on GitHub, LinkedIn, or via email. For a look at my previous blogs and repos, visit the following website.
