Can We Beat The Stock Market Using Twitter?

An Elaborate Guide On How To Beat Analysts At Their Own Game Using Twitter & Sentiment Analysis

Noah Mukhtar
Towards Data Science


(Pixabay on Pexels, 2020)
Photo by Yucel Moran on Unsplash

Do People Care About Data From Twitter?

Twitter’s data licensing services earned it $500m in 2019, making up nearly 20% of its revenue.

Why?

Traders have realized that using social media to get the fastest news affecting stock prices is no longer a theory but a reality, and Twitter offers a platform that can outpace even the most reliable news vendors.

How Can We Use Tweets To Predict Stocks?

In essence, we use machine learning algorithms to gauge the sentiment of tweets and predict whether a stock will move up or down. For example, if a tweet contains more positive words than negative words, our algorithm gives it a higher score and predicts that the stock price will move upwards.
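To make the idea concrete, here is a toy Python sketch of word-count scoring. The POSITIVE and NEGATIVE word lists are hypothetical placeholders, not a real sentiment lexicon (the VADER tool used later is far more nuanced), and returning "flat" for a zero score is our own simplification.

```python
# Toy lexicon-based scoring: count positive words minus negative words.
# These word lists are hypothetical placeholders, not VADER's lexicon.
POSITIVE = {"beat", "growth", "great", "love", "strong", "up"}
NEGATIVE = {"miss", "weak", "hate", "lawsuit", "down", "outage"}


def word_count_score(tweet: str) -> int:
    """Return (#positive words - #negative words) for a tweet."""
    # Strip common punctuation and cashtag/hashtag symbols, then lowercase.
    words = [w.strip(".,!?$#").lower() for w in tweet.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg


def predicted_direction(tweet: str) -> str:
    """Map the score to a predicted price direction."""
    score = word_count_score(tweet)
    if score > 0:
        return "up"
    if score < 0:
        return "down"
    return "flat"
```

For instance, `word_count_score("Strong quarter, great growth for $EA!")` returns 3, so that tweet is read as a signal that the price will move up.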

(Edwin Concubierta — Pymble, 2020)

What Are We Going To Train Our Model On?

To understand how Twitter sentiment relates to stock prices, we need a rich dataset of tweets about different companies, so that we can infer whether online sentiment can be used to predict the stock price. Additionally, Twitter metrics such as likes, followers, and engagement can be good indicators of how reliable a tweet about a company or industry is and how likely it is to provoke a reaction.

Before explaining the steps, it is worth reiterating that not even the best analysts can build models that surpass an R-squared value of ~4–5%, even with thousands of predictors. So can we find a way to beat all of them using just Twitter?

Photo by Jamie Street on Unsplash

The following steps explain our approach to data construction:

Step 1: Extract Data from Twitter

The dataset was downloaded from the website “followthehashtag.com”. It is a rich library of tweets about companies listed in the NASDAQ 100, collected using Twitter cashtags (e.g. $EA).

The dataset is used by researchers around the world and contains around a million tweets. For this project, we downloaded tweets about 4 companies for the period from March 28th, 2016 to June 15th, 2016.
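Below is a minimal sketch of narrowing the downloaded tweets to the four companies and the study window. It assumes each tweet has already been parsed into a dict with hypothetical `cashtag` and `date` fields. The actual column layout of the followthehashtag.com export is not described here.

```python
from datetime import date

# Cashtags of the four companies analysed in this article.
# The tweet field names used below ("cashtag", "date") are hypothetical.
COMPANIES = {"$EA", "$VOD", "$TMUS", "$CERN"}
START, END = date(2016, 3, 28), date(2016, 6, 15)


def select_tweets(tweets):
    """Keep tweets that mention one of COMPANIES and fall within [START, END]."""
    return [
        t for t in tweets
        if t["cashtag"] in COMPANIES and START <= t["date"] <= END
    ]
```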

Step 2: Calculate The Sentiment Score of Tweets

We used the VADER (Valence Aware Dictionary and Sentiment Reasoner) library to detect the sentiments of each tweet. VADER is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.

(Pixabay on Pexels, 2020)

Positive: the proportion of the tweet's text that expresses positive, optimistic sentiment

Negative: the proportion of the tweet's text that expresses negative, pessimistic sentiment

Neutral: the proportion of the tweet's text that is neutral

Compound: a normalized, weighted composite score that ranges from -1 (most negative) to +1 (most positive)

Sentiment Score: the product of the compound score and the number of followers of the account that posted the tweet.

This weighting captures the importance and impact of each tweet by giving more weight to Twitter users with more followers.
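The follower-weighted score, score = compound × followers, can be sketched as follows. In practice the compound value would come from VADER, for example `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the vaderSentiment package. To keep this sketch dependency-free, the scorer is passed in as a function. The `stub_compound` function is purely illustrative and is not VADER.

```python
from typing import Callable


def sentiment_score(text: str, followers: int,
                    compound: Callable[[str], float]) -> float:
    """Weight a tweet's compound sentiment (between -1 and +1) by follower count.

    `compound` is any function returning a VADER-style compound score, e.g.
    lambda t: analyzer.polarity_scores(t)["compound"] with vaderSentiment.
    """
    return compound(text) * followers


# Hypothetical stub standing in for VADER, for illustration only.
def stub_compound(text: str) -> float:
    return 0.5 if "great" in text.lower() else -0.25


print(sentiment_score("Great new game from EA", 1000, stub_compound))  # 500.0
```

Passing the scorer in as a parameter also makes it easy to swap VADER for another sentiment model without touching the weighting logic.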

Step 3: Group The Overall Sentiment Score By Day For Each Company
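The article does not specify how the scores are combined within a day. The sketch below assumes they are summed for each (company, day) pair, so that days with many influential tweets carry more weight. A daily mean would be an equally reasonable choice.

```python
from collections import defaultdict


def daily_sentiment(scored_tweets):
    """Sum follower-weighted sentiment scores per (company, day).

    `scored_tweets` is an iterable of (company, day, score) tuples; `day` can be
    a datetime.date or an ISO date string. Summing is an assumption.
    """
    totals = defaultdict(float)
    for company, day, score in scored_tweets:
        totals[(company, day)] += score
    return dict(totals)
```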

Step 4: Extract Data From Yahoo Finance

We used the API available in pandas to connect to Yahoo Finance, then downloaded and extracted each company's stock data for the selected 74-day period.

Yahoo Finance’s API — Stock Prices For 74 Day Period

Step 5: Capture The Volatility of Stocks

The daily stock price changes were standardized so that distance-based algorithms (such as KNN) are not distorted by differences in scale, and so that the models capture volatility rather than the absolute price change.

Capturing Volatility By Standardizing Daily Stock Price Change
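Here is a minimal sketch of this step. It assumes that standardization means converting closing prices into daily percentage changes and then into z-scores (subtract the mean, divide by the sample standard deviation).

```python
from statistics import mean, stdev


def daily_returns(closes):
    """Percentage change between consecutive closing prices."""
    return [(today - prev) / prev for prev, today in zip(closes, closes[1:])]


def standardize(values):
    """Z-score each value: subtract the mean and divide by the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]
```

In a stricter setup, the mean and standard deviation would be computed on the training period only and then applied to the test period, so that no information from the test days leaks into training.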

Step 6: Create a Buy / Sell Signal

We then created a preliminary Buy/Sell signal based on whether an investor would have made a positive return on a given day. This is the target we are trying to predict.

Next, we calculated a second Buy/Sell signal based entirely on the sentiment of the tweets.

Adding Buy / Sell Signal Based on Sentiment Score
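Both signals can be sketched as simple threshold rules. Encoding Buy as 1 and Sell as 0, and using a threshold of zero for the sentiment signal, are our assumptions. The original thresholds are not stated.

```python
def actual_signal(daily_return: float) -> int:
    """Target label: 1 (Buy) if holding the stock that day earned a positive return, else 0 (Sell)."""
    return 1 if daily_return > 0 else 0


def sentiment_signal(daily_sentiment: float, threshold: float = 0.0) -> int:
    """Sentiment-only label: 1 (Buy) if the day's aggregate sentiment exceeds `threshold`, else 0 (Sell)."""
    return 1 if daily_sentiment > threshold else 0
```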

Step 7: Run Around 36 Different Machine Learning Binary Classification Models

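The dataset was split into training and test sets so that we could train our models and objectively measure their performance. Since this is time-series data, it is better to split it by date range (training on earlier days and testing on later ones) than randomly, because a random split would let the models learn from days that come after the ones they are tested on.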
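A date-based split could look like the sketch below. The cutoff date and the `date_key` accessor are hypothetical. Any value that compares chronologically will work, such as `datetime.date` objects or ISO-8601 strings like "2016-05-15".

```python
def split_by_date(rows, cutoff, date_key):
    """Chronological split: rows dated before `cutoff` train, the rest test.

    No shuffling is done, so every test row is dated on or after every training row.
    """
    train = [r for r in rows if date_key(r) < cutoff]
    test = [r for r in rows if date_key(r) >= cutoff]
    return train, test
```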

We ran a diverse set of algorithms, first to predict whether the stock's price is likely to increase or decrease given the sentiment (and measure the accuracy of that prediction), and second to estimate the magnitude of the price change.

The models we ran were KNN, Logistic Regression, Decision Tree, Random Forest, SVM and ANN (artificial neural network).
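The article does not say which library was used for these models. To illustrate the simplest of them, here is a from-scratch KNN classifier on a single feature (for example, the day's standardized sentiment), together with the accuracy metric used to compare models. In practice, a library implementation such as scikit-learn's `KNeighborsClassifier` would be the usual choice.

```python
from collections import Counter


def knn_predict(train, x, k=3):
    """Predict a 0/1 label for feature value `x` by majority vote of the k nearest points.

    `train` is a list of (feature, label) pairs; distance is the absolute
    difference, since the feature here is one-dimensional.
    """
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

An odd `k` avoids tied votes for binary labels. Because KNN relies on distances, it is also the model most sensitive to the standardization done in Step 5.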

Step 8: Predict The Stock Price

A complex formula based on the model trained on the sentiments of the Tweets is used to predict the stock price.

Using Our Model To Try & Make Money

Predicting EA, T-Mobile, Vodafone, & Cerner’s Performance

We chose a diverse set of companies that, in theory, would attract a wide variety of tweets.

Electronic Arts (EA), Cerner, Vodafone, T-Mobile

Choice 1: Electronic Arts (EA)

We first chose EA, a video game maker, because many gamers are very vocal on Twitter about how they feel about its games.

Choice 2: Telecom Companies

Next, we chose two telecom companies (T-Mobile and Vodafone) because we believe their customers tend to voice their complaints online and would therefore express strong sentiment.

Choice 3: Cerner (Healthcare)

Finally, we analyzed tweets about Cerner, an American supplier of health information technology solutions. Companies in the healthcare technology domain tend to publish information (whether innovative and positive or controversial and negative) that the public reacts to.

The Verdict

The results of the first analysis are summarized in the table below:

Accuracy Scores — AI, Neural Network & Machine Learning Algorithms

As the table shows, the graphical algorithms forecast price movements far better than the decision-based algorithms, averaging an accuracy score of 72% for Vodafone and T-Mobile, which is remarkable in the context of the stock market.

Electronic Arts (EA)

Vodafone Group Plc (VOD)

T-Mobile US (TMUS)

Cerner Corporation (CERN)

Wisdom of Crowds

Overall, the stock market revolves around who can run the most sophisticated and efficient machine learning algorithms, incorporating as many predictors as possible. However, by using the wisdom of crowds, we were able to outperform all of those predictors.

Recommendations

Our final recommendation is not to use our model in isolation. Depending on their risk appetite, traders who are still hesitant to act on its recommendations should try to corroborate the information externally, preferably outside of Twitter.

If several sources broadly agree that the market is going to move in a certain direction, there is reasonable assurance that our model's recommendations are correct.

Our model helps individuals make informed investment decisions by leveraging the power of the community, and is a perfect supplementary tool for any trader, whether beginner or professional.

LinkedIn

GitHub Code


Senior Consultant, Data Science at Deloitte | Masters in Analytics — McGill University | CFA Level 2 Candidate | https://www.linkedin.com/in/nmukhtar/