Can We Beat The Stock Market Using Twitter?
An Elaborate Guide On How To Beat Analysts At Their Own Game Using Twitter & Sentiment Analysis
Do People Care About Data From Twitter?
Twitter’s data licensing services earned it $500m in 2019, making up nearly 20% of its revenue.
Why?
Traders have realized that using social media to get the fastest news affecting stock prices is no longer a theory but a reality, and Twitter offers a platform that can outpace even the most reliable news vendors.
How Can We Use Tweets To Predict Stocks?
In essence, we use machine learning algorithms to gauge the sentiment of tweets and predict whether a stock will move up or down. For example, if a tweet contains more positive words than negative words, our algorithm assigns it a higher score and predicts that the stock price will move upwards.
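As a toy illustration of the word-counting idea (not the scoring method used later in this article), the sketch below counts positive and negative words in a tweet. The two word lists and the function names `naive_sentiment` and `predicted_direction` are made up for the example.

```python
import re

# Hypothetical, hand-picked word lists; a real sentiment lexicon is far larger.
POSITIVE = {"great", "love", "strong", "beat", "up"}
NEGATIVE = {"bad", "hate", "weak", "miss", "down"}


def naive_sentiment(tweet: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", tweet.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def predicted_direction(tweet: str) -> str:
    """Map the word-count score to a naive price direction."""
    score = naive_sentiment(tweet)
    if score > 0:
        return "up"
    if score < 0:
        return "down"
    return "flat"
```

A tweet such as "Love the new game, strong sales!" contains two positive words and no negative ones, so this sketch would predict "up".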
What Are We Going To Train Our Model On?
To understand how Twitter sentiment can be linked to stock price, we need a rich dataset of tweets about different companies so that we can test whether online sentiment can predict the stock price. Additionally, Twitter metrics such as likes, followers, and engagement can indicate how reliable a tweet about a company or industry is and how likely people are to react to it.
Before explaining the steps, it is important to note that not even the best analysts are able to build models that surpass an R-squared value of ~4–5%, even with thousands of predictors. So can we find a way to beat them using just Twitter?
The following steps explain our approach to data construction:
Step 1: Extract Data from Twitter
The dataset was downloaded from the website “followthehashtag.com”. It is a rich library of tweets about companies listed in the NASDAQ 100, collected using Twitter cash-tags.
The dataset is used by researchers across the world and contains around a million tweets. In this project, tweets about four companies were downloaded for the period from March 28th, 2016 to June 15th, 2016.
Step 2: Calculate The Sentiment Score of Tweets
We used the VADER (Valence Aware Dictionary and sEntiment Reasoner) library to detect the sentiment of each tweet. VADER is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.
For each tweet, we work with the following measures:
Positive: how optimistic and positive the tweet is
Negative: how pessimistic and negative the tweet is
Neutral: how neutral the tweet is
Compound: a single normalized score between -1 (most negative) and +1 (most positive) that summarizes the overall sentiment of the tweet
Sentiment Score: the product of the compound measure and the number of followers of the account that posted the tweet
The Sentiment Score allows us to capture the importance and impact of tweets by prioritizing Twitter users that have more followers.
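A minimal sketch of the follower weighting described above. It assumes the compound score has already been computed for each tweet (with VADER this would typically come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`); the function name `sentiment_score` and the example numbers are made up for illustration.

```python
def sentiment_score(compound: float, followers: int) -> float:
    """Weight a tweet's compound sentiment by the author's follower count."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound scores are expected to lie in [-1, 1]")
    return compound * followers


# Made-up example tweets: (compound score, follower count of the author)
tweets = [(0.5, 150), (-0.25, 12_000), (0.0, 500)]
scores = [sentiment_score(compound, followers) for compound, followers in tweets]
```

In this example a mildly negative tweet from an account with 12,000 followers outweighs a clearly positive tweet from an account with 150 followers, which is the prioritization the measure is meant to produce.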
Step 3: Group The Overall Sentiment Score By Day For Each Company
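A minimal sketch of this grouping step. It assumes the daily figure is a plain sum of the per-tweet Sentiment Scores (an average would be an equally plausible choice); the row layout and the function name `daily_sentiment` are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_sentiment(rows):
    """Sum per-tweet sentiment scores for every (company, day) pair."""
    totals = defaultdict(float)
    for company, day, score in rows:
        totals[(company, day)] += score
    return dict(totals)


# Made-up rows: (ticker, tweet date, follower-weighted sentiment score)
rows = [
    ("EA", date(2016, 3, 28), 75.0),
    ("EA", date(2016, 3, 28), -3000.0),
    ("EA", date(2016, 3, 29), 40.0),
    ("TMUS", date(2016, 3, 28), 10.0),
]
daily = daily_sentiment(rows)
```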
Step 4: Extract Data From Yahoo Finance
Using the API available in pandas to connect to Yahoo Finance, we downloaded and extracted the data for each company’s stock for the selected 74-day period.
Step 5: Capture The Volatility of Stocks
The daily stock prices were standardized so that they suit distance-based algorithms, which are sensitive to the scale of their inputs, and so that we capture volatility rather than the absolute price change.
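A minimal sketch of standardization, assuming it means the usual z-score (subtract the mean, divide by the standard deviation). The function name `standardize` and the prices are made up for the example.

```python
from statistics import mean, pstdev


def standardize(prices):
    """Rescale a price series to mean 0 and standard deviation 1."""
    mu = mean(prices)
    sigma = pstdev(prices)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(price - mu) / sigma for price in prices]


closes = [10.0, 12.0, 14.0]  # made-up closing prices
z_scores = standardize(closes)
```

After this step a stock trading near $10 and one trading near $100 are on the same scale, so neither dominates a distance calculation simply because of its price level.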
Step 6: Create a Buy / Sell Signal
We then created a preliminary Buy/Sell signal based on whether an investor would have made a positive return on a given day. This is the target we are trying to predict.
Then, a second Buy/Sell signal is calculated based entirely on the sentiment of the tweets.
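The two signals can be sketched as follows. Both rules are assumptions made for illustration: a "positive return" is taken to mean a close above the previous day's close, and the sentiment signal is taken to be Buy whenever the daily Sentiment Score is above zero. The function names are hypothetical.

```python
def price_signal(closes):
    """1 (Buy) if the close rose versus the previous day, else 0 (Sell)."""
    return [1 if today > previous else 0
            for previous, today in zip(closes, closes[1:])]


def sentiment_signal(daily_scores):
    """1 (Buy) if the day's total sentiment score is positive, else 0 (Sell)."""
    return [1 if score > 0 else 0 for score in daily_scores]


closes = [65.0, 66.2, 65.9, 67.0]      # made-up closing prices
daily_scores = [-2925.0, 40.0, 10.0]   # made-up daily sentiment scores

target = price_signal(closes)
from_sentiment = sentiment_signal(daily_scores)
```

Note that `price_signal` returns one value fewer than the number of prices, because the first day has no previous close to compare against.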
Step 7: Run Around 36 Different Machine Learning Binary Classification Models
The dataset was split into training and test sets so that we could train our models and objectively measure their performance. Since this is time series data, it is better to split by date range than at random, so that the models are tested only on days that come after the days they were trained on.
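A minimal sketch of a date-based split. The cutoff date, the row layout, and the function name `split_by_date` are hypothetical; the point is only that every training day precedes every test day.

```python
from datetime import date


def split_by_date(rows, cutoff):
    """Rows dated before the cutoff go to train, the rest to test."""
    train = [row for row in rows if row[0] < cutoff]
    test = [row for row in rows if row[0] >= cutoff]
    return train, test


# Made-up rows: (day, standardized sentiment, Buy/Sell label)
rows = [
    (date(2016, 5, 26), -0.4, 0),
    (date(2016, 5, 27), 0.7, 1),
    (date(2016, 6, 1), 0.2, 1),
    (date(2016, 6, 2), -1.1, 0),
]
train, test = split_by_date(rows, cutoff=date(2016, 6, 1))
```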
A diverse series of algorithms was run, first to predict whether the stock’s price is likely to increase or decrease given the sentiment, and second to measure the magnitude of the price change.
The models run were KNN, Logistic Regression, Decision Tree, Random Forest, SVM, and ANN.
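To make the classification step concrete, here is a minimal pure-Python sketch of one of these model types, KNN, with a single feature (a standardized sentiment value) and a Buy/Sell label. It is an illustration under those assumptions, not the configuration used for the results in this article, and all numbers are made up.

```python
from collections import Counter


def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training points.

    train is a list of (feature, label) pairs with a single numeric feature.
    """
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == label for x, label in test)
    return hits / len(test)


# Made-up (standardized sentiment, Buy=1 / Sell=0) pairs
train = [(-1.5, 0), (-0.8, 0), (-0.2, 0), (0.3, 1), (0.9, 1), (1.4, 1)]
test = [(-1.0, 0), (1.0, 1), (0.1, 0)]

score = accuracy(train, test, k=3)
```

Here two of the three test days are classified correctly; the third sits close to the boundary between the two classes and is voted into the wrong one.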
Step 8: Predict The Stock Price
A complex formula based on the model trained on the sentiments of the Tweets is used to predict the stock price.
Using Our Model To Try & Make Money
Predicting EA, T-Mobile, Vodafone, & Cerner’s Performance
We decided to do our analysis on diverse companies that would theoretically have a variety of tweets written about them.
Choice 1: Electronic Arts (EA)
We first chose EA, a video game maker, since many people are very vocal on Twitter about how they feel about its games.
Choice 2: Telecom Companies
Next, we chose two telecom companies (T-Mobile and Vodafone) because we believe that their customers tend to voice their complaints online and would therefore express strong sentiment.
Choice 3: Cerner (Healthcare)
Finally, we analyzed tweets about Cerner, an American supplier of health information technology solutions. Healthcare technology companies tend to publish information (whether positive news about innovation or negative news about controversy) that the public reacts to.
The Verdict
The results of the first analysis are summarized in the table below:
As observed, the graphical-based algorithms performed far better at foreseeing price movements than the decision-based algorithms, averaging an accuracy score of 72% for Vodafone and T-Mobile, which is remarkable in the context of the stock market.
Electronic Arts (EA)
Vodafone Group Plc (VOD)
T-Mobile US (TMUS)
Cerner (CERN)
Wisdom of Crowds
Overall, the stock market revolves around who can run the most sophisticated and efficient machine learning algorithms with as many predictors as possible. However, we were able to outperform those predictors by using the wisdom of crowds.
Recommendations
Our final recommendation is not to use our model in isolation. Depending on their risk appetite, traders who are still hesitant to act on its recommendations should corroborate them with external information, preferably from outside Twitter.
If there is a general consensus from several sources that the market is going to move in a certain direction, then there is reasonable assurance that our model’s recommendations are correct.
Our model helps individuals make informed investment decisions by leveraging the power of the community, and is a perfect supplementary tool for any trader, whether they are a beginner or a professional.