
Silly Stock Trading on Onepanel.io GPUs

Our CEO Mathieu Lemay recently wrote an article on what various popular AI models see in Rorschach inkblot tests. That was fun. It got me thinking about how much we humans see signal when there is only noise. I wanted to see what happens when we use Deep Learning to predict a company’s stock market performance using the CEO’s profile photo, and trade stocks using the CEO’s astrological sign. We partnered with Onepanel.io to bring this idea to life on GPU.

Using AI to interpret horoscopes and generate alpha. Mary Kate MacPherson made the bot a logo and everything. The tagline is "Your Fortune Awaits!"

Of course, these are terrible ideas. What you can learn from this is simple: Bad ideas result in poor model performance. More to the point, you can’t do machine learning until the data science confirms that the project makes sense statistically.

I’m going to walk you through this AI project from signup to project execution using the [Onepanel.io](http://Onepanel.io) platform, and that should give you a sense of the value we get from their solution. Just to give you a bit of background, I am pretty deeply in love with DigitalOcean, but they don’t have GPU instances. AWS is great but expensive and somewhat mechanical when you want to spin up a dev instance for deep learning projects in a distributed collaboration, even from an AMI. What we get from Onepanel.io, which you will see in this article, is a platform where we can collaborate easily across the team on a GPU project, without installing all the libraries and running them, and sharing keys, and so on. It’s an interface to awesome.

The code for this article is available in Onepanel.io here: https://c.onepanel.io/daniel-lemay-ai/projects/star-sign-trade-stocks/code

Getting Started: 10 steps to spinning up a GPU for your team

The first step to trading stocks based on horoscopes is to go through the signup process.

Don’t forget to click the link in your email that validates your email address. If you are prompted for a credit card, that’s to pay for the GPU you are going to spin up. There are instructions to do all this stuff, but TL;DR people like me just want to fly through and have a working environment with GPU libraries, source control, security, and so on.

Log in and make a project
Set the details of the project
Create a project workspace
Set the workspace details
The instance will take a few minutes to spin up, so note the time message and the status color
While we wait, let’s invite our team members to the project. No SSH key exchange required!
Great! Our instance is up. Let’s run Jupyter. Notice that tensorboard and terminals are also available with one click in the LAUNCH menu.
Make a new notebook to show the GPU is working
Copy-paste example code from keras.io. Running!

These 2 lines list the devices TensorFlow can see, and a GPU entry in the output shows that the GPU is working:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
OK. It all works!

We now have a GPU instance at low cost, with all the right stuff installed. That was easy. We were able to share access to the project, so that we can get a collaboration going between team members. Now it’s time to use the system to do the project!

Using horoscopes for portfolio management

Let’s show how the relationship between horoscopes and stock trading is alluring but not at all meaningful. Even if we find correlations, they should be due to chance and not causal. That’s because horoscopes are random, and randomness, although often elegant, predicts nothing. Let’s start by investing in the stock market based on the horoscope of each company’s CEO, and see what happens… FUN!

dcshapiro/onepanel

Scraping horoscopes: Building a dataset

I used this code by Travis Riddle to get horoscope data from the New York Post. I just did a bit of rewriting to get it working in Python 3. I also changed the script a bit so that it scrapes all of the signs per day, instead of all the days for each sign one at a time. It makes the script easier to resume if the host gets angry and blocks my VM’s IP. Instead of using a fancy/expensive IP rotating strategy, I just let ‘er rip and see what happens.

First, I ran the script on a 3 day range. It worked well. Next I ran it on the date range January 1, 2014 to December 31, 2017. The results were saved as the results came in, just in case. It only crashed once. I resumed. Done. That’s 4 years of horoscope data. I am quite sure that this falls under the fair use clause, but for copyright reasons, I won’t post the dataset.
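The day-first scrape order and the save-as-you-go resume described above could be sketched roughly like this. It is a minimal illustration under assumptions, not the script used for this article: `fetch_horoscope` is a hypothetical placeholder for the actual scraping call, and the three-column CSV layout is also an assumption.

```python
import csv
import os
from datetime import date, timedelta

SIGNS = ["aries", "taurus", "gemini", "cancer", "leo", "virgo",
         "libra", "scorpio", "sagittarius", "capricorn", "aquarius", "pisces"]


def daterange(start, end):
    """Yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def already_scraped(path):
    """Return the set of (date, sign) pairs already saved to the CSV."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return {(row[0], row[1]) for row in csv.reader(f) if len(row) >= 2}


def scrape(start, end, path, fetch_horoscope):
    """Scrape all signs for each day, appending rows as they arrive.

    fetch_horoscope(day, sign) is a hypothetical callable that returns the
    horoscope text, or None when nothing was published that day.
    """
    done = already_scraped(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for day in daterange(start, end):
            for sign in SIGNS:
                key = (day.isoformat(), sign)
                if key in done:
                    continue  # saved on a previous run, skip it
                text = fetch_horoscope(day, sign)
                if text is not None:
                    writer.writerow([key[0], key[1], text])
                    f.flush()  # keep progress if the next request crashes
```

Because every row is flushed as soon as it is written, rerunning `scrape` with the same date range after a crash or an IP block only requests the (day, sign) pairs that are not in the file yet.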

Here is a look at the raw horoscope text, publication date, star sign, and sentiment
Horoscope sentiment is a bit more positive than negative, which makes sense for a feel good marketing thing

There are 17,186 rows of horoscope data: roughly one for each sign for each day for 4 years (12 × 365 × 4 = 17,520). So we missed about 7 days of horoscopes per year. I’m going to assume that those are days where there was no paper published, and so no horoscope. Now that we have the "I can see the future" data, let’s learn a relationship between the data and the market, and trade on this "information" daily.
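As a quick sanity check on the row count, the arithmetic can be written out. This only uses the numbers quoted above, and like the text it ignores the leap day in 2016.

```python
SIGNS = 12
DAYS_PER_YEAR = 365
YEARS = 4
ROWS_SCRAPED = 17_186  # row count reported above

expected_rows = SIGNS * DAYS_PER_YEAR * YEARS
missing_rows = expected_rows - ROWS_SCRAPED
missing_days_per_year = missing_rows / SIGNS / YEARS

print(expected_rows)                    # 17520
print(missing_rows)                     # 334
print(round(missing_days_per_year, 1))  # 7.0
```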

Just as an FYI, Onepanel.io has a dataset section of the platform, where you can painlessly grab a dataset to train a model. It has stuff like MNIST and many many others, and generally sucks less than pulling the data in yourself for a common dataset.

All the data.

You can also snap an S3 bucket or whatever onto a running instance, or add your own dataset to the system.

Our usual approach to interpreting text would be to convert the horoscope text into vectors using spaCy, or train a custom embedding model with FastText, and then train some neural model to predict stuff from that input data. But let’s keep it simpler. I used TextBlob to convert the text into a sentiment-based decision. It’s not exactly modern portfolio theory. To compute asset allocations for real asset management using deep learning, you need a pretty insane amount of GPU and know-how. For this project we decided to keep it nice and simple.

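from textblob import TextBlob

def sentiment(horoscope):
    """Map horoscope text to 1 (positive), 0 (neutral), or -1 (negative)."""
    # TextBlob polarity is a float in [-1.0, 1.0]; compute it once.
    polarity = TextBlob(horoscope.lower()).sentiment.polarity
    if polarity > 0:
        return 1
    if polarity == 0:
        return 0
    return -1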

To decide if we should trade, we measure the horoscope sentiment, and if it is positive we buy into a stock (or hold if we were already in), otherwise we sell. We allocate our money evenly among the stocks where the CEO’s horoscope sentiment is positive. We track the holdings to see our portfolio value over time. We compare the profit (loss) of the strategy to the equal weight buy and hold strategy for the portfolio, to see if our active strategy beat the market. Let’s be SUPER generous and assume that fees are $0, even though in reality fees and taxes play a huge role in active automated trading strategies.
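The trading rule above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the backtester used for the results below: the `prices` and `signals` dictionaries are hypothetical inputs, rebalancing is assumed to happen at each day's close, money sits in cash when no CEO has a positive horoscope, and fees are $0.

```python
def backtest(prices, signals, initial_cash=100.0):
    """Equal-weight the stocks whose signal is positive each day; $0 fees.

    prices:  {ticker: [price on day 0, day 1, ...]}
    signals: {ticker: [sentiment on day 0, day 1, ...]}, values in {-1, 0, 1}
    Returns the portfolio value at the close of each day.
    """
    tickers = sorted(prices)
    n_days = len(prices[tickers[0]])
    value = initial_cash
    weights = {}  # ticker -> fraction of the portfolio held going into the day
    history = []
    for day in range(n_days):
        if day > 0:
            # Apply each held stock's return from the previous close to today's.
            growth = sum(w * prices[t][day] / prices[t][day - 1]
                         for t, w in weights.items())
            cash_weight = 1.0 - sum(weights.values())
            value *= growth + cash_weight
        # Rebalance at today's close: split evenly among positive-sentiment stocks.
        longs = [t for t in tickers if signals[t][day] > 0]
        weights = {t: 1.0 / len(longs) for t in longs}
        history.append(value)
    return history


def buy_and_hold_equal_weight(prices, initial_cash=100.0):
    """Benchmark: buy equal dollar amounts on day 0 and never trade again."""
    tickers = sorted(prices)
    n_days = len(prices[tickers[0]])
    stake = initial_cash / len(tickers)
    return [sum(stake * prices[t][d] / prices[t][0] for t in tickers)
            for d in range(n_days)]
```

Comparing the last element of `backtest(...)` with the last element of `buy_and_hold_equal_weight(...)` gives the active-versus-passive comparison described above.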

Our data source gives a whole new meaning to non-traditional data sources used to make investment decisions.

Now, before we can jump into the graphs, we need to pick a portfolio and get the birthdays of the CEOs.

We used stocker to simulate the stock market. In the end, due to weirdness in stocker, we just pulled out the stock history dataset from stocker and used the numbers to do our own backtesting simulation. The list of companies (tickers) in the system is available here, and that was the starting point for finding CEOs.

Results

We simulated 4 years of daily trades, and here is what we found:

Psychicbot
Initial Investment: $100.00
Final Balance:      $200.34
Annualized Return:    19.0%

WE MADE $$$!? How did a horoscope-based trading bot make money?

Hold on....

So, that was supposed to fail. Why did it work? Well, let’s check if this is just the result of measuring during the biggest bull run in my lifetime. Let’s take a step back and compare the results to the benchmarks for these stocks, and for IVV. If we made money, but less than buy-and-hold for an equal weight portfolio, or less than the ETF for the market, then the strategy made money but still underperformed simply holding the market. The return in excess of the benchmark is called alpha. Logic says that we should not be able to make money from something that has no information in it, and so alpha should be 0 or negative. So, let’s have a look…

Equal Weight (EQWT)
Initial Investment: $100.00
Final Balance:      $177.75
Annualized Return:    15.5%

WE MADE MORE $$$ THAN EQWT!?

And this is how the IVV index did over the same period… See: calculator [here](http://www.investinganswers.com/calculators/return/compound-annual-growth-rate-cagr-calculator-1262) and growth rate formula here.

So… IVV grew annually at 9.86%, which is about 2/3 the equal weight result, and about half the returns for our horoscope sentiment bot.
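For reference, the annualized return figures can be reproduced from the balances with the compound annual growth rate formula. This is a small sketch; the 4-year horizon is taken from the January 2014 to December 2017 horoscope date range, and the reported percentages are consistent with it.

```python
def cagr(initial, final, years):
    """Compound annual growth rate as a fraction (0.19 means 19%)."""
    return (final / initial) ** (1.0 / years) - 1.0


# Balances reported above, over the 4 years of horoscope data.
print(f"Psychicbot:   {cagr(100.00, 200.34, 4):.1%}")  # 19.0%
print(f"Equal weight: {cagr(100.00, 177.75, 4):.1%}")  # 15.5%
```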

So… Why does trading stocks with horoscope sentiment work better than traditional investing strategies?? Magic?

Let’s summarize where we are:

Annualized Return:
IVV                 9.9%  
Equal Weight       15.5%
Psychicbot         19.0%
A quick look behind the curtain.

Why did it work?

Answer: Bias + Luck + No Fees = Fake

Bias is a sneaky and subtle thing that can ruin a well planned experiment. The bias here is that we only included stocks in the portfolio where the CEO didn’t change for several years, and changes in CEO often mean problems at the company. We couldn’t have known back then that these companies would not have changes in CEO, so this is a form of look-ahead bias. We kind of cheated by accident: picking equities that would be stable caused us to pick ones that do well. That gave both EQWT and our bot an advantage, but only relative to IVV. Against EQWT, we are investing in the same equities, and so whether our sentiment bot ends up above or below the EQWT line is basically a random variable.

Also, we assumed $0 in fees. That’s unrealistically low. In real life active strategies we can pay 50% to 100% of gross profit in fees. Sometimes more. Net revenue is what matters here because that net profit is reinvested, and so there is a compounding effect. Lower fees in an active strategy lead to much higher profits. By comparison, ETFs like IVV have very low fees, and strategies like EQWT re-balanced monthly also have very low fees.

Source: Chapter 3 of Narang, Rishi K. Inside the Black Box: A Simple Guide to Quantitative and High Frequency Trading. Vol. 883. John Wiley & Sons, 2013.

So, should you believe in this astrology-based investing strategy? Absolutely not! Good luck does not mean good idea.

Bias is totally a thing (credit: xkcd)

Sometimes the impossibility or stupidity of an approach is not as obvious as the astrology-based portfolio management scheme presented here. Sometimes it is SUPER hard to know that the approach has no merit. If you set out to prove a strategy that is false, sometimes you come across strategies that generate their alpha from bias, or simply from luck. Just imagine how many combinations there are of 12 stocks whose CEO has the right birthday and DON’T make more money than EQWT and IVV when trading on horoscope sentiment. Even something simple like modifying the testing period can make the "alpha" evaporate. You can lie using numbers but the house of cards crumbles quickly. When the strategy moves to paper trading, the returns don’t materialize. Or, when the parameters are tested, the theory falls apart. Real portfolio management using deep learning (e.g. Investifai) goes through rigorous testing and re-testing of not only the numbers, but even the assumptions that the numbers are built on. Alpha models are about more than numbers; they are about ideas.

And so here is the simulation again, but this time taking into account transaction costs using zipline:

Portfolio value using zipline. Initial value is 1 billion dollars. When taking into account fees, we lose almost $1.5e10 (also known as $15,000,000,000) before swinging up to $2e10.

The graph above looks much more realistic. So, basically, since we can’t go 15 billion dollars into debt, the model lost all the money in the first year. Even though we start with a billion dollars, we lose it all. Everything after that is irrelevant. Yes, it eventually rides the market up on leverage, but that’s irrelevant weirdness, because there is no margin account that lets us still exist at that point. Now we can finally see that this strategy is a terrible idea. Even with the bias of knowing which stocks will have no CEO change for several years, and the luck of generating excess returns without fees, this strategy loses a billion dollars.

Now you may be curious why the fees are so high. Here we go. If you buy a $100 stock by selling some other $100 stock, you pay fees when selling and buying. Let’s say fees are 1%, even though institutional investors have much lower fees. That means moving $100 through a SELL and a BUY costs $1 + $0.99 for a total of $1.99 (ignoring quantization errors of buying stocks for actual dollar amounts). So we need to make about 2% profit on the trade to break even. Even if fees are only a few basis points, the house is still raking the table, and so the strategy needs to do better than chance and with enough profit to cover fees in order to avoid going to $0 on fees alone.
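The round-trip fee arithmetic above works out as follows. This is a small sketch that assumes a flat percentage fee taken from the traded amount on each leg, which is a simplification of real commission schedules.

```python
def round_trip(amount, fee_rate):
    """Sell `amount` of one stock and buy another with the proceeds.

    Returns (total fees paid, dollars of the new stock actually held).
    """
    sell_fee = amount * fee_rate
    proceeds = amount - sell_fee
    buy_fee = proceeds * fee_rate
    return sell_fee + buy_fee, proceeds - buy_fee


fees, invested = round_trip(100.0, 0.01)
print(round(fees, 2))                 # 1.99
print(round(invested, 2))             # 98.01
print(f"{100.0 / invested - 1:.2%}")  # 2.03%
```

The last line is the return the new position needs just to get back to the original $100, which is where the roughly 2% break-even figure comes from.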

Strategies that generate profit net of fees are the alpha models asset managers bank on. These models require lots of GPU compute power to run simulations, and so we should play around some more with these powerful GPUs.

CEO Faces to Company Performance

OK. So we showed that trading using astrology looks like a thing (it isn’t). How about some more predictive modelling? Now that we collected the dataset of CEO astrological signs, let’s get the data to train a model to predict the return of each company, using the image search results for their ticker symbol + "CEO". For example: "APPL CEO". We know there should be 0 chance that this works. Now let’s prove it by doing the experiment.

Image search result for "APPL CEO"
Gobbling up pictures from the interwebz...

First we grab the images of CEOs using image search APIs. We can use Google’s "Knowledge Graph Search API" to get CEO names from company tickers. We can also use an image recognition API or a dlib-based library to check images for faces, and toss the photos with no faces.
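The "toss the photos with no faces" step might look something like the sketch below. It is one possible approach under assumptions rather than the code used here: the helper names are hypothetical, and the dlib-backed counter is just one way to supply the face check (it needs `pip install dlib`).

```python
def keep_images_with_faces(paths, count_faces):
    """Return only the paths for which count_faces(path) finds at least one face."""
    kept = []
    for path in paths:
        try:
            if count_faces(path) > 0:
                kept.append(path)
        except Exception:
            # Unreadable or corrupt downloads are tossed along with faceless images.
            continue
    return kept


def dlib_face_counter():
    """Build a face counter backed by dlib's frontal face detector."""
    import dlib  # imported lazily so the filter above works without dlib installed

    detector = dlib.get_frontal_face_detector()

    def count_faces(path):
        image = dlib.load_rgb_image(path)
        return len(detector(image, 1))  # 1 = upsample once to catch smaller faces

    return count_faces
```

With dlib installed, `keep_images_with_faces(paths, dlib_face_counter())` returns the subset of image paths worth keeping. Passing the counter in as an argument also makes it easy to swap in a hosted image recognition API instead.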

Let’s see what we start off with:

Clearly the scraper got no results for some companies, and others have the wrong guy, like President Obama standing with some guy from the company. Sometimes the photo is just totally unrelated. We toss those out by filtering out images that don’t contain a face. Popular personalities jump into the folders because they are related to the same CEO and company keywords e.g. "Elon Musk", "Bill Gates", "Bill Ackman", "Warren Buffett" and so on. I ended up collecting about 7GB of images to sort through, after removing duplicate images. There is no real upper bound on how many images we can scrape. You just keep going until you don’t feel like it anymore. The number of image files was 39,262.
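Removing duplicate images can be done by hashing file contents. The following is a minimal sketch that assumes all the scraped images sit under one root folder; the function names are hypothetical, and it only catches byte-identical files, so resized or re-encoded copies of the same photo pass straight through.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Walk `root` and return paths whose bytes match an earlier file."""
    seen = {}
    duplicates = []
    for folder, _, names in os.walk(root):
        for name in sorted(names):
            path = os.path.join(folder, name)
            digest = file_digest(path)
            if digest in seen:
                duplicates.append(path)
            else:
                seen[digest] = path
    return duplicates
```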

We collected price data for 2,305 companies, and 1,596 were worth more at the end of the testing period. That is, 69% of companies were worth more. This didn’t take into account splits or dividends. Also, please note that stocker pulls data from Quandl, and they only let you make 50 queries per day unless you have an account (the free kind works). Without an account, the code stalls out. On we go…

We need to do this crunching of images into embeddings on a GPU. To upload your dataset to an S3 bucket, do this (with your real AWS keys):

# install AWS cli
pip install awscli
# set the AWS credentials
export AWS_ACCESS_KEY_ID=BLAHBLAHBLAHKEYGOESHERE
export AWS_SECRET_ACCESS_KEY=SECONDRANDOMSTRINGYTHINGHERE
# upload files in the current directory to the your-awesome-dataset bucket
aws s3 sync . s3://your-awesome-dataset/

And download your dataset to an instance from the CLI like this:

# set the AWS credentials
export AWS_ACCESS_KEY_ID=BLAHBLAHBLAHKEYGOESHERE
export AWS_SECRET_ACCESS_KEY=SECONDRANDOMSTRINGYTHINGHERE
# download files into the current directory
aws s3 sync s3://your-awesome-dataset/ .

We could use Onepanel.io’s jobs feature for these GPU image conversion and classification tasks. I discussed in the past how SageMaker is much better than standalone VMs because its jobs only run for the duration of the task you care about. Onepanel.io jobs work the same way, but without locking you into the custom SageMaker libraries.

On a related note, SageMaker is not available on-prem, but Onepanel.io is working on an on-prem private cloud deployment option for this whole platform, which is essential for many of our enterprise clients (e.g. government, medical, and fintech) who cannot move their datasets to the public cloud.

Now for the fun part. We convert images into embeddings! As described on the keras applications page, we can convert images into embedding vectors.

Step 1 is knowing the names of the layers. We get that using this code:

From here we know the layer we want is near the end, just before the actual classification task. If we pick the output of "flatten", we get a really big vector (25,088 features per image), so instead we pick the output of the first fully connected layer, which has only 4,096 features.
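A sketch of listing the layer names and cutting the network at the first fully connected layer is given below. The choice of VGG16 is an assumption: the feature counts quoted above (25,088 after "flatten" and 4,096 after the first dense layer) match both VGG16 and VGG19 in Keras, where that dense layer is named "fc1". The helper names are hypothetical, and TensorFlow is imported lazily inside the functions because it is a heavy dependency.

```python
def load_vgg16():
    """Load VGG16 with ImageNet weights and its classifier head (downloads on first use)."""
    from tensorflow.keras.applications.vgg16 import VGG16
    return VGG16(weights="imagenet", include_top=True)


def layer_names(model):
    """Step 1: list the layer names so we can pick the one to tap."""
    return [layer.name for layer in model.layers]


def embedding_model(model, layer_name="fc1"):
    """Cut the network at `layer_name` so predict() returns that layer's activations."""
    from tensorflow.keras.models import Model
    return Model(inputs=model.input, outputs=model.get_layer(layer_name).output)


def embed_image(embedder, image_path):
    """Return the embedding vector for one image file."""
    import numpy as np
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    img = load_img(image_path, target_size=(224, 224))  # VGG16 input size
    batch = np.expand_dims(img_to_array(img), axis=0)
    return embedder.predict(preprocess_input(batch))[0]
```

With TensorFlow installed, `layer_names(load_vgg16())` ends with "flatten", "fc1", "fc2", "predictions", and `embed_image(embedding_model(load_vgg16()), path)` should return a vector of 4,096 numbers for one image.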

Finally, we can set the embedding vectors as input data (x) and the corresponding profit/loss outcome as output data (y) to train a DNN (f) to predict the outcome from the embedding vector. Each prediction takes the form y=f(x). We can train on training data, and verify the results using testing data, all with the help of scikit-learn’s train_test_split function.

This is an excellent example of binary classification, and a handy example is again presented right on the keras.io website, under "MLP for binary classification".

Here is the code to turn the images dataset into embedding vectors in a CSV file:

And here is the code for training and testing the binary classification model:

Finally, here are the results:

Why are there more TRUE than FALSE?
Why are there more TRUE than FALSE?

Why are there more TRUE than FALSE? We know it’s not any kind of set imbalance because we corrected for that in the code. If this result is true, then we can predict winners and losers in the stock market using CEO faces…What’s going on??

It turns out that our scraper scraped a lot of the same picture with small changes (e.g. Warren Buffett and Elon Musk as mentioned above), and this causes some of the testing and training data to be really similar. Removing exact duplicate images was not a strong enough filter to truly eliminate this problem in the data. Effectively, having multiple images of the same CEO in both sets lets the model recognize CEOs it saw in training when they show up in testing, and that is cheating. And so, again, don’t believe good results because they are good. Believe them because they make sense, and help you predict stuff.
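One common way to reduce this kind of leakage is to split by group instead of by individual image, so that every image of a given company or person lands on the same side of the split. The sketch below is not what was done in this experiment. The function name and the choice of ticker as the grouping key are assumptions, and scikit-learn’s `GroupShuffleSplit` offers the same idea as a library call.

```python
import random


def split_by_group(samples, group_of, test_fraction=0.2, seed=0):
    """Split samples so that no group appears in both train and test.

    samples:  list of items (for example image paths)
    group_of: callable mapping a sample to its group (for example the ticker)
    """
    groups = sorted({group_of(s) for s in samples})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [s for s in samples if group_of(s) not in test_groups]
    test = [s for s in samples if group_of(s) in test_groups]
    return train, test
```

Grouping by ticker would not catch a celebrity investor whose photo shows up in several companies' folders, so grouping by the identity of the person in the photo would be stricter still.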

If you are suffering through script and driver installs, or generally dislike the process of getting Machine Learning infrastructure up and running, Onepanel.io could be really interesting for you. New features are coming online on a regular basis. The solution is all wrapped into a simple web interface. It just works. And so, in conclusion, in this article you got a really detailed walk-through of a machine learning project from start to finish, including spinning up the server, connecting the team, collecting the dataset, doing some analysis on stock data, and finishing off with some predictive stuff. You saw that you should not believe investing strategies just because they make money.

If you liked this article then press the follow button, clap the clap thing, and have a look at some of my most read past articles, like "How to Price an AI Project" and "How to Hire an AI Consultant." Also, check out Onepanel.io.

In an upcoming article, I will present something we have been working on for quite a while, that helps enterprises to automate their analysis of unstructured reports during internal audits.

Until next time!

-Daniel

[email protected] ← Say hi. Lemay.ai 1(855)LEMAY-AI
