Intro to Support Vector Machines with a Trading Example

Let’s try to understand Support Vector Machines and how to implement them in financial markets

Marlon Somwaru
Towards Data Science


Just like any other machine learning algorithm, a support vector machine (SVM) takes data as input, attempts to find and recognize patterns, and then tells us what it has learned. Support vector machines fall into the category of supervised learning, which means they learn a function that maps a given input to an output. More specifically, an SVM is a classification algorithm.

Before we can start implementing trading algorithms and seeking alpha, let's figure out how an SVM works.

Maximal Margin Classifier

The support vector machine algorithm comes from the maximal margin classifier. The maximal margin classifier uses the distance from a given decision boundary to classify an input: the greater the distance, or margin, the better the classifier is at handling the data. On a Cartesian plane, the boundary can be thought of as a line. In three-dimensional space, it is a plane, but after that it becomes hard to conceptualize. In general, the boundary is a hyperplane, specifically one of dimension p-1, where p is the dimension of the data points.
Our boundary, or hyperplane, is known as a separating hyperplane, because it is used to separate the data points into the desired categories. In general, there are many hyperplanes that can separate a given data set, but the one we care about is the maximal margin hyperplane, or optimal separating hyperplane. This is the separating hyperplane with the largest minimum distance to the data points in the training set. By using this hyperplane to classify a data point from the test set, we have the maximal margin classifier.

Source: Introduction to Statistical Learning

The line in the graph above represents the hyperplane. Notice that it completely separates all of the points in the blue and purple regions of the graph.
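For concreteness, in the notation of Introduction to Statistical Learning, a hyperplane in p dimensions is the set of points satisfying:

```latex
% A hyperplane in p dimensions (ISLR notation)
\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p = 0
```

A test observation x* is then classified by which side of the hyperplane it falls on, that is, by the sign of \beta_0 + \beta_1 x^*_1 + \cdots + \beta_p x^*_p.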

Now, the maximal margin classifier works, to a degree. If you have a data set that cannot be separated by a hyperplane, you can no longer use it. Sometimes you may also run into data that has more than two categories, which makes a single linear boundary useless.

At this point, you have to consider your options:

  1. You can base your classifier on the separating hyperplane as explained earlier. But the hyperplane doesn’t exist…so you have no classifier.
  2. Consider a classifier that isn't perfect, but works some or most of the time

Support Vector Classifiers

I like the second option too. By using a classifier that isn’t perfect, you can at least handle most observations, and introduce a level of adaptation to the model when it is presented with new data.

This evolution of the maximal margin classifier is known as the support vector classifier (SVC), or the soft margin classifier. Instead of being exact and not very robust in its classification, the SVC allows some observations to be on the wrong side of the margin and/or hyperplane (which is where the "soft" comes from), for the sake of getting the classification mostly correct.

Without getting into too much math, the algorithm determines which side of the hyperplane an observation will lie on by solving an optimization problem that involves a tuning parameter, the width of the margin (which it tries to maximize), and slack variables.

The tuning parameter controls the bias-variance tradeoff. When it is small, the classifier fits the training data closely because the margins are narrow; in other words, low bias, high variance. A larger tuning parameter does the opposite: it allows more observations to be on the wrong side of the margin, giving higher bias and lower variance.

Slack variables in particular are pretty cool. They allow data points to be on the wrong side of the margin or hyperplane, and they are also used to transform inequalities into equalities. The values that the slack variables take on also tell us about the behavior of a given data point. If the slack variable for a data point equals 0, that point is on the right side of the margin. If it is greater than 0 but less than 1, the point is on the wrong side of the margin, but on the right side of the hyperplane. If it is greater than 1, the point is on the wrong side of the hyperplane.
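For readers who do want the math, the optimization problem in the form given in Introduction to Statistical Learning ties these pieces together: M is the width of the margin, C is the tuning parameter, and the ε_i are the slack variables whose values behave exactly as described above.

```latex
% Soft-margin (support vector classifier) optimization, ISLR form
\max_{\beta_0,\ldots,\beta_p,\ \epsilon_1,\ldots,\epsilon_n,\ M} \; M
\quad \text{subject to} \quad
\sum_{j=1}^{p} \beta_j^2 = 1,
\qquad
y_i \bigl( \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} \bigr) \ge M (1 - \epsilon_i),
\qquad
\epsilon_i \ge 0,
\qquad
\sum_{i=1}^{n} \epsilon_i \le C
```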

The main reason this optimization matters is its effect on the hyperplane. The only observations that affect the hyperplane, and in turn how data points are classified, are those that lie on the margin or on the wrong side of it. An observation strictly on the right side of its margin has no effect on the hyperplane. The classifier gets its name from the influential points: they are known as support vectors.

Finally, Support Vector Machines

The support vector machine builds on the optimization in the support vector classifier by enlarging the feature space using kernels.

Kernels, like the previous optimization, involve a fair bit of math. Put simply, a kernel tells us how similar two data points are: by assigning weights to sequences of data, it can quantify how alike two points are, given that it has learned how to compare them. Kernels allow data to be processed in simpler terms, as opposed to working explicitly in a higher-dimensional space. More specifically, the algorithm computes inner products between all pairs of data points in the feature space. By using kernels instead of actually enlarging the feature space, the algorithm can be much more efficient: it uses one function to compare pairs of distinct data points, as opposed to constructing functions of the original features in the data set.
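To make "one function that compares pairs of points" concrete, here is a minimal sketch (my own illustration, not code from the original article) of three common kernels acting on a pair of observations:

```python
import numpy as np

def linear_kernel(x, z):
    # Plain inner product: the "similarity" behind a linear boundary
    return np.dot(x, z)

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    # Implicitly compares the points in a higher-dimensional polynomial feature space
    return (np.dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # Similarity decays with squared distance between the points
    return np.exp(-gamma * np.sum((x - z) ** 2))

x1, x2 = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(x1, x2), polynomial_kernel(x1, x2), rbf_kernel(x1, x2))
```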

Many different kernels exist, including the linear kernel, the polynomial kernel, the RBF kernel, and graph kernels. For example, the linear kernel compares a pair of data points using their bivariate correlation, while the polynomial kernel amounts to fitting an SVC in a higher-dimensional space. A support vector classifier is the same as an SVM with a polynomial kernel of degree 1.
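In practice you rarely write kernels by hand. In scikit-learn, for instance, the kernel is just an argument; a small illustration on toy data (again my own example):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# kernel="linear" (or a degree-1 polynomial) recovers the plain support vector classifier
for kernel, params in [("linear", {}), ("poly", {"degree": 3}), ("rbf", {"gamma": 0.5})]:
    clf = SVC(kernel=kernel, C=1.0, **params).fit(X, y)
    print(kernel, clf.score(X, y))
```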

Basically, the main goal of the support vector machine is to construct a hyperplane, which it then uses to classify data. Despite generally being categorized as a classification algorithm, the SVM has an extension used for regression, known as support vector regression.

Support Vector Machines for Trading

Before I get into this application, this is by no means advice on how/what you should trade. That’s on you.

We’ll begin by gathering our data.

We’ll use a time period going back about five years, October 28, 2014 to October 28, 2019. The stocks that we will get data for are the components of the Dow Jones Industrial Average.

Yahoo Finance used to be really easy to get data from, but most packages that pulled from it no longer work, so we'll also create a web scraper in the process.

The first thing we’ll do is import all of the packages we’ll need.
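The original notebook's import cell isn't reproduced here, but given the libraries used throughout the article, a plausible version looks like this:

```python
import time

import requests
import numpy as np
import pandas as pd
import talib
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
from sklearn.preprocessing import MaxAbsScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score
```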

Then we'll use the requests package to scrape the contents of the Dow Jones components page on Yahoo Finance. The page contains the names of the companies that make up the Dow Jones Industrial Average, as well as their tickers. Next, we'll use BeautifulSoup4 to make the information in Dow_Content searchable.
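Roughly, that step looks like the sketch below. The components URL and the class name are my assumptions, since the article's original links and gist weren't preserved:

```python
# Components page URL (assumed; the article's original link is not preserved)
url = "https://finance.yahoo.com/quote/%5EDJI/components"
Dow_Content = requests.get(url).content

# Make the raw HTML searchable by tag and class
soup = BeautifulSoup(Dow_Content, "html.parser")

# "Fw(600)" stands in for whatever class the table cells actually carry;
# find the real one by inspecting the page as described below
rows = soup.find_all("td", class_="Fw(600)")
```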

The lines above parse the data gathered from the web page and search for the bit of HTML that corresponds to the table on the page. You can find it by right-clicking the table, inspecting the element, and, with a little investigation, locating the class name used above.

There will be two types of lines that the search will come across:

  1. Lines containing the ticker
  2. Lines containing the company name with no ticker

We don't care about the latter, so when the loop finds one, it ignores it and moves on. A few string operations trim the extra fat, and we have our ticker. Each ticker is then added to a list for safekeeping.
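One way to express that loop with BeautifulSoup, continuing the sketch above:

```python
tickers = []

for cell in rows:
    link = cell.find("a")
    if link is None:
        continue  # company-name cells with no ticker link are skipped
    # Trim whitespace and keep the bare ticker symbol
    tickers.append(link.text.strip())
```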

Yahoo Finance uses a Unix timestamp in its URLs, so we make use of the time package to convert our start and end dates to the desired format. It can take either a struct_time object or a tuple of nine time arguments (see the Python time documentation); we don't really care about anything past the date here.
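For example:

```python
# time.mktime accepts a 9-field time tuple; the fields past the date are zeroed out
start = int(time.mktime((2014, 10, 28, 0, 0, 0, 0, 0, 0)))
end = int(time.mktime((2019, 10, 28, 0, 0, 0, 0, 0, 0)))
```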

The ScrapeYahoo function takes four arguments:

  1. data_df, your designated data structure to store the output
  2. ticker, a string representing a given stock
  3. start, a Unix timestamp representing the start date
  4. end, a Unix timestamp representing the current day

It combines these with the base URL for Yahoo Finance and gets the data from the desired web page. Instead of processing it like we did earlier, we parse the JSON data from the page. Yahoo Finance uses cookies now, and simply using the HTML code will throw an error.

The lines after that parse the content of the JSON data. Something that helped a lot while I was initially exploring the data was the keys() method for Python dictionaries; it made traversing the JSON data much easier. You can read about it in the Python documentation.

The Stock_Data dictionary will hold our parsed data. The keys in the dictionary will be the tickers of the stocks. For each stock, the ScrapeYahoo function will create a data frame containing open, high, low, close, and volume data.
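A reconstruction of what ScrapeYahoo might look like; the chart endpoint, the User-Agent header, and the JSON layout below are my reading of Yahoo's unofficial API, not the article's exact code:

```python
def ScrapeYahoo(data_df, ticker, start, end):
    # Yahoo's chart endpoint returns JSON for the requested date range
    base_url = "https://query1.finance.yahoo.com/v8/finance/chart/"
    params = {"period1": start, "period2": end, "interval": "1d"}
    response = requests.get(base_url + ticker, params=params,
                            headers={"User-Agent": "Mozilla/5.0"})
    content = response.json()

    # .keys() is handy for exploring nested JSON like this interactively
    result = content["chart"]["result"][0]
    quotes = result["indicators"]["quote"][0]

    df = pd.DataFrame({
        "Open": quotes["open"],
        "High": quotes["high"],
        "Low": quotes["low"],
        "Close": quotes["close"],
        "Volume": quotes["volume"],
    }, index=pd.to_datetime(result["timestamp"], unit="s"))

    data_df[ticker] = df

# Build one data frame per Dow component
Stock_Data = {}
for ticker in tickers:
    ScrapeYahoo(Stock_Data, ticker, start, end)
```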

We have historical price data; now what? Recall that the support vector machine is a classification algorithm. We're going to create the features for our model with the help of technical analysis.

Technical analysis is a methodology that uses past data to forecast the future direction of price. In general, technical indicators use price and volume data in their calculations. The motivation for the indicators chosen comes from the papers listed in the references section at the end of the article.

One very important thing to pay attention to before moving on: look-ahead bias. We already have all of the closing data, which is what will be used for the calculations. In a real-world scenario, the most you have is the previous day's close. We have to make sure our calculations don't take in data that technically had not occurred yet. To do this, we will lag the data, that is, shift it back one day.
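With pandas, lagging is a one-liner. For example:

```python
# Shift a series back one day: the value available on day t is day t-1's value
df["Close_lagged"] = df["Close"].shift(1)
```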

Technical Analysis

We will make use of the talib library to perform the technical analysis calculations.
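The article's exact indicator list isn't shown here; a sketch using a few common indicators from the referenced papers (RSI, Bollinger Bands, MACD), each lagged one day to avoid look-ahead bias, might look like this:

```python
for ticker, df in Stock_Data.items():
    close = df["Close"].values.astype(float)

    # Relative Strength Index, lagged one day
    df["RSI"] = pd.Series(talib.RSI(close, timeperiod=14), index=df.index).shift(1)

    # Bollinger Bands, lagged one day
    upper, middle, lower = talib.BBANDS(close, timeperiod=20)
    df["BB_upper"] = pd.Series(upper, index=df.index).shift(1)
    df["BB_lower"] = pd.Series(lower, index=df.index).shift(1)

    # MACD line, lagged one day
    macd, macd_signal, macd_hist = talib.MACD(close, fastperiod=12,
                                              slowperiod=26, signalperiod=9)
    df["MACD"] = pd.Series(macd, index=df.index).shift(1)
```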

Example of a chart with Bollinger Bands (Source: Trading With Rayner)

We'll use the Returns column to calculate a label for each trading day. If returns are zero or positive, the day is labelled 1; otherwise it is labelled 0. Notice that the Returns column uses opening prices as opposed to closing prices, to avoid look-ahead bias.
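A sketch of that labelling step:

```python
for ticker, df in Stock_Data.items():
    # Open-to-open returns avoid using a close we would not have yet
    df["Returns"] = df["Open"].pct_change()
    # Label each day: 1 for flat-or-up, 0 for down
    df["Signal"] = np.where(df["Returns"] >= 0, 1, 0)
```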

Training the Model

Before we start setting up our model, the data must be normalized. By doing so, all of the features are scaled and given equal importance when the SVM calculates its distances.

We used the MaxAbsScaler, which scales each feature by its maximum absolute value.
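A quick standalone illustration of what MaxAbsScaler does:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

demo = np.array([[1.0, -2.0], [2.0, 4.0], [-4.0, 1.0]])
# Each column is divided by its maximum absolute value, landing in [-1, 1]
print(MaxAbsScaler().fit_transform(demo))
```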

Next, we created a dictionary to store the training and testing data. If NaN values aren't dropped, the model will not run. The variable X will contain all of the features of the model, which will then be scaled. It is important to drop the Signal and Returns columns. We are predicting the Signal, so if it's kept, the model will be almost perfect; and if we keep the Returns column, it will influence the model too much, since the Signal column was calculated from the returns in the first place. y is what we want to predict, so we assign it the column containing the signals.

Our model will use 70% of the data to train and 30% to test, as shown in the sketch below.
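Putting those steps together, the setup might look like this. Note that shuffle=False is my assumption to keep the time ordering intact; the original code may have used the default shuffled split:

```python
model_data = {}

for ticker, df in Stock_Data.items():
    df = df.dropna()  # the model will not run with NaN rows

    # Drop the label and the raw returns; both would leak the answer
    X = df.drop(columns=["Signal", "Returns"])
    y = df["Signal"]

    # Scale every feature by its maximum absolute value
    X_scaled = MaxAbsScaler().fit_transform(X)

    # 70/30 split, preserving time order (an assumption)
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y, test_size=0.3, shuffle=False)
    model_data[ticker] = (X_train, X_test, y_train, y_test)
```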

The model is defined by the model variable (in case you were confused). I left various kernel configurations in the notebook linked below that you can play around with. The model is fit to the training data and used to predict values in the Signal column.

Lastly, we add the accuracy, precision, and recall to our model dictionary for each stock.
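A sketch of the training loop; the RBF kernel and its parameters here are just one configuration to start from:

```python
results = {}

for ticker, (X_train, X_test, y_train, y_test) in model_data.items():
    # One kernel choice; swap in "linear" or "poly" to experiment
    model = SVC(kernel="rbf", C=1.0, gamma="scale")
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)

    results[ticker] = {
        "accuracy": accuracy_score(y_test, predictions),
        "precision": precision_score(y_test, predictions),
        "recall": recall_score(y_test, predictions),
        "predictions": predictions,
    }
```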

Almost there! The next bit of code calculates returns using the signals from the SVM model. By using the iloc method for pandas data frames, it's much easier to append the signals to the end than how we did it earlier.
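For example, assuming the test rows are the last len(predictions) rows of each frame (which holds with the unshuffled split above):

```python
for ticker, df in Stock_Data.items():
    preds = results[ticker]["predictions"]
    # The predictions line up with the tail of the frame (the test period)
    test = df.dropna().iloc[-len(preds):].copy()
    test["SVM_Signal"] = preds
    # Hold the stock only on days the model says "up"
    test["Strategy_Returns"] = test["SVM_Signal"] * test["Returns"]
    results[ticker]["test"] = test
```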

We'll calculate returns relative to how the market performed and use that as our benchmark. Portfolio performance is gauged using the Sharpe ratio.
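A simple annualized Sharpe ratio helper, assuming a zero risk-free rate and 252 trading days per year:

```python
def sharpe_ratio(returns, periods=252):
    # Annualized Sharpe ratio with a zero risk-free rate
    return np.sqrt(periods) * returns.mean() / returns.std()

for ticker in results:
    test = results[ticker]["test"]
    results[ticker]["sharpe"] = sharpe_ratio(test["Strategy_Returns"].dropna())
    results[ticker]["benchmark_sharpe"] = sharpe_ratio(test["Returns"].dropna())
```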

Finally, we graph the outcomes of the predictions.
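For instance, plotting cumulative returns of the strategy against buy-and-hold for one stock:

```python
import matplotlib.pyplot as plt

for ticker in ["AAPL"]:  # plot one stock at a time
    test = results[ticker]["test"]
    (1 + test["Strategy_Returns"].fillna(0)).cumprod().plot(label="SVM strategy")
    (1 + test["Returns"].fillna(0)).cumprod().plot(label="Buy and hold")
    plt.title(ticker)
    plt.legend()
    plt.show()
```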

A few examples of the output are as follows:

Not bad!
Not too good…

Conclusion

The model isn’t perfect by any means, but it does work for some equities in the Dow Jones Industrial Average. A few ways that this could be improved:

  1. Use the technical indicators to create signals instead of only returns
  2. Adapt the model for long/short scenarios
  3. Use different technical indicators
  4. Create a portfolio including position sizing, transaction costs, slippage etc.

Financial markets are a wonderfully complex place and our model is fairly simple. This example simply classified whether previous returns were indicative of future price direction.

There’s lots of room for improvement, so take a shot at it. A link to the notebook will be in the references, so feel free to play with the code.

I hope this was helpful in your understanding of support vector machines!

References

[1] R. Rosillo, J. Giner, D. De la Fuente and R. Pino, Trading System Based on Support Vector Machines in the S&P 500 Index (2012), Proceedings of the 2012 International Conference on Artificial Intelligence (ICAI 2012)

[2] B. Henrique, V. Sobreiro and H. Kimura, Stock price prediction using support vector regression on daily and up to the minute prices (2018), The Journal of Finance and Data Science

[3] X. Di, Stock Trend Prediction with Technical Indicators using SVM (2014)

[4] G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning with Applications in R (2017)

Python Code

EDIT: Fixed an issue with using "shift" throughout various parts of the code. Thanks to those who called it out; I appreciate the feedback.
