
Predicting the Stock Market is Hard: Creating a Machine-Learning Model (Probably) Won’t Help


Photo by Markus Winkler on Unsplash

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

There are many articles and how-to’s on making machine-learning models to predict stock prices. I’m not here to declare that they’re wrong or misguided. In fact, I think it’s wonderful that so many people have taken an interest in trying to solve an age-old problem, and that contributions to programming languages such as Python, where one can now spin up a relatively sophisticated machine-learning model in minutes, have helped "democratize" investing to a certain degree. Instead, this article presents evidence supporting the idea that stock prices likely cannot be predicted. Specifically, it uses Python to illustrate the idea that stock prices follow, more or less, a random walk. The main point is that a random walk neither precludes beating the market nor means that investment models should be tossed aside. It should, however, signal that such models must be built with extreme care and diligence, and constantly re-evaluated. In many cases, building a trading algorithm is a lifelong process.

An Overview of the Random Walk Theory

The Random Walk Theory of stock prices has a long history. It was first introduced in the mid-19th century and later popularized in the 1960s and 1970s by figures such as Eugene Fama and Burton Malkiel, the latter of whom made the theory commonplace in investing circles with his classic, A Random Walk Down Wall Street. In its simplest form, the theory states that stock prices cannot be predicted because changes in stock prices are random. The theory presupposes that financial markets are efficient: because market participants are rational profit-maximizers, all publicly available information is already incorporated in a stock’s current price. If any anomaly existed, it would quickly be exploited and removed, leading to a more efficient state. For example, suppose the market thought that Apple’s stock was undervalued. The random-walk theory assumes that market participants would immediately buy the stock, pushing its price up until it was no longer undervalued. This "perfect" efficiency means that stocks are fairly priced and reflect all available information. Only new information can change a stock’s price, and since the news cycle is unpredictable, stock prices move randomly.
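For readers who prefer notation, one standard textbook way to write this idea (a common formulation, not one given in the original article) treats each price as the previous price plus an unpredictable shock $\varepsilon_t$, where $\mathcal{F}_{t}$ denotes all information available at time $t$:

```latex
\begin{aligned}
P_t &= P_{t-1} + \varepsilon_t, \qquad \mathbb{E}[\varepsilon_t \mid \mathcal{F}_{t-1}] = 0, \\
\mathbb{E}[P_{t+1} \mid \mathcal{F}_t] &= P_t + \mathbb{E}[\varepsilon_{t+1} \mid \mathcal{F}_t] = P_t.
\end{aligned}
```

Given everything known today, the expected price tomorrow is simply today’s price. This is the sense in which today’s price is the "best predictor" of tomorrow’s.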

As you’ve probably noticed, the Random Walk Hypothesis (RWH) makes several assumptions, chief among them that financial markets are efficient. The very fact that bubbles exist, however, seems to call that assumption, and therefore the Random Walk Hypothesis itself, into question. Furthermore, the theory spurred the field of behavioral finance, of which Richard Thaler is one of the chief proponents, and which aims to show that investors are, in many situations, far from rational actors. Markets can hardly be efficient when their participants often act irrationally.

Despite these (legitimate) objections, the Random Walk Hypothesis is supported by a considerable amount of empirical evidence, and the remainder of this article highlights some of it with the help of Python. The code can be found here.

Demonstrating Random Walk with Python

According to the RWH, because stock prices follow a random walk, a stock’s price today is the best predictor of its price tomorrow. To test this claim, we can compare a stock’s lagged prices at various intervals with its most recent price to determine whether they are indicative of today’s price.
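Building lagged prices is mostly a matter of shifting a price series. The sketch below is one way to do it with pandas, assuming daily closing prices in a `pd.Series`; the helper name `make_lags`, the `price` column, and the toy prices are hypothetical and for illustration only (the `lagged_1` naming mirrors the one used later in the article).

```python
import pandas as pd


def make_lags(prices: pd.Series, n_lags: int = 5) -> pd.DataFrame:
    """Return today's price alongside its 1..n_lags-day lags (hypothetical helper)."""
    frame = pd.DataFrame({"price": prices})
    for k in range(1, n_lags + 1):
        # shift(k) moves values down k rows, so row t holds the price from t-k
        frame[f"lagged_{k}"] = prices.shift(k)
    # The first n_lags rows lack a full set of lags, so drop them
    return frame.dropna()


# Toy closing prices (made up, not real DOCU data) on business days
closes = pd.Series(
    [100.0, 101.5, 100.8, 102.2, 103.0, 102.4, 104.1],
    index=pd.date_range("2021-01-04", periods=7, freq="B"),
)
print(make_lags(closes, n_lags=5))
```

With real data, `closes` would be replaced by a stock’s actual closing prices from whatever data source you use.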

DOCU with 1-, 2-, 3-, 4- and 5-day lags

Again, the idea here is to assess whether today’s price is the best indicator of tomorrow’s price. If it is, then stocks are fairly priced and would therefore move randomly. If, on the other hand, the lagged prices bore little to no relationship to today’s price, then the current price would not be the best predictor of a stock’s future price (something else, such as a pattern in past prices, might be), and the market would not be efficient. Using linear regression to assess this, we can see that lagged prices have an extremely strong relationship with today’s price. In this particular example, DocuSign’s lagged prices are highly indicative of today’s price.
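To see what the RWH predicts for this kind of regression, the sketch below fits today’s price on its five previous prices, using plain least squares from NumPy (not necessarily the library the original code used), on a simulated random walk instead of real DOCU prices. The helper name `fit_lag_regression`, the 2,000-day length, the 1% daily volatility, and the seed are all illustrative assumptions.

```python
import numpy as np


def fit_lag_regression(prices: np.ndarray, n_lags: int = 5):
    """OLS of today's price on its previous n_lags prices (hypothetical helper).

    Returns (coef, r_squared): coef[0] is the intercept and coef[k] is the
    weight on the k-day lag, so coef[1] plays the role of lagged_1.
    """
    y = prices[n_lags:]
    # Column k-1 holds the price from k days earlier, aligned row by row with y
    lags = np.column_stack(
        [prices[n_lags - k : len(prices) - k] for k in range(1, n_lags + 1)]
    )
    X = np.column_stack([np.ones(len(y)), lags])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    r_squared = 1.0 - residuals.var() / y.var()
    return coef, r_squared


# Simulated geometric random walk (assumed: 2,000 days, 1% daily volatility,
# starting at 100, arbitrary seed) standing in for real closing prices
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=2000)))

coef, r_squared = fit_lag_regression(prices, n_lags=5)
print("lag weights:", np.round(coef[1:], 3))
print("R-squared:", round(r_squared, 4))
```

In runs like this, nearly all of the weight tends to fall on the one-day lag, the older lags contribute little, and R-squared is close to 1. That mirrors the DocuSign result and is the pattern a random walk produces: yesterday’s price already carries whatever the older prices could add.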

Expanding this analysis to every stock in the Nasdaq 100 shows that the phenomenon is widespread. The linear regression model for each stock has an R-squared of 0.99, meaning that the lagged prices, particularly lagged_1, explain nearly all of the variation in the most recent price.
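The per-stock loop could look something like the sketch below. Several assumptions to flag: the data are simulated geometric random walks under hypothetical tickers `SIM0` to `SIM4` rather than Nasdaq 100 prices; `lag_r_squared` and `summarize` are illustrative helper names; and the `r2_returns` column, which runs the same regression on daily percentage returns instead of price levels, is an extra check that is not part of the original analysis.

```python
import numpy as np
import pandas as pd


def lag_r_squared(series: pd.Series, n_lags: int = 5) -> float:
    """R-squared of an OLS fit of a series on its own 1..n_lags lags."""
    # Build lagged_1 ... lagged_n columns by shifting the series down k rows
    frame = pd.concat(
        {f"lagged_{k}": series.shift(k) for k in range(1, n_lags + 1)}, axis=1
    )
    frame["target"] = series
    frame = frame.dropna()  # drop the first rows that lack a full set of lags
    X = np.column_stack([np.ones(len(frame)), frame.drop(columns="target").to_numpy()])
    y = frame["target"].to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return float(1.0 - residuals.var() / y.var())


def summarize(prices: dict, n_lags: int = 5) -> pd.DataFrame:
    """One row per ticker: R-squared on price levels and on daily returns."""
    rows = {}
    for ticker, series in prices.items():
        rows[ticker] = {
            "r2_prices": lag_r_squared(series, n_lags),
            "r2_returns": lag_r_squared(series.pct_change().dropna(), n_lags),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Hypothetical tickers SIM0..SIM4: simulated geometric random walks (assumed
# 1,500 business days, 1.5% daily volatility), not real Nasdaq 100 data
rng = np.random.default_rng(42)
dates = pd.bdate_range("2015-01-01", periods=1500)
simulated = {
    f"SIM{i}": pd.Series(
        100 * np.exp(np.cumsum(rng.normal(0.0, 0.015, size=len(dates)))), index=dates
    )
    for i in range(5)
}
print(summarize(simulated).round(3))
```

On random walks like these, `r2_prices` tends to sit near 1 while `r2_returns` stays near 0. The contrast is the useful part: a high R-squared on price levels largely reflects that today’s price starts where yesterday’s ended, whereas the returns regression asks whether past moves help forecast the next move, which under the RWH they should not. Drawing conclusions about actual Nasdaq 100 stocks would require replacing the simulated series with real closing prices.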

Does This Mean Investors Can’t Pick Stocks?

One implication of the RWH is that traditional stock-picking methods, such as technical and fundamental analysis, are of little use. Both techniques assume that investors can use them to develop profitable trading strategies, but under the RWH such efforts are typically self-defeating: traders exploit, and thereby neutralize, any anomalies they find, making the market efficient. Yet the hypothesis does allow for some successful stock picking, in which "the analyst will do better than the investor who follows a simple buy-and-hold policy as long as he can more quickly identify situations where there are non-negligible discrepancies between actual prices and intrinsic values than other analysts and investors, and if he is better able to predict the occurrence of important events and evaluate their effects on intrinsic values."

Conclusion

Yes, the Random Walk Theory enjoys empirical support and sets a very high bar for algorithms that aim to consistently divine the path of stock prices. But this does not automatically mean that ambitions of building investing models should be abandoned. Even the RWH allows some traders to outperform. Jim Simons is a perfect example of overcoming the odds. He spent years trying to "solve the market," with one approach after another sputtering, before finally breaking through after countless hours of refining his approach. And even now, his firm’s trades are reportedly right only slightly more than half the time. The point is not to discourage you from building your next machine-learning model, but to set realistic expectations and to encourage you to be in it for the long haul.

Further Reading

Random Walks in Stock-Market Prices

A Random Walk Down Wall Street

The Man Who Solved the Market

