Adaptive filtering in Stock Market prediction: a different approach

Using an LMS linear adaptive filter to predict stock market prices

Matheus Farias
Towards Data Science


You have probably seen a lot of content on the internet about stock market prediction, but most of it is a lot of talk with little result, or something that looks like magic or feels inaccessible.

Predicting stock market prices is part of a major area of data science called time series analysis. Here we will see how to tackle this problem with a non-conventional approach that is beautiful, simple, and gives a nice result: adaptive filtering.

Time Series

A time series is a series of data points indexed (or listed or graphed) in time order.

Very important to fields like econometrics, statistics, and meteorology, the study of time series is a challenging problem that motivates a lot of work in signal processing, machine learning, and data analysis in general, since it can be used for clustering, classification, and, in our case, prediction.

Stock price analysis is a time series problem!

To work with a practical case, we will use ABEV3 (ON) stock prices over 178 days of 2019. If you think this kind of data is hard to access, you are wrong: I got it from the b3 website, the official Brazilian stock exchange, and it's free :)

ABEV3 (ON) stock prices during 178 days of 2019

Nice. Now that we have the data in hand, let's talk about the algorithm.

LMS filter

The LMS filter is a kind of adaptive filter used to solve linear problems. The idea of the filter is to mimic a system (by finding the filter coefficients) while minimizing the mean square of the error signal.

LMS filter pseudocode
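In case the pseudocode image does not display, here is the core of one LMS step as a minimal, self-contained Python sketch (the toy values of N, u, xn, and d are my own, just for illustration; wn are the filter weights, xn the last N+1 input samples, d the desired sample, u the step size):

import numpy as np

N, u = 3, 0.01              # toy filter order and step size (illustrative only)
wn = np.zeros(N + 1)        # filter weights
xn = np.random.rand(N + 1)  # last N+1 input samples
d = 1.0                     # desired sample

y = wn @ xn                 # filter output
e = d - y                   # error signal
wn = wn + 2 * u * e * xn    # LMS weight update (gradient step on the squared error)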

In general, we don't know in advance whether a problem can be solved well with a linear approach, so we usually test both a linear and a non-linear algorithm. Since the internet always shows non-linear approaches, we will use LMS to show that stock market prediction can be done with a linear algorithm with good precision.

But this filter mimics a system: once we apply it to our data, the filter coefficients are trained, and when we feed in a new vector, the coefficients produce (in the best case) the response the original system would. So we just need a small trick to use this filter for predicting data.

The system

First, we delay our input vector by l positions, where l is the number of days we want to predict; these l new positions are filled with zeros (see the short sketch after the schema below).

The system’s schema
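A minimal sketch of the delay step (the numbers here are made up, just for illustration):

import numpy as np

x = np.array([10.0, 11.0, 12.0])       # hypothetical price series
l = 2                                  # days we want to predict
xd = np.concatenate([np.zeros(l), x])  # delayed input -> [0. 0. 10. 11. 12.]
print(xd)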

When we apply the LMS filter, we train it on the known prices (the first M − l positions of the delayed vector, where M is its total length). After that, we set the error to zero, so the weights freeze and, for the last l positions, the filter keeps producing the outputs the original system would: these are the predictions. We will call this tricky modification the LMSPred algorithm.

LMSPred pseudocode

Finally, let’s code!

First, we import the libraries we will use in this code:

import numpy as np
import matplotlib.pyplot as plt

The next step is the implementation of LMSPred. You can try it yourself from the pseudocode; here is my implementation:

def lmsPred(x, l, u, N):
    # Delay the input by l positions, padding the front with zeros
    xd = np.concatenate([np.zeros(l), x])
    M = len(xd)
    y = np.zeros(M)                  # filter output
    xn = np.zeros(N + 1)             # tapped delay line (filter input)
    wn = np.random.rand(N + 1) / 10  # small random initial weights
    for n in range(M):
        # shift the newest sample into the delay line
        xn = np.concatenate([[xd[n]], xn[:N]])
        y[n] = wn @ xn               # filter output for step n
        if n > M - l - 1:
            e = 0                    # prediction phase: freeze the weights
        else:
            e = x[n] - y[n]          # error against the known price
        wn = wn + 2 * u * e * xn     # LMS weight update
    return y, wn

Now we define our vector x, which holds the 178 values of ABEV3 (ON):

x = np.array([1655, 1648, 1615, 1638, 1685, 1729, 1754, 1770, 1780, 1785, 1800, 1800, 1754, 1718, 1716, 1795, 1787, 1797, 1751, 1811, 1845, 1864, 1809, 1875, 1822, 1871, 1867, 1839, 1859, 1849, 1819, 1832, 1815, 1832, 1832, 1839, 1849, 1836, 1723, 1683, 1637, 1669, 1659, 1711, 1700, 1690, 1666, 1676, 1731, 1719, 1700, 1698, 1672, 1652, 1699, 1654, 1675, 1683, 1682, 1677, 1684, 1732, 1744, 1735, 1769, 1755, 1725, 1706, 1742, 1753, 1705, 1708, 1750, 1767, 1772, 1831, 1829, 1835, 1847, 1795, 1792, 1806, 1765, 1792, 1749, 1730, 1701, 1694, 1661, 1664, 1649, 1649, 1709, 1721, 1721, 1706, 1722, 1731, 1726, 1743, 1755, 1742, 1735, 1741, 1764, 1761, 1765, 1772, 1768, 1785, 1764, 1780, 1805, 1820, 1845, 1830, 1817, 1810, 1805, 1789, 1781, 1813, 1887, 1900, 1900, 1894, 1902, 1869, 1820, 1825, 1810, 1799, 1825, 1809, 1799, 1803, 1796, 1949, 1980, 2050, 2034, 2013, 2042, 2049, 2016, 2048, 2063, 2017, 2007, 1948, 1938, 1901, 1878, 1890, 1911, 1894, 1880, 1847, 1833, 1809, 1817, 1815, 1855, 1872, 1838, 1852, 1880, 1869, 1872, 1887, 1882, 1891, 1937, 1910, 1915, 1943, 1926, 1935]);
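A quick sanity check (my addition, not in the original post) confirms the length of the series:

print(len(x))  # 178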

To train the system, we take the first 173 values, with a learning rate of 2^(−30), filter order N = 60, and l = 5 days of prediction. (The step size looks tiny, but the prices are in the thousands, so the products e·xn in the update are large; a small u keeps the adaptation stable.)

x_train = x[0:173]
u = 2**(-30)
l = 5
N = 60
y, wn = lmsPred(x_train, l, u, N)
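With the implementation above, the returned y covers len(x_train) + l = 178 positions (the last l are the predictions) and wn holds the N + 1 trained coefficients:

print(y.shape)   # (178,) - 173 training outputs followed by the 5 predictions
print(wn.shape)  # (61,)  - the N+1 trained filter coefficients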

To visualize the input data together with the filter output (which shows the filter learning over time), we plot both:

plt.plot(x, color = 'black')
plt.plot(y, color = 'red')
plt.show()
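Optionally, a vertical line makes the boundary between training and prediction easier to see (this marker is my addition, not part of the original plot):

# same plot as above, plus a marker where the l predicted days begin
plt.plot(x, color='black')
plt.plot(y, color='red')
plt.axvline(len(x) - l, color='gray', linestyle='--')
plt.show()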

And to evaluate the percentage error of our prediction:

pred = y[-l:]        # the l predicted values
realvalues = x[-l:]  # the l real values held out from training
error = 100 * (pred - realvalues) / realvalues
print(abs(error))

So, the full code:

import numpy as np
import matplotlib.pyplot as plt

def lmsPred(x, l, u, N):
    # Delay the input by l positions, padding the front with zeros
    xd = np.concatenate([np.zeros(l), x])
    M = len(xd)
    y = np.zeros(M)                  # filter output
    xn = np.zeros(N + 1)             # tapped delay line (filter input)
    wn = np.random.rand(N + 1) / 10  # small random initial weights
    for n in range(M):
        xn = np.concatenate([[xd[n]], xn[:N]])  # shift in the newest sample
        y[n] = wn @ xn               # filter output for step n
        if n > M - l - 1:
            e = 0                    # prediction phase: freeze the weights
        else:
            e = x[n] - y[n]          # error against the known price
        wn = wn + 2 * u * e * xn     # LMS weight update
    return y, wn

x = np.array([1655, 1648, 1615, 1638, 1685, 1729, 1754, 1770, 1780, 1785, 1800, 1800, 1754, 1718, 1716, 1795, 1787, 1797, 1751, 1811, 1845, 1864, 1809, 1875, 1822, 1871, 1867, 1839, 1859, 1849, 1819, 1832, 1815, 1832, 1832, 1839, 1849, 1836, 1723, 1683, 1637, 1669, 1659, 1711, 1700, 1690, 1666, 1676, 1731, 1719, 1700, 1698, 1672, 1652, 1699, 1654, 1675, 1683, 1682, 1677, 1684, 1732, 1744, 1735, 1769, 1755, 1725, 1706, 1742, 1753, 1705, 1708, 1750, 1767, 1772, 1831, 1829, 1835, 1847, 1795, 1792, 1806, 1765, 1792, 1749, 1730, 1701, 1694, 1661, 1664, 1649, 1649, 1709, 1721, 1721, 1706, 1722, 1731, 1726, 1743, 1755, 1742, 1735, 1741, 1764, 1761, 1765, 1772, 1768, 1785, 1764, 1780, 1805, 1820, 1845, 1830, 1817, 1810, 1805, 1789, 1781, 1813, 1887, 1900, 1900, 1894, 1902, 1869, 1820, 1825, 1810, 1799, 1825, 1809, 1799, 1803, 1796, 1949, 1980, 2050, 2034, 2013, 2042, 2049, 2016, 2048, 2063, 2017, 2007, 1948, 1938, 1901, 1878, 1890, 1911, 1894, 1880, 1847, 1833, 1809, 1817, 1815, 1855, 1872, 1838, 1852, 1880, 1869, 1872, 1887, 1882, 1891, 1937, 1910, 1915, 1943, 1926, 1935])

x_train = x[0:173]
u = 2**(-30)
l = 5
N = 60
y, wn = lmsPred(x_train, l, u, N)

plt.plot(x, color='black')
plt.plot(y, color='red')
plt.show()

pred = y[-l:]
realvalues = x[-l:]
error = 100 * (pred - realvalues) / realvalues
print(abs(error))

Results

One example of stock market prediction result

Black is the real data and red is our prediction. As you can see, the curves start far apart, but after roughly as many samples as the filter order (60 here) the two curves are very close.

And, for this case, the percentage error per predicted day is:

[[0.79837693 1.12168626 1.24557245 2.24050302 3.16604697]]

So the 5th day has an error of 3.16%, which is a pretty nice value given that we are using a very simple method.

It is important to highlight that stock market prediction does not work well for large values of l, since we want to analyse the market in a steady-state regime, that is, without accounting for possible future crises, political problems, and so on. Because of this, it is safer to use stock market prediction for small values of l.

If you have any questions, you can reach me on my LinkedIn profile!
