Financial Analytics — Exploratory Data Analysis of stock data

Kang Choon Kiat
Towards Data Science
5 min read · Jul 3, 2019

Photo by M. B. M. on Unsplash

Analytics has worked its way into numerous facets of our lives, and finance was definitely one of the earliest fields to catch on to the trend. Given the burgeoning market size of fintech and finance, it would be great to impart some financial skills!

Note: This article is meant to teach you some basics of manipulating a stock data set, not to make you a quant/stockbroker/algorithmic trader. You would surely need more knowledge of trading, finance and computing in order to advance in those areas.

1. Importing the data

Most of the financial datasets that you would require can be easily found on Yahoo Finance! You just need to key in the stock’s ticker symbol and click on Historical Data.

source: https://sg.finance.yahoo.com/quote/FB?p=FB&.tsrc=fin-srch

After that, set the appropriate time period and remember to click Apply! (Heck, when I first started, I kept wondering why the CSV I downloaded did not give me the right time frame.)

source: https://sg.finance.yahoo.com/quote/FB/history?p=FB

For this tutorial, I will be using Facebook’s data from 16 June 2018 to 16 June 2019. Below are the libraries I will be using to manipulate the data:

import pandas as pd 
import numpy as np
from scipy.stats import norm

As usual, we should always inspect the data and understand the dataframe before we do any further analysis

df = pd.read_csv('stock data/FB_16June18_16June19.csv')
df.head()

Next we will inspect the datatype of the dataframe using df.info()

Here, we can see that the ‘Date’ column is stored as an object instead of a datetime datatype. This can be problematic when we plot histograms or line charts, so we convert it to a datetime object first.

df['Date'] = pd.to_datetime(df['Date'])

Now let’s check the datatype again

Nice! We have gotten the ‘Date’ column to be a datetime object now!
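If you prefer, the conversion can also be done while the file is being read. Here is a minimal sketch of that idea. It uses a small made-up CSV string in place of the real download, so the prices below are purely illustrative and not actual Facebook data:

```python
import io

import pandas as pd

# Hypothetical two-row sample laid out like a Yahoo Finance export;
# the numbers are made up for illustration only.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2018-06-18,100.0,102.0,99.0,101.0,101.0,1000
2018-06-19,101.0,103.0,100.0,102.5,102.5,1200
"""

# parse_dates converts the 'Date' column to datetime while reading
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# 'Date' should now show up as a datetime64 column
print(sample.dtypes)
```

With the real file, the same parse_dates=['Date'] argument can be passed to the pd.read_csv call used earlier.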

2. Plotting

Time to import the magical stuff

import matplotlib.pyplot as plt
%matplotlib inline

Matplotlib gives us the power to make powerful plots that give us insights about the stock.

plt.figure(figsize=(20,8))
plt.plot('Date','Close',data=df)
plt.xlabel('Date')
plt.ylabel('Close Price')
plt.xticks(rotation=45)

If you have been following financial news, you would know that the huge drop in July was due to Facebook missing its revenue targets, and the one in January was due to privacy concerns.

3. Creation of columns

Now, let us create some useful columns so that we can make some interesting inferences about the stock.

First, we will create the column ‘Daily Lag’, which is basically just the ‘Close’ price shifted down by one row, so that each row holds the previous trading day’s close. (Note: there are various price columns we could use, but I am choosing ‘Close’ for convenience.)

df['Daily Lag'] = df['Close'].shift(1)
df.head()
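To see exactly what shift(1) does, here is a tiny sketch on made-up closing prices (the numbers are illustrative, not real quotes):

```python
import pandas as pd

# Made-up closing prices, purely for illustration
close = pd.Series([100.0, 102.0, 101.0, 105.0], name='Close')

# shift(1) moves every value down one row, so each row now holds the
# previous row's close; the first row has no predecessor and becomes NaN
lag = close.shift(1)

demo = pd.DataFrame({'Close': close, 'Daily Lag': lag})
print(demo)
```

The NaN in the first row is expected: there is no earlier trading day in the data to take a close from.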

The reason for creating the ‘Daily Lag’ is to allow us to create the column ‘Daily Returns’

source: http://www.crsp.com/products/documentation/crsp-calculations

Daily returns tell us the return that we obtain from one day’s close to the next day’s close (Duh!)

# daily return = today's close / previous close - 1
df['Daily Returns'] = (df['Close']/df['Daily Lag']) - 1
df.head()
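As a sanity check, the close-to-close return (today’s close divided by the previous close, minus 1) can be compared against pandas’ built-in pct_change. This is a small sketch on made-up prices, not the Facebook data:

```python
import pandas as pd

# Made-up closing prices, purely for illustration
prices = pd.DataFrame({'Close': [100.0, 102.0, 101.0, 105.0]})

# Previous day's close, then return = today's close / previous close - 1
prices['Daily Lag'] = prices['Close'].shift(1)
prices['Daily Returns'] = prices['Close'] / prices['Daily Lag'] - 1

# pct_change computes the same quantity in one call
builtin = prices['Close'].pct_change()

# Largest absolute difference between the two (NaN rows are skipped)
max_diff = (prices['Daily Returns'] - builtin).abs().max()
print(prices)
print('max difference vs pct_change:', max_diff)
```

Going from 100 to 102 gives a return of about 0.02 (a 2% gain), and the first row stays NaN because it has no previous close.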

4. More Plotting

Now, let us look at the ‘Daily Returns’

We realise that it is hard to tell the shape if we use the default bins=10. Let us increase it to 20. Before that, let us find the mean and standard deviation.

mean = df['Daily Returns'].mean()
std = df['Daily Returns'].std()
print('mean =',mean)
print('Std deviation =',std)

Ouch, the mean daily return is negative. Remember, though, that this only considers returns on a day-to-day (close-to-close) basis, so it just tells you that, on average over this period, buying at one day’s close and selling at the next day’s close would have made a small loss.

df['Daily Returns'].hist(bins=20)
plt.axvline(mean,color='red',linestyle='dashed',linewidth=2)
#to plot the std line we plot both the positive and negative values
plt.axvline(std,color='g',linestyle='dashed',linewidth=2)
plt.axvline(-std,color='g',linestyle='dashed',linewidth=2)

Lastly, I will introduce you to kurtosis.

source: https://community.plm.automation.siemens.com/t5/Testing-Knowledge-Base/Kurtosis/ta-p/412017

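Kurtosis tells you the ‘fatness’ of the tails, and it is important because it tells you how ‘extreme’ the values can get.

In our case, the value is positive. Assuming it is the excess kurtosis (which is what pandas reports, with a normal distribution scoring 0), a positive value indicates fatter tails than a normal distribution, so ‘extreme’ daily returns are more likely than a normal curve would suggest, not rarer. (Note: the right way to actually ascertain this is using the Z-value, which I will show in a separate tutorial!)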
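To get a feel for what the kurtosis value means, here is a rough sketch that compares a normal sample with a fatter-tailed one. Both samples are synthetic (randomly generated with a fixed seed), and the variable names are my own, so treat this as an illustration and not as the stock’s actual numbers:

```python
import numpy as np
import pandas as pd

# Synthetic samples with a fixed seed so the run is repeatable
rng = np.random.default_rng(0)
normal_like = pd.Series(rng.normal(size=10_000))
fat_tailed = pd.Series(rng.laplace(size=10_000))  # heavier tails than normal

# Series.kurtosis() reports excess kurtosis (Fisher's definition),
# so a normal distribution scores roughly 0
normal_kurt = normal_like.kurtosis()
fat_kurt = fat_tailed.kurtosis()

print('normal sample kurtosis    =', normal_kurt)
print('fat-tailed sample kurtosis =', fat_kurt)
```

On the dataframe from this tutorial, the equivalent call would be df['Daily Returns'].kurtosis().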

5. Conclusion

This is a very simple walkthrough on some manipulations of stock data for exploration and to unearth some simple insights! There is definitely more to uncover but I think this is a lot for one tutorial already!

I will be writing more on further statistical analysis and even some trading techniques such as fast-slow signals and Bollinger Bands using Python in later tutorials(:

All the code is from my own Jupyter Notebook and I may upload it to my GitHub soon, so keep a lookout!
