Data Analysis & Visualization in Finance — Technical Analysis of Stocks using Python

How to use Python libraries like Pandas, Matplotlib and Seaborn to derive insights from daily price-volume stock market data.

Published in

Towards Data Science

12 min readFeb 29, 2020

With an increase in the penetration of analytics into numerous facets of our lives, finance is definitely one of the earliest to catch onto this trend. In this article, I have attempted to showcase how data analytics and visualization techniques can be incorporated in the world of finance.

1. About data

For this analysis, I have used 2 years of historical data from around mid-Feb 2018 to Feb 2020 of the below stocks listed on National Stock Exchange(NSE)—

HDFC Ltd.
Sun Pharmaceutical Industries Ltd.
Tata Consultancy Services Ltd.
Jindal Steel & Power Ltd.
Jubilant FoodWorks Ltd.

The reason why I selected these is because I had worked on these stocks during the my recent internship. You may choose your own set of stocks and the time period for the analysis.

The stocks I selected are from different sectors and market cap. You’ll see how it is advantageous as we progress in this article. For the start, we’ll investigate the HDFC stock individually and then move on to the combined analysis. For this section, I have downloaded the csv file of historical data of HDFC stock from Yahoo finance . In next section, we’ll be using an awesome tool to directly extract stock prices from the web.

2. Understanding data & general statistics

Import necessary libraries —

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import warnings
warnings.filterwarnings('ignore')

Read data from csv file and display first few rows —

HDFC_df = pd.read_csv(“HDFC.csv”) 
HDFC_df.head()

Output:

Since our time-frame of analysis is large, we can relax on number of decimal places to consider.

HDFC_df = HDFC_df.round(2)
HDFC_df.head(2)

Output:

Better! Now, lets determine the shape of the dataset —

HDFC_df.shape

Output: (491, 7)

Our HDFC dataset has 491 rows and 7 columns. Next, we check if the dataset has any null values —

HDFC_df.isnull().sum()

Output:

Drop the null entries from the dataset —

HDFC_df.dropna(inplace = True, axis = 0)

Now, let’s check the data type of each column —

HDFC_df.dtypes

Output:

As we can see, the ‘Date’ column is not in appropriate format. Pandas has inbuilt features to deal with time-series data in a smarter way. But to make use of Pandas functionality for dates, we need to ensure that the ‘Date’ column is of type ‘datetime64(ns)’.

HDFC_df['Date'] = pd.to_datetime(HDFC_df['Date'])
HDFC_df.head(2

To get total time duration for which we’re carrying out this analysis —

HDFC_df[‘Date’].max() — HDFC_df[‘Date’].min()

Output: Timedelta(‘729 days 00:00:00’)

There are approximately 252 trading days in an year with an average of 21 days per month, or 63 days per quarter. Out of a possible 365 days, 104 days are weekends (Saturday and Sunday) when the stock exchanges are closed.

Next, we’ll use the describe() function of Pandas to get high-level overview of how the HDFC stock performed in about last couple of months —

HDFC_df.iloc[-90:].describe().astype(int)

Output:

In last 90 days, the average closing price for HDFC stock was about ₹2307.
For about 75% of time the stock was trading below ₹2421 and it clocked maximum of ₹2492. The maximum volume of shares traded on a single day was 8808006 with median quantity being 2776142.

3. General variation in the stock price

Before we move on towards further investigation, we’ll set the ‘Date’ column as the index of the dataframe. It makes plotting easy.

HDFC_df.index = HDFC_df[‘Date’]

Now plot the closing price (adjusted) of the stock over the period of 2 years to get a general idea of how the stock performed in the given period.

HDFC_df[‘Adj Close’].plot(figsize = (15,8))
plt.show()

Output:

In the above plot, if you notice, there is a drastic decrease in the price of stock sometime around the month of September 2018. Apart from the “September effect”, the general decline in the stock price of HDFC can be attributed to the escalating tariff war between the US and China that had a ripple effect on Indian financial markets.

4. Day-to-day percentage change(Daily returns)

Daily percentage change in the price of the stock is calculated on the basis of percentage change between 2 consecutive days’ closing prices. Let’s say if the closing price of the stock yesterday was ₹500 and today the stock closed as ₹550. So, the percentage change is 10%. i.e. ((550–500) / 500)*100. No mystery here!

Accordingly, we’ll introduce a new column ‘Day_Perc_Change’ denoting the daily returns in the price of the stock. This can be done using in-built pct_change() function in python.

HDFC_df[‘Day_Perc_Change’] = HDFC_df[‘Adj Close’].pct_change()*100
HDFC_df.head()

Output:

You’ll notice that the first value in the ‘Day_Perc_Change’ column is NaN. We’ll drop this row.

HDFC_df.dropna(axis = 0, inplace = True)

Representing daily returns in form of a plot —

HDFC_df[‘Day_Perc_Change’].plot(figsize = (12, 6), fontsize = 12)

Output:

It can be observed that for most of the days, the returns are between -2% to 2% with few spikes in between crossing 6% mark on both the sides.

News article from BusinessLine dated 24th September 2018

Likewise you can find similar news articles for the days when there was drastic rise/fall in the price of the stock.

Plotting daily returns distribution histogram —

HDFC_df[‘Day_Perc_Change’].hist(bins = 50, figsize = (10,5)) 
plt.xlabel(‘Daily returns’)
plt.ylabel(‘Frequency’)
plt.show()#satistics
HDFC_df.Day_Perc_Change.describe()

Output:

The daily returns histogram is centered about origin. For the past 2 years, the mean daily returns has been about 0.072 and for most of the days the daily return was less than 1% implying that the HDFC stock has been less volatile over the period. During the period, the highest % change in positive direction was observed to be 6.46% and was 6.56% in negative direction. Clearly, we didn’t had any instances of ‘bull run’ or ‘bear drop’!

5. Trend Analysis

Next we add a new column ‘Trend’ whose values are based on the day-to-day percentage change we calculated above. Trend is determined from below relationship —

def trend(x):
  if x > -0.5 and x <= 0.5:
    return ‘Slight or No change’
  elif x > 0.5 and x <= 1:
    return ‘Slight Positive’
  elif x > -1 and x <= -0.5:
    return ‘Slight Negative’
  elif x > 1 and x <= 3:
    return ‘Positive’
  elif x > -3 and x <= -1:
    return ‘Negative’
  elif x > 3 and x <= 7:
    return ‘Among top gainers’
  elif x > -7 and x <= -3:
    return ‘Among top losers’
  elif x > 7:
    return ‘Bull run’
  elif x <= -7:
    return ‘Bear drop’HDFC_df[‘Trend’]= np.zeros(HDFC_df[‘Day_Perc_Change’].count())
HDFC_df[‘Trend’]= HDFC_df[‘Day_Perc_Change’].apply(lambda x:trend(x))HDFC_df.head()

Output:

We wish to see how the stock was trending in past 2 years. This can be visualized as a pie chart, with each sector representing the percentage of days each trend occurred. We’ll plot a pie chart for the ‘Trend’ column to visualize the relative frequency of each trend category.

For this, we’ll use the groupby() function with the trend column to aggregate all days with the same trend into a single group before plotting the pie chart.

Visualizing Trend Frequency with Pie-Chart —

DFC_pie_data = HDFC_df.groupby('Trend')
pie_label = sorted([i for i in HDFC_df.loc[:, 'Trend'].unique()])
plt.pie(HDFC_pie_data['Trend'].count(), labels = pie_label, 
        autopct = '%1.1f%%', radius = 2)

plt.show()

Output:

For the period under consideration from mid-Feb 2018 to Feb 2020, the HDFC stock was among the top gainers for about 1.8% of the time, and among the top losers for 1.6 %. For about 12.4% of the time period, the stock has performed positively on a given day. Likewise, for most period of time (about 29.6%) the stock showed a very slight change in the price. These observations are consistent with the daily return histogram we saw in above section.

6. Daily Returns and Volume

plt.stem(HDFC_df[‘Date’], HDFC_df[‘Day_Perc_Change’])
(HDFC_df[‘Volume’]/1000000).plot(figsize = (15, 7.5), 
                                 color = ‘green’, 
                                 alpha = 0.5)

Output:

(* Daily volume of trade has been reduced in scale to match with the daily return scale)

By juxtaposing the daily trade volume(in green) with the daily returns(in blue), it was observed that whenever the volume of shares traded is high, there is comparatively high rise or fall in the price of the stock leading to the high returns. Thus, on a given day if unconventionally high volume of trading takes place, then one can expect a big change in the market in the either direction. Volume of shares traded when coupled with the rise or fall in Price of stock, in general, is an indicator of the confidence of the traders & investors in a particular company.

7. Correlation Analysis Of Stocks with Pair plot and Joint plots

“Never put all your eggs in a single basket”

Whenever we go for the diversification of the portfolio, we would NOT want the stocks to be related to each other. Mathematically, Pearson’s correlation coefficient (also called Pearson’s R value) between any pair of stocks should be close to 0. The idea behind is simple — suppose your portfolio comprises of the stocks that are highly correlated, then if one stock tumbles, the others might fall too and you’re at the risk of losing all your investment!

I selected the aforementioned stocks to perform the correlation analysis. All these stocks are from different segments of Industry and Market cap. You are free to choose the stocks of your interest. the procedure remains the same.

In previous section we’ve used the pre-downloaded csv file for analysis. In this section, we’ll take the help of Pandas web data reader package to extract the prices of stocks.

# import package
import pandas_datareader.data as web# set start and end dates 
start = datetime.datetime(2018, 2, 15)
end = datetime.datetime(2020, 2, 14) # extract the closing price data
combined_df = web.DataReader([‘HDFC.NS’, ‘JINDALSTEL.NS’,    ‘JUBLFOOD.NS’,‘SUNPHARMA.NS’, ‘TCS.NS’, ‘^NSEI’],
‘yahoo’, start = start, end = end)[‘Adj Close’]

Drop null values and display first few rows —

# drop null values
combined_df.dropna(inplace = True, axis = 0)# display first few rows
combined_df.head()

Output:

(^ NSEI is symbol for National Stock Exchange Index — NIFTY 50 )

Next we’ll analyse the correlation between the different stocks in a pair-wise fashion with Seaborn pairplot.

# store daily returns of all above stocks in a new dataframe 
pct_chg_df = combined_df.pct_change()*100
pct_chg_df.dropna(inplace = True, how = ‘any’, axis = 0)# plotting pairplot  
import seaborn as sns
sns.set(style = ‘ticks’, font_scale = 1.25)
sns.pairplot(pct_chg_df)

Output:

Note that the correlation analysis is performed on the daily percentage change(daily returns) of the stock price and not on the stock price.

If you observe carefully, the plots in the lower triangular area are the same as the plots in the upper triangular area with just axes interchanged. So, analyzing either set of plots would suffice. The diagonal represents the histograms, just like the one seen above for HDFC stock.

Takeaway:

HDFC, Jindal Steel, Jubilant Foods, Sun Pharma and TCS stocks can be included in a portfolio as no two stocks show any significant correlation.

Drawback:

Although the pair plots provide very good visualization of all possible combinations between the bunch of stocks, it doesn’t provide any detailed information like Pearson’s R value or null-hypothesis p value to quantify the correlation. That’s where the joint plot comes into the picture!

While Pair plot provides a visual insight into all possible correlations, Seaborn jointplot provides detailed information like Pearson’s R value (Pearson’s correlation coefficient) for each pair of stocks. Pearson’s R value ranges from -1 to 1. Negative value indicates a negative linear relation between the variables, while positive value indicates a positive relationship. Pearson’s R value closer to 1 (or -1) indicates strong correlation, while value closer to 0 indicates weak correlation.

In addition to Pearson’s R value, joint plot also shows the respective histograms on the edges as well as null hypothesis p-value.

Here’s an example of a joint plots between the stocks of Sun Pharma vs Jindal Steel and Jindal Steel vs HDFC —

from scipy.stats import statssns.jointplot(‘SUNPHARMA.NS’, ‘JINDALSTEL.NS’, pct_chg_df, kind = ‘scatter’).annotate(stats.pearsonr)sns.jointplot(‘JINDALSTEL.NS’, ‘HDFC.NS’, pct_chg_df, kind = ‘scatter’).annotate(stats.pearsonr)plt.show()

Output:

Takeaways:

The Pearson’s R value is 0.24 for Jindal Steel v/s Sun Pharma which is very less. This indicates a weak correlation.
Similarly, the Pearson’s R value is observed to be 0.29 in case of HDFC v/s Jindal Steel indicating weak correlation between HDFC and Jindal Steel stocks.

These above results from Joint plots equip us with the numerical figures to ascertain the insights we derived by visually observing the Pair plot previously.

Beware! having correlation is not the only parameter to determine which stocks to include in an portfolio and which to remove. There are several other factors at play. It’s best to seek advice of the experts and make an informed decision.

9. Volatility analysis

Volatility is one of the most important pillars in financial markets. A stock is said to have high volatility if its value can change dramatically within a short span of time. On other hand, lower volatility means that value of stock tends to be relatively steady over a period of time. These movements are due to several factors including demand and supply, sentiment, corporate actions, greed, and fear, etc. Mathematically, volatility is measured using a statistical measure called ‘standard deviation’, which measures an asset’s departure from its average value.

We have already calculated the intraday returns (daily returns) of the HDFC stock and several other stocks. Next, we will calculate the 7-day rolling mean(also called moving average) of the daily returns, then compute the standard deviation (which is square root of the variance) and plot the values. Relax, we don’t have to calculate all this manually; Pandas ‘rolling()’ function and ‘std()’ function does the job for us in just one line!

HDFC_vol = pct_chg_df[‘HDFC.NS’].rolling(7).std()*np.sqrt(7)
HDFC_vol.plot(figsize = (15, 7))

Output:

Next we’ll see the comparative volatility analysis of HDFC stock with SunPharma stock and NIFTY50 index. Just like above, we compute 7-day rolling mean, and standard deviation, all in a single line of code. Pandas indeed makes our life easy!

volatility = pct_chg_df[['HDFC.NS', 'SUNPHARMA.NS', '^NSEI']].rolling(7).std()*np.sqrt(7)volatility.plot(figsize = (15, 7))

Output:

You can observe that Sun Pharma stock has higher volatility as compared to HDFC stock, while the Nifty index has least volatility. This is expected as Sun Pharma is a mid-cap stock and mid-cap stocks in general tend to have higher volatility as compared to the large-cap stocks such as HDFC.

Many traders and investors seek out higher volatility investments in order to make higher profits. If a stock does not move, not only it has low volatility, but also it has low gain potential. On the other hand, a stock or other security with a very high volatility level can have tremendous profit potential, but the risk is equally high.

Endnotes

There is no full-proof strategy that can guarantee profit on your investment. In the real-world, there are a number of factors that traders take into account before making an investment. By using the right strategies and techniques, we can only improve our chances. I am looking forward to introduce some of these strategies & techniques along with demonstration using python in my subsequent articles.

References:

For Data — Yahoo Finance
For Finance technicalities — Investopedia

— — — — — — — — — — — — — — — — — — — — — — — — —
Disclaimer — This article is purely for educational purpose and does not endorse any company or stock. Kindly consult experts before making any decision related to buying or selling of stocks.
— — — — — — — — — — — — — — — — — — — — — — — — —

Data Analysis & Visualization in Finance — Technical Analysis of Stocks using Python

How to use Python libraries like Pandas, Matplotlib and Seaborn to derive insights from daily price-volume stock market data.

1. About data

2. Understanding data & general statistics

3. General variation in the stock price

4. Day-to-day percentage change(Daily returns)

5. Trend Analysis

6. Daily Returns and Volume

7. Correlation Analysis Of Stocks with Pair plot and Joint plots

9. Volatility analysis

Endnotes

Written by Pratik Nabriya