Mastering Time Series Analysis with Python Classes

Time series analysis is one of the most common data science tasks. It involves analyzing trends in data points that are ordered in time. Time series data comes in many varieties, including stock market data, weather data, consumer demand data, and much more. Time series analysis has applications across a wide range of industries, which makes it an essential skill for data scientists and data analysts.
Time series analysis involves many techniques that can't be summarized in a single article. Some of the most common approaches include visualizing time series data through line charts, building time series forecasting models, performing spectral analysis to uncover cyclic trends, analyzing seasonality, and more.
Because time series analysis involves many different techniques, it naturally lends itself to object-oriented programming. Python classes make it easy to organize the attributes and methods for related time series tasks. For example, if, as a data scientist, you often perform line chart visualizations, seasonality analysis, and time series forecasting, classes allow you to easily organize the methods and attributes for these tasks.
When done well, object-oriented programming can improve the readability, reusability, maintainability, and repeatability of time series experiments. Because a class groups a set of related methods and attributes, it is clear which functions are used for which purpose. This makes existing functions easy to modify and maintain. Further, once you have a reliable set of methods defined in your class, you can easily rerun experiments with different parameters, refreshed training data, and more, with little to no need to rewrite code.
Here we will walk through how to write a class that organizes the steps within a time series analysis workflow. Each part of the workflow will be defined by a class method that completes a single task. We will look at how to define class methods for time series visualization, statistical testing, splitting data for training and testing, training a forecasting model, and validating our time series model.
For this work, I will be writing code in Deepnote, which is a collaborative data science notebook that makes running reproducible experiments very easy.
For our modeling, we will work with the fictitious Weather Climate Time Series data set, which is publicly available on Kaggle. The data set is free to use, modify and share under the Creative Commons Universal Public Domain License (CC0 1.0).
Read in data
To start, let's navigate to Deepnote and create a new project (you can sign up for free if you don't already have an account). Let's create a project called 'time_series' and a notebook within this project called 'time_series_oop'. Also, let's drag and drop the DailyDelhiClimate.csv file onto the left-hand panel of the page where it says 'FILES':

Let’s start by importing the Pandas library:
import pandas as pd
Next, we will define a class that allows us to read our weather data. We will call our Python class TimeSeriesAnalysis:
class TimeSeriesAnalysis:
    def __init__(self, data):
        self.df = pd.read_csv(data)
We can define an instance of our class and access our data frame through our TimeSeriesAnalysis object. Let’s display the first five rows of our data:
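climate = TimeSeriesAnalysis('DailyDelhiClimate.csv')  # the CSV we uploaded to the project
climate.df.head()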
We see that we have a date object and four float columns: mean temperature, humidity, wind speed, and mean pressure. Let's add a method to our class that prepares our time series. The method will take a numerical column name and return a time series, where the date is the index and the values correspond to the selected column:
class TimeSeriesAnalysis:
    ...
    def get_time_series(self, target):
        # store the column name so later methods can reference it
        self.target = target
        self.df['date'] = pd.to_datetime(self.df['date'])
        self.df.set_index('date', inplace=True)
        self.ts_df = self.df[target]
We can now define a new instance of our class and access our time series data:
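For example (the column name meantemp is assumed from the Kaggle file):
climate = TimeSeriesAnalysis('DailyDelhiClimate.csv')
climate.get_time_series('meantemp')  # NOTE: 'meantemp' is the assumed column name
climate.ts_df.head()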
Generate summary statistics
The next thing we can do is add a method that generates some basic summary statistics for our time series. For example, we can define a class method that prints the mean and standard deviation of the selected time series:
class TimeSeriesAnalysis:
    ...
    def get_summary_stats(self):
        print(f"Mean {self.target}: ", self.ts_df.mean())
        print(f"Standard Dev. {self.target}: ", self.ts_df.std())
We can then define a new class instance, call the get_time_series method on our object instance with the mean temperature column as input, and generate summary statistics:
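climate = TimeSeriesAnalysis('DailyDelhiClimate.csv')
climate.get_time_series('meantemp')  # 'meantemp' assumed as before
climate.get_summary_stats()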
We can do the same for humidity, wind speed, and mean pressure:
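# the remaining column names are assumed from the Kaggle file;
# a fresh instance per column avoids re-indexing an already-indexed data frame
for col in ['humidity', 'wind_speed', 'meanpressure']:
    climate = TimeSeriesAnalysis('DailyDelhiClimate.csv')
    climate.get_time_series(col)
    climate.get_summary_stats()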
Visualize time series
The next thing we can do is define a method that allows us to generate some visualizations. For our visualizations, we will need to import Seaborn, Matplotlib, and the statsmodels package:
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
Next, let's add a method that can generate a line plot, a histogram, and a seasonal decomposition of our time series:
class TimeSeriesAnalysis:
    ...
    def visualize(self, line_plot, histogram, decompose):
        sns.set()
        if line_plot:
            plt.plot(self.ts_df)
            plt.title(f"Daily {self.target}")
            plt.xticks(rotation=45)
            plt.show()
        if histogram:
            self.ts_df.hist(bins=100)
            plt.title(f"Histogram of {self.target}")
            plt.show()
        if decompose:
            decomposition = sm.tsa.seasonal_decompose(self.ts_df, model='additive', period=180)
            decomposition.plot()
            plt.show()
Our method gives the option to generate a histogram of target values, a time series line plot, and a seasonal decomposition. Let's define a new class instance for mean temperature and generate each:
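climate = TimeSeriesAnalysis('DailyDelhiClimate.csv')
climate.get_time_series('meantemp')  # 'meantemp' assumed as before
climate.visualize(line_plot=True, histogram=True, decompose=True)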
Stationarity Tests
Augmented Dickey-Fuller (ADF)
Another method we can add is a test for stationarity using the augmented Dickey-Fuller (ADF) test. A time series is stationary when its mean and variance don't change over time; a stationary series also has no trends. Inspecting our plots, we can see that the weather data is non-stationary, since there are clear seasonal trends. We will use the Dickey-Fuller test to check this formally. The Dickey-Fuller test has the following hypotheses:
Null Hypothesis: The time series is non-stationary.
Alternative Hypothesis: The time series is stationary.
We interpret the results in the following way:
If the test statistic < the critical values, we reject the null hypothesis.
If the test statistic > the critical values, we fail to reject the null hypothesis.
The results of the test include the test statistic and critical values at the 1%, 5%, and 10% significance levels. Let's define a method that allows us to run this test:
class TimeSeriesAnalysis:
    ...
    def stationarity_test(self):
        adft = adfuller(self.ts_df, autolag="AIC")
        output_df = pd.DataFrame({
            "Values": [adft[0], adft[1], adft[2], adft[3],
                       adft[4]['1%'], adft[4]['5%'], adft[4]['10%']],
            "Metric": ["Test Statistic", "p-value", "No. of lags used",
                       "Number of observations used", "critical value (1%)",
                       "critical value (5%)", "critical value (10%)"]})
        self.adf_results = output_df
At the top of our notebook, let's import the adfuller method from the statsmodels package:
from statsmodels.tsa.stattools import adfuller
We can then call the stationarity_test method on our climate object and access the test results:
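climate = TimeSeriesAnalysis('DailyDelhiClimate.csv')
climate.get_time_series('meantemp')  # 'meantemp' assumed as before
climate.stationarity_test()
climate.adf_results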
We see that the test statistic is greater than each of the critical values, which is to be expected. This means we fail to reject the null hypothesis: our data is non-stationary.
Kwiatkowski-Phillips-Schmidt-Shin (KPSS)
Another stationarity test is the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. Where the ADF null hypothesis is that the series is non-stationary (it has a unit root), the KPSS null hypothesis is the reverse:
Null Hypothesis: The time series is stationary.
Alternative Hypothesis: The time series is not stationary.
KPSS is a right-tailed test, so we interpret the results in the opposite way to ADF:
If the test statistic > the critical values, we reject the null hypothesis.
If the test statistic < the critical values, we fail to reject the null hypothesis.
If the ADF test fails to reject its null hypothesis, KPSS can be used to corroborate that the time series is non-stationary. Let's extend our stationarity test method so that it runs both ADF and KPSS. First, let's import the kpss method from statsmodels:
from statsmodels.tsa.stattools import adfuller, kpss
We can then extend the definition of our stationary test method to include KPSS:
class TimeSeriesAnalysis:
    ...
    def stationarity_test(self):
        adft = adfuller(self.ts_df, autolag="AIC")
        # kpss has no autolag argument; nlags="auto" selects the lag length automatically
        kpsst = kpss(self.ts_df, nlags="auto")
        adf_results = pd.DataFrame({
            "Values": [adft[0], adft[1], adft[2], adft[3],
                       adft[4]['1%'], adft[4]['5%'], adft[4]['10%']],
            "Metric": ["Test Statistic", "p-value", "No. of lags used",
                       "Number of observations used", "critical value (1%)",
                       "critical value (5%)", "critical value (10%)"]})
        kpss_results = pd.DataFrame({
            "Values": [kpsst[0], kpsst[1], kpsst[2],
                       kpsst[3]['1%'], kpsst[3]['5%'], kpsst[3]['10%']],
            "Metric": ["Test Statistic", "p-value", "No. of lags used",
                       "critical value (1%)", "critical value (5%)",
                       "critical value (10%)"]})
        self.adf_results = adf_results
        self.kpss_results = kpss_results
And we can then call the stationarity_test method on our climate object and access the KPSS test results:
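climate = TimeSeriesAnalysis('DailyDelhiClimate.csv')
climate.get_time_series('meantemp')  # 'meantemp' assumed as before
climate.stationarity_test()
climate.kpss_results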
We see that the test statistic is greater than the critical values, which means we reject the null hypothesis. Since the null hypothesis states that the time series is stationary, this gives us further evidence that the time series is non-stationary.
We can define a method that performs ADF and KPSS tests and prints the results:
class TimeSeriesAnalysis:
    ...
    def stationarity_test(self):
        adft = adfuller(self.ts_df, autolag="AIC")
        kpsst = kpss(self.ts_df, nlags="auto")
        adf_results = pd.DataFrame({
            "Values": [adft[0], adft[1], adft[2], adft[3],
                       adft[4]['1%'], adft[4]['5%'], adft[4]['10%']],
            "Metric": ["Test Statistic", "p-value", "No. of lags used",
                       "Number of observations used", "critical value (1%)",
                       "critical value (5%)", "critical value (10%)"]})
        kpss_results = pd.DataFrame({
            "Values": [kpsst[0], kpsst[1], kpsst[2],
                       kpsst[3]['1%'], kpsst[3]['5%'], kpsst[3]['10%']],
            "Metric": ["Test Statistic", "p-value", "No. of lags used",
                       "critical value (1%)", "critical value (5%)",
                       "critical value (10%)"]})
        self.adf_results = adf_results
        self.kpss_results = kpss_results
        print(self.adf_results)
        print(self.kpss_results)
        # ADF: test statistic above the 5% critical value -> fail to reject the null -> non-stationary
        self.adf_status = adf_results['Values'].iloc[0] > adf_results['Values'].iloc[5]
        # KPSS: test statistic above the 5% critical value -> reject the null -> non-stationary
        self.kpss_status = kpss_results['Values'].iloc[0] > kpss_results['Values'].iloc[4]
        print("ADF indicates non-stationary: ", self.adf_status)
        print("KPSS indicates non-stationary: ", self.kpss_status)
And we get:
It is best to use both tests to confirm whether a time series is stationary. The results of both tests give strong evidence that the time series is non-stationary.
Interestingly, the most common methods for time series forecasting, such as ARIMA and SARIMA, can handle non-stationary data. With ARIMA models, it is typically advised to difference the series before fitting. The auto_arima function, from the pmdarima Python package, automatically searches for the best differencing order, so there is no need to find it through manual testing.
Split for training & testing
When preparing data for training forecasting models, it is important to split the data for training and testing. This helps prevent the model from overfitting the data and, consequently, generalizing poorly. We will define the training set as all data up to July 2016 and the test set as all data points after July 2016:
def train_test_split(self):
    self.y_train = self.ts_df[self.ts_df.index <= pd.to_datetime('2016-07')]
    self.y_test = self.ts_df[self.ts_df.index > pd.to_datetime('2016-07')]
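After adding this method to our class, we can call it and check the sizes of the resulting splits:
climate = TimeSeriesAnalysis('DailyDelhiClimate.csv')
climate.get_time_series('meantemp')  # 'meantemp' assumed as before
climate.train_test_split()
print(len(climate.y_train), len(climate.y_test))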
Auto ARIMA model
To use Auto ARIMA, first let's install the pmdarima package:
!pip install pmdarima
Next, let's import auto_arima from pmdarima:
from pmdarima.arima import auto_arima
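As a side note, pmdarima also provides an ndiffs helper that estimates the required order of differencing directly; here is a small sketch, reusing our climate object from earlier:
from pmdarima.arima import ndiffs
# estimate the differencing order d using ADF- and KPSS-based tests
print(ndiffs(climate.ts_df.dropna(), test='adf'))
print(ndiffs(climate.ts_df.dropna(), test='kpss'))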
We can then define our fit method which we will use to fit our ARIMA model to our training data:
def fit(self):
    # fill any missing values before fitting
    self.y_train = self.y_train.fillna(0)
    model = auto_arima(self.y_train, trace=True, error_action='ignore', suppress_warnings=True)
    model.fit(self.y_train)
    # forecast one step for each observation in the test set
    forecast = model.predict(n_periods=len(self.y_test))
    self.forecast = pd.DataFrame(list(forecast), index=self.y_test.index, columns=['Prediction'])
We will also define a validate method that displays performance and visualizes model predictions. We will use mean absolute error to evaluate performance. At the top of our notebook, let's import NumPy and the mean_absolute_error method from scikit-learn:
import numpy as np
from sklearn.metrics import mean_absolute_error
We can then define our validate method:
def validate(self):
    plt.plot(self.y_train, label='Train')
    plt.plot(self.y_test, label='Test')
    plt.plot(self.forecast, label='Prediction')
    mae = np.round(mean_absolute_error(self.y_test, self.forecast), 2)
    plt.title(f'{self.target} Prediction; MAE: {mae}')
    plt.xlabel('Date')
    plt.ylabel(f'{self.target}')
    plt.xticks(rotation=45)
    plt.legend(loc='upper left', fontsize=8)
    plt.show()
The full class looks as follows:
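# Full class, assembled from the snippets above
class TimeSeriesAnalysis:
    def __init__(self, data):
        self.df = pd.read_csv(data)

    def get_time_series(self, target):
        # store the column name so later methods can reference it
        self.target = target
        self.df['date'] = pd.to_datetime(self.df['date'])
        self.df.set_index('date', inplace=True)
        self.ts_df = self.df[target]

    def get_summary_stats(self):
        print(f"Mean {self.target}: ", self.ts_df.mean())
        print(f"Standard Dev. {self.target}: ", self.ts_df.std())

    def visualize(self, line_plot, histogram, decompose):
        sns.set()
        if line_plot:
            plt.plot(self.ts_df)
            plt.title(f"Daily {self.target}")
            plt.xticks(rotation=45)
            plt.show()
        if histogram:
            self.ts_df.hist(bins=100)
            plt.title(f"Histogram of {self.target}")
            plt.show()
        if decompose:
            decomposition = sm.tsa.seasonal_decompose(self.ts_df, model='additive', period=180)
            decomposition.plot()
            plt.show()

    def stationarity_test(self):
        adft = adfuller(self.ts_df, autolag="AIC")
        kpsst = kpss(self.ts_df, nlags="auto")
        adf_results = pd.DataFrame({
            "Values": [adft[0], adft[1], adft[2], adft[3],
                       adft[4]['1%'], adft[4]['5%'], adft[4]['10%']],
            "Metric": ["Test Statistic", "p-value", "No. of lags used",
                       "Number of observations used", "critical value (1%)",
                       "critical value (5%)", "critical value (10%)"]})
        kpss_results = pd.DataFrame({
            "Values": [kpsst[0], kpsst[1], kpsst[2],
                       kpsst[3]['1%'], kpsst[3]['5%'], kpsst[3]['10%']],
            "Metric": ["Test Statistic", "p-value", "No. of lags used",
                       "critical value (1%)", "critical value (5%)",
                       "critical value (10%)"]})
        self.adf_results = adf_results
        self.kpss_results = kpss_results
        print(self.adf_results)
        print(self.kpss_results)
        self.adf_status = adf_results['Values'].iloc[0] > adf_results['Values'].iloc[5]
        self.kpss_status = kpss_results['Values'].iloc[0] > kpss_results['Values'].iloc[4]
        print("ADF indicates non-stationary: ", self.adf_status)
        print("KPSS indicates non-stationary: ", self.kpss_status)

    def train_test_split(self):
        self.y_train = self.ts_df[self.ts_df.index <= pd.to_datetime('2016-07')]
        self.y_test = self.ts_df[self.ts_df.index > pd.to_datetime('2016-07')]

    def fit(self):
        self.y_train = self.y_train.fillna(0)
        model = auto_arima(self.y_train, trace=True, error_action='ignore', suppress_warnings=True)
        model.fit(self.y_train)
        forecast = model.predict(n_periods=len(self.y_test))
        self.forecast = pd.DataFrame(list(forecast), index=self.y_test.index, columns=['Prediction'])

    def validate(self):
        plt.plot(self.y_train, label='Train')
        plt.plot(self.y_test, label='Test')
        plt.plot(self.forecast, label='Prediction')
        mae = np.round(mean_absolute_error(self.y_test, self.forecast), 2)
        plt.title(f'{self.target} Prediction; MAE: {mae}')
        plt.xlabel('Date')
        plt.ylabel(f'{self.target}')
        plt.xticks(rotation=45)
        plt.legend(loc='upper left', fontsize=8)
        plt.show()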
Now if we define a new instance, fit and validate our model we get:
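climate = TimeSeriesAnalysis('DailyDelhiClimate.csv')
climate.get_time_series('meantemp')  # 'meantemp' assumed as before
climate.train_test_split()
climate.fit()
climate.validate()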
The code in this post is available on GitHub.
Conclusions
In this post, we discussed how to write a class that organizes the steps of a time series analysis workflow. First, we defined a method that allowed us to read and display the data. We then defined a method to calculate the mean and standard deviation of the time series. Next, we defined a method that allowed us to visualize line plots of the time series data, histograms of the time series values, and a seasonal decomposition of the data. We then wrote a class method that performs statistical tests for stationarity, using the ADF test (null hypothesis: non-stationary) and the KPSS test (null hypothesis: stationary). Finally, we fit an Auto ARIMA model to the training data, validated our model, and generated visualizations for our predictions. I hope you found this useful. I encourage you to try applying these methods to your time series analysis projects!