Tableau your Time Series Forecast with TabPy!

Complement your predictions with Tableau’s speed to insight!

Jerry Paul
Towards Data Science


Photo by Julian Hochgesang on Unsplash

Welcome to a quick and short (hopefully) illustration of how one can integrate data science models with Tableau using TabPy. We will specifically look at time series forecasting in this story.

We will use three time series models built in Python on the Superstore dataset (retail industry data). We will build our Python code in a Jupyter notebook and then move on to Tableau.

This article demonstrates how a model can be integrated with Tableau’s analytics extensions so that it is seamless to consume.

Why Tableau? Well, I love it, and I can’t stress enough how easy it makes exploring your data.

Let’s start by looking at the data:

We keep only the date and sales columns to build a time series object. The code below sorts the data by order date and aggregates sales at the month level.

import pandas as pd

# fetching required columns
data = superstore[['Order Date', 'Sales']]
data = data.sort_values(by='Order Date')

# creating a time series object indexed by month
data['Order Date'] = pd.to_datetime(data['Order Date'])
data.index = data['Order Date']
data = data.resample('M').sum()

We are ready to visualize the time series:

import matplotlib.pyplot as plt
import seaborn as sns

# plot the monthly sales series (reset the index so seaborn can see the date column)
plt.subplots(figsize=(17, 7))
sns.lineplot(x='Order Date', y='Sales', data=data.reset_index())
plt.show()

The above is our time series plot. There are three important components to a time series: Trend, Seasonality and Error. We can treat the series as an ‘additive model’ or a ‘multiplicative model’ depending on the nature of the series and the hypothesis we postulate. I will park the model building and forecasting methods for the next article, a continuation of this one, where we will cover different techniques, decomposition, stationarity testing, auto-correlation and partial auto-correlation factors, and model summaries.

For now, I will share the code I had written to finalize the model before I switch gears to Tableau.

As mentioned at the beginning of the article, we will use three models. These are Holt’s Linear model, the Holt-Winters model and ARIMA. The first two are exponential smoothing methods, while ARIMA stands for Auto Regressive Integrated Moving Average and is a regression-based approach.

Below is the Python code for Holt’s Linear method:

The model is trained on 42 months and the last 6 months are used for predictions; the model parameters can be tuned for accuracy. The model appends the fitted values and the forecast and gives the entire series back to us.
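A minimal sketch of that step, using statsmodels’ Holt class on the monthly series built earlier (statsmodels optimizes the smoothing parameters by default; they can also be set manually when tuning):

from statsmodels.tsa.api import Holt
import pandas as pd

# first 42 months for training, last 6 months held out for validation
train, test = data.iloc[:42], data.iloc[42:]

# Holt's linear (double exponential smoothing) model
fit = Holt(train['Sales']).fit()

# forecast the held-out months and append them to the fitted history
pred = fit.forecast(len(test))
full_series = pd.concat([fit.fittedvalues, pred])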

How do we tie this with Tableau?

Tableau has inbuilt analytics extensions that allow integration with other platforms.

In our case we choose TabPy.

Make sure to install TabPy and start it in your terminal as described in the TabPy documentation.

You can test the connection from Tableau’s analytics extension connection dialog (Help > Settings and Performance > Manage Analytics Extension Connection in recent versions of Tableau Desktop).

We also import the TabPy client into our Python environment to create a connection object.

We will be using this connection object to deploy our models on the TabPy Server that we initiated.
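A rough sketch of that connection, assuming TabPy’s default host and port (newer TabPy versions also expose the same client as from tabpy.tabpy_tools.client import Client):

# TabPy is installed with `pip install tabpy` and started by running
# the `tabpy` command in a terminal; it listens on port 9004 by default
import tabpy_client

# connection object pointing at the running TabPy server
client = tabpy_client.Client('http://localhost:9004/')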

Let’s look at the modified code for Holt’s Linear method that can be deployed on TabPy.

Holt’s Linear Method

We have created a function that returns the model output. Since we will be reading data from Tableau, we have used parameters that take in values we shall pass from Tableau. You will note that we have used the connection object to deploy the model in TabPy. Similarly, you can create functions for other models.
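A minimal sketch of such a deployable function follows; the function name, argument order and model settings here are assumptions. Tableau passes each field as a list, and the returned list needs one value per row Tableau sends, which is why we extend the date range later on:

from statsmodels.tsa.api import Holt
import pandas as pd

def holts_linear_method(order_dates, sales, months_to_forecast):
    # Tableau passes each argument as a list; rebuild the monthly series
    series = pd.Series(sales, index=pd.to_datetime(order_dates))
    series = series.dropna().resample('M').sum()

    horizon = int(months_to_forecast[0])
    fit = Holt(series).fit()

    # history (fitted values) plus the requested forecast, as a plain list
    result = pd.concat([fit.fittedvalues, fit.forecast(horizon)])
    return result.tolist()

# deploy (or overwrite) the endpoint on the TabPy server we connected to
client.deploy("Holt's Linear Method", holts_linear_method,
              "Forecasts monthly sales with Holt's linear method", override=True)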

Holt-Winters’ Method
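The main change from the previous function is the fitting step. A sketch, assuming an additive trend with 12-month additive seasonality:

from statsmodels.tsa.api import ExponentialSmoothing

# inside an analogous holt_winters_method(...) function:
fit = ExponentialSmoothing(series, trend='add', seasonal='add',
                           seasonal_periods=12).fit()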

ARIMA
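Since the calculated field below refers to a ‘Seasonal ARIMA Method’, a seasonal ARIMA fitted with statsmodels’ SARIMAX is a reasonable stand-in; the order and seasonal_order values here are illustrative, not tuned:

from statsmodels.tsa.statespace.sarimax import SARIMAX

# inside an analogous seasonal_arima_method(...) function:
fit = SARIMAX(series, order=(1, 1, 1),
              seasonal_order=(1, 1, 1, 12)).fit(disp=False)
result = pd.concat([fit.fittedvalues, fit.forecast(horizon)])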

Now that we have deployed these models on TabPy, let’s consume them in Tableau. We will create a calculated field that looks like the one below:
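A sketch of what that calculated field can look like; the field and parameter names, and passing dates as strings, are assumptions:

SCRIPT_REAL("
return tabpy.query('Seasonal ARIMA Method',
                   _arg1, _arg2, _arg3)['response']
",
ATTR(STR([Order Date])), SUM([Sales]), [Months to Forecast])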

Tableau uses four functions, namely SCRIPT_REAL, SCRIPT_STR, SCRIPT_BOOL and SCRIPT_INT for return types real, string, boolean and integer respectively. The above code tells Tableau to run the ‘Seasonal ARIMA Method’ which is deployed on TabPy with 3 parameters (Date, Sales and Months to Forecast) and return the ‘response’ to Tableau’s calculated field.

Similarly, we define calculated fields for the other two models. If we want to see them all at a glance in Tableau, it will look like this:

Note that you can dynamically change the forecast period as you want and see the predictions. You want to choose the model that gives you the best accuracy. You can optionally create a parameter in Tableau to toggle among models.

A key point to note is that we need to accommodate the forecast period (in months, in our case) in Tableau so that we make space for the values returned from TabPy. This is because the original dataset does not have null records for these future dates when we pass values from Tableau. I have tweaked the data to extend the date range as shown below:
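A sketch of what that calculated date field can look like; the field and parameter names are assumptions:

// hypothetical [Forecast Date] calculated field
IF [Order Date] = {MAX([Order Date])}
THEN DATEADD('month', [Months to Forecast], [Order Date])
ELSE [Order Date]
END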

The above code essentially extends the date range after adding the required months to forecast and passes it to TabPy. Also, we choose ‘Show Missing Values’ for this calculated date field.

There is a small trade-off here. Since we extend the date range, the last actual date and its sales figure get pushed to the new forecast end date. However, we are only interested in the forecast, so we can exclude this data point or use LAST()=FALSE in the Filters shelf. Feel free to come up with your own ideas for handling this.

Let’s finally plug this into our dashboard:

There you go! We have a well-integrated forecasting model sitting inside Tableau’s visual discovery. You can definitely bring accuracy scores and model parameters into Tableau to make this jazzier!

As mentioned, more to come in my next story.

