Quickly create and deploy simulation tools and dashboards by connecting Tableau directly to your Python Jupyter notebooks, and spend your time on the data, not on building visualizations

Overview
This article shows how to let Tableau, a leading interactive data visualization tool, access dynamic data instead of static data. This opens up new applications, such as dashboards that display real-time data, or letting users display dynamic predictions obtained from machine learning models.
We start with a gentle introduction to dynamic data access in Tableau, both through Tableau’s native language and through Python (via Analytics Extensions). Since this is a hands-on article, we then describe how to install the necessary components (Tableau and Jupyter notebooks), how to configure them, and how to quickly obtain a Tableau dashboard that displays dynamic data based on machine learning.
Tableau
Tableau is a Business Intelligence tool widely used in industry (along with PowerBI and QlikView). It allows non-technical people to create and deploy web applications through a drag-and-drop interface.
The main drawback is that, by default, the data needs to be readily available in a supported database, and that database cannot itself be updated through the interface. You can, however, create blazingly fast interactive dashboards that connect to numerous data sources and quickly get an overview of your data at scale.
Dynamic data in Tableau
Tableau’s scripting language
Tableau offers an Excel-like language that allows users to create custom calculations and manipulate different types of data (numeric, string, boolean). However, the limitations of this scripting language can be very frustrating: there is no real data reshaping (such as pivot tables), and even simple transformations (like splitting a field into multiple columns) often require numerous steps.
Tableau’s scripting language is mainly there to transform values at the cell level, or to operate on the full data set for filtering or computing an aggregation. In practice, your options are rather limited: you will mostly have to prepare your data outside of Tableau and build a precomputed dataset with all the data that you want to display.
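As an illustration, here is a minimal pandas sketch of such a precomputation step done outside of Tableau. The file and column names are made up for the example, and the actual aggregation obviously depends on your data:
import pandas as pd

# Hypothetical raw data: one row per individual event
events_df = pd.read_csv("events.csv")

# Precompute the aggregated dataset that Tableau will simply display
daily_df = (
    events_df
    .groupby(["date", "category"], as_index=False)
    .agg(event_count=("value", "size"), mean_value=("value", "mean"))
)
daily_df.to_csv("daily_summary.csv", index=False)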
But what if you want to run a simulation or apply some machine learning technique, and visualize how the results change as you update parameters? The way Tableau works with data sources simply does not allow this kind of operation; at best, you would be forced to code your algorithm in Tableau’s own language without any library, which is thoroughly impractical.
Analytics Extension to the rescue
To achieve real-time analytics with Tableau, or at least on-the-fly computations using custom algorithms, a very interesting feature is hidden under Help → Settings and Performance → Manage Analytics Extension Connection…. It lets you achieve what we would expect from a web application.

Here is where Analytics Extension Connections fit in the Tableau data ecosystem:

Python in the Tableau ecosystem
The two places in Tableau where we can plug in Python code are data source creation and on-the-fly computations (based on the displayed data).
Data source creation
The data source is often an existing database, and it is possible to write a custom data injector that writes to it. This approach should be fast, but it obviously requires a full-blown database, as well as a scheduler that regularly invokes the injector program.
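As a rough sketch (the database and table names are invented for the example), such an injector can be as small as a script that a scheduler runs at regular intervals:
import sqlite3
import pandas as pd

def inject(df: pd.DataFrame) -> None:
    # Hypothetical target: a SQLite file that Tableau already uses as a data source
    with sqlite3.connect("dashboard.db") as conn:
        df.to_sql("latest_metrics", conn, if_exists="replace", index=False)

# A scheduler (cron, Airflow, ...) would build the DataFrame and call inject() periodically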
Another approach is to connect directly to a custom data source: Tableau lets us create a custom Web Data Connector (WDC) using the WDC API. A fair number of connectors are already available, some of which may fit your needs. If not, you will have to create a custom connector, but be warned: the learning curve is steep.
A third alternative is the one illustrated in this article: with Jupytab, you can export your Pandas DataFrames to Tableau directly from a Jupyter notebook. (This approach uses the WDC API under the hood, so you don’t have to wrangle with it.) With Jupytab you also get scheduled data refreshes for free, since you can use the Tableau internal scheduler to poll data on demand. A minor limitation of this approach is the maximum size of the data sources: data sets with a billion rows typically do not fit easily in a Pandas DataFrame!
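If your data is on the large side, a simple workaround, sketched below as a purely illustrative helper, is to reduce or aggregate the DataFrame in the notebook and only expose that smaller view to Tableau:
import pandas as pd

def reduce_for_tableau(big_df: pd.DataFrame, n_rows: int = 100_000) -> pd.DataFrame:
    # Return a random sample small enough to be exposed comfortably through Jupytab
    return big_df.sample(n=min(n_rows, len(big_df)), random_state=0)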
On-the-fly computations
Besides Jupytab, not many other Tableau tools perform on-the-fly data source creation and computation. We must, however, mention TabPy, Tableau’s own tool for interactive analytics in Python: it runs a server that lets you execute Python code from Tableau on demand.
Jupytab serves exactly the same purpose, but with a few key differences:
- All your Python code comes from a single notebook: you create the data source and expose it for Tableau there.
- You don’t write Python code in Tableau; you always call a Python function defined in your notebook.
- The set of available libraries is linked to the notebook. This means that you can create multiple execution environments with different sets of libraries and use all of them in Tableau, possibly from multiple notebooks.
If your main development tool is a Jupyter notebook with Pandas and some data science libraries, and you want to quickly create interactive dashboards ready for production, Jupytab is a tool that is worth a try.
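In a nutshell, and at the risk of getting slightly ahead of ourselves, a Jupytab notebook boils down to a few cells like the sketch below (every call used here is detailed step by step in the rest of this article; the table and function names are just placeholders):
import pandas as pd
import jupytab

# Expose one or more DataFrames to Tableau
tables = jupytab.Tables()
tables['demo'] = jupytab.DataFrameTable("A demo table", pd.DataFrame({"x": [1, 2, 3]}))

# Optionally expose Python functions for on-the-fly computations
functions = jupytab.Functions()
functions['double'] = jupytab.Function('A demo function', lambda x: 2 * x)

# Plus dedicated cells (one each, shown later) that the Jupytab server calls:
# tables.render_schema(), tables.render_data(REQUEST) and functions.render_evaluate(REQUEST)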
Jupytab installation
We assume that you are familiar with Python, Conda (or virtualenv) and Jupyter, and that you use Linux/macOS for your development. Jupytab works on Windows but is not as well tested; using WSL (Windows Subsystem for Linux) also works for this article. We will be using conda for the installation, but it should work with pip as well.
Jupytab has two components, jupytab and jupytab-server, that need to be installed in their own environments:
- jupytab: a very simple API that exposes dataframes and custom functions from a notebook. It needs to be installed in the notebook environment, and only requires Pandas as a dependency.
- jupytab-server: provides the Web Data Connector, spawns kernels, manages the configuration, and so on. Do not install this component in the same environment as your notebook, so as to prevent dependency conflicts.
On the notebook side
We first create a virtual environment using Python 3.7 to keep everything clean and reproducible (you can also skip this part if you are ready to follow along in a Conda environment that you already have):
(base) % conda create -n jupytab-notebook-env python=3.7
(base) % conda activate jupytab-notebook-env
Then we install the latest jupytab version and the ipykernel library, so as to make our Jupyter kernel available in notebooks:
(jupytab-notebook-env) % conda install jupytab=0.9.11
Then we install the kernel in Jupyter:
(jupytab-notebook-env) % conda install ipykernel
(jupytab-notebook-env) % python -m ipykernel install --user --name jupytab-simulation-demo
You can now create, in Jupyter, a notebook that will use the newly installed kernel.

It is best to verify that Jupytab has been properly installed by checking its version, for example:
import jupytab
print(jupytab.__version__)
[Output]: 0.9.11
On the server side
We also create a new virtual environment based on Python 3.7. You can open a new terminal or just deactivate your previous environment with conda deactivate:
(base) % conda create -n jupytab-server-env python=3.7
(base) % conda activate jupytab-server-env
Then we install the latest jupytab-server version and… that’s all!
(jupytab-server-env) % conda install jupytab-server=0.9.11
Connect Tableau to the notebook
Write the notebook
The notebook and dashboard that inspired this article are freely available: don’t hesitate to download them and use them to follow along.
For our simulation we will only use Pandas and scikit-learn, but you may of course install your own libraries. We start by installing scikit-learn in jupytab-notebook-env (from the base environment, so either in a new terminal or after conda deactivate):
(base) % conda activate jupytab-notebook-env
# No need to install pandas, it is already installed with jupytab:
(jupytab-notebook-env) % conda install scikit-learn
We create a new notebook named jupytab-simulation-notebook.

We will use the well-known iris data set, which is included in scikit-learn, and load everything into Pandas dataframes:
import pandas as pd
import jupytab
from sklearn.datasets import load_iris
iris = load_iris()
iris_data_df = pd.DataFrame(columns=iris.feature_names, data=iris.data)
iris_target_df = pd.DataFrame(columns=['target'], data=iris.target)
iris_target_class_df = pd.DataFrame(columns=['target_name'], data=iris.target_names)
iris_data_df.sample(2)
[Output]:

Then we load the dataframes into a Jupytab Tables dictionary, to indicate that we want to expose these data to Tableau:
tables = jupytab.Tables()
tables['iris'] = jupytab.DataFrameTable("Iris DataSet", iris_data_df, include_index=True)
tables['iris_target'] = jupytab.DataFrameTable("Iris Classification Target", iris_target_df, include_index=True)
tables['iris_target_class'] = jupytab.DataFrameTable("Iris Classes", iris_target_class_df, include_index=True)
A DataFrameTable defines how to expose data to Tableau:

Here, we simply exposed static DataFrames along with their index.
In order to allow jupytab-server to retrieve the data, we need to add two cells.
The first cell generates a schema that declares all our DataFrames. It needs to be exactly like the cell below (you can copy and paste it directly into your notebook):
# GET /schema
tables.render_schema()
Executing this cell prints the schema exported to Tableau.
[Output]: [{"id": "iris", "alias": "Iris DataSet", "columns": [{"id": "index", "dataType": "int"}, {"id": "sepal_lengthcm", "dataType": "float"}, {"id": "sepal_widthcm", "dataType": "float"}, {"id": "petal_lengthcm", "dataType": "float"}, {"id": "petal_widthcm", "dataType": "float"}]}, {"id": "iris_target", "alias": "Iris Classification Target", "columns": [{"id": "index", "dataType": "int"}, {"id": "target", "dataType": "int"}]}, {"id": "iris_target_class", "alias": "Iris Classes", "columns": [{"id": "index", "dataType": "int"}, {"id": "targetname", "dataType": "string"}]}]
The second cell is where data is exported:
# GET /data
tables.render_data(REQUEST)
Executing this cell generates a harmless error in the notebook: the REQUEST variable is only available when the notebook is executed by the Jupytab server:
[Output]: NameError Traceback (most recent call last)
      1 # GET /data
----> 2 tables.render_data(REQUEST)
NameError: name 'REQUEST' is not defined
If you don’t want to get the error during execution, you can wrap render_data() in a simple try block:
# GET /data
try:
    tables.render_data(REQUEST)
except NameError:
    print("Not available outside jupytab context")
That’s it! You may also find several Jupytab notebook examples in the Jupytab GitHub repository under the jupytab-server/samples directory.
Configure and start the Jupytab server
In the Jupytab server environment, we need to create a configuration file that defines a few parameters such as the server port, a secret token and, of course, the list of notebooks that the server must expose.
The config file may be created with a simple cat command (under Unix), or by writing a config.ini file with any text editor:
(base) % conda activate jupytab-server-env
(jupytab-server-env) % cat << EOF > config.ini
[main]
listen_port = 8123
notebooks = JupytabSimulator
[JupytabSimulator]
path = /path/to/your/notebook/jupytab-simulation-notebook.ipynb
EOF
More configuration options are also available.
When you are done, you only need to start the Jupytab server:
(jupytab-server-env) % jupytab --config=/path/to/your/config.ini
Starting Jupytab-Server 0.9.11
SSL not enabled
Start notebook /path/to/your/notebook/jupytab-simulation-notebook.ipynb on 127.0.0.1:35453
You have no defined token. Please note your process is not secured !
Please open : http://computer_name:8123
INFO:[KernelGatewayApp] Kernel started: a6abe896-2cb8-403a-8661-3236e16d8def
INFO:[KernelGatewayApp] Registering resource: /schema, methods: (['GET'])
INFO:[KernelGatewayApp] Registering resource: /data, methods: (['GET'])
INFO:[KernelGatewayApp] Registering resource: /_api/spec/swagger.json, methods: (GET)
INFO:[KernelGatewayApp] Jupyter Kernel Gateway at http://127.0.0.1:35453
This will run the notebook and create a server allowing you to access it.
Note that you need to wait for your notebook resources to be registered in the log, otherwise you will not be able to access the notebook from Tableau. This is the part below:
INFO:[KernelGatewayApp] Registering resource: /schema, methods: (['GET'])
INFO:[KernelGatewayApp] Registering resource: /data, methods: (['GET'])
If your notebook takes time to compute, this may take a few minutes to appear.
Import the data in Tableau
If you don’t have Tableau installed already, you can download it and try it for free. This article was tested with Tableau 2020.2, so you might have to adapt the instructions slightly as needed.
Select the Web Data Connector from the Start screen:

You can then input the web data connector URL that was printed in the terminal:

Start notebook /path/to/your/notebook/jupytab-simulation-notebook.ipynb on 127.0.0.1:35453
You have no defined token. Please note your process is not secured !
Please open : http://computer_name:8123
You should be able to select your first notebook and view all available actions:

Proceed to Explore in Tableau and create your data set using the provided tables. Pay attention to the order and the joins between the tables:

We now have a data set to work with, imported from a Jupyter notebook.
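For reference, the join we just built in Tableau corresponds to the following pandas operation on the DataFrames defined in the notebook (shown here only to make the table relationships explicit; the actual join is done by Tableau):
# Iris measurements joined to their numeric target, then to the class names
full_df = (
    iris_data_df
    .join(iris_target_df)                       # same index: one target per sample
    .merge(iris_target_class_df, left_on="target", right_index=True)
)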
Create the Iris simulator
We will create a Tableau simulator based on the Iris data set, a classic machine learning example. The task at hand is to classify 3 types of iris using only four dimensions (sepal and petal length/width).
The goal of this simulator is simply to give a glimpse of how dashboards are created in Tableau: without writing a single line of code and, more importantly, in only a few minutes.
Update the notebook to add a prediction function
We will use a Multi-Layer Perceptron (MLP) classifier and train it with the full content of the data set. Our goal is not to achieve any kind of high-accuracy predictions, but instead to showcase how you can dynamically make predictions from within Tableau, thanks to Jupytab.
As we did with jupytab.Tables, we need to add a jupytab.Functions dictionary to register our predictor method:
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(max_iter=600).fit(iris.data, iris.target)

def predictor(sepal_length_cm, sepal_width_cm, petal_length_cm, petal_width_cm):
    class_predict = clf.predict([[sepal_length_cm, sepal_width_cm, petal_length_cm, petal_width_cm]])
    return iris.target_names[class_predict][0]  # We return the only prediction

functions = jupytab.Functions()
functions['predict'] = jupytab.Function('A predictor for the Iris DataSet', predictor)

predictor(0.5, 5, 4, 2)
[Output]: ‘virginica’
We also need to define a new endpoint that will be used by the Jupytab server:
# POST /evaluate
try:
    functions.render_evaluate(REQUEST)
except NameError:
    print("Not available outside jupytab context")
[Output]: Not available outside jupytab context
We must then save the notebook and restart the kernel through the Jupytab UI so that our changes are taken into account by Tableau.

Predict dynamically in Tableau
Connect to Jupytab through an Analytics Extension
Before using the predictor function in Tableau, we need to register our Jupytab server as an analytics extension in Tableau.
This is easily done through the Help → Settings and Performance → Manage Analytics Extension Connection item in the Tableau Desktop menu.

Check that everything works fine by using the Test Connection button:

Dashboard creation
To display a data point for each specimen, we need to transform our Iris DataSet Index to a dimension:

Then drop this value as Detail:

We get the following result:

We will then display a grid to visualize all dimensions against each other. Select all measures from the Iris DataSet:

And drop in Columns:

Do the same with Rows:

Your visualization is almost ready!

Just drop the Target Name from Iris Classes onto Color in the Marks card.

The output allows you to visualize the result as a grid graph comparing all dimensions:

Now we would like to check that our model correctly predicts the Iris type. We want to call our predictor for each Iris sample and display misclassified data in the graph above. We need to create a Calculated Field to achieve that:

The following formula returns the predicted Iris type:
SCRIPT_STR("JupytabSimulator.predict", AVG([Sepal Length Cm]), AVG([Sepal Width Cm]), AVG([Petal Length Cm]), AVG([Petal Width Cm]))

JupytabSimulator is the name of the notebook as registered in our config.ini file, and predict is the name of our registered function; the two names, joined with a dot, form the target of the call. This means that you can target different notebooks, each one using its own virtual environment and set of libraries.
The function parameters are the four parameters that you defined in your predictor function, and of course they need to be passed in the same order.
We also need another calculated field that compares the real Iris type with the prediction and returns False if there is a mismatch. The ATTR function is required, as Iris Predictor is already aggregated data:
ATTR([Target Name]) = [Iris Predictor]

We then replace the current Clusters color with Prediction Matching and choose Circle as the mark type.

Tableau will communicate with Jupytab for a few minutes, as it needs to predict the Iris class for all the displayed marks (i.e., data points):

This is an important point about the Tableau Analytics Extension: you need to minimize the number of marks displayed in your dashboard to guarantee quick results and a more interactive experience.
Incorrect predictions can then be displayed thanks to the Prediction Matching calculation above:

By comparing this graph with the previous one, we see how mis-predicted iris types tend to happen at the frontier between iris groups.
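If you want to double-check this directly in the notebook, a quick sketch (reusing the clf classifier and the iris data defined above) counts the misclassified samples:
import numpy as np

# Predict every sample with the trained classifier and count the disagreements
predicted = clf.predict(iris.data)
mismatch_count = int(np.sum(predicted != iris.target))
print(f"{mismatch_count} misclassified samples out of {len(iris.target)}")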
A visualization that only gives us the number of mismatches would drastically reduce the number of marks, while still providing important information. We only need to remove all measures from Columns & Rows and replace them with our Prediction Matching calculation. This new visualization will load in a few seconds:

Last but not least, we would like to understand how the number of iterations of our MLP training impacts our results. This will require us to use another cool feature of Tableau: Parameters.
We start by updating our predictor function in the notebook, adding a new optional parameter that (re)trains our MLP. We simply cache the result to avoid retraining when the number of iterations has not changed between invocations:
from sklearn.neural_network import MLPClassifier

last_max_iter = 0
clf = None

def train(request_max_iter):
    global clf, last_max_iter
    if not clf or last_max_iter != request_max_iter:
        clf = MLPClassifier(max_iter=request_max_iter, random_state=1).fit(iris.data, iris.target)
        last_max_iter = request_max_iter
    return clf

def predictor(sepal_length_cm, sepal_width_cm, petal_length_cm, petal_width_cm, request_max_iter=1000):
    clf = train(request_max_iter)  # We now train as needed
    class_predict = clf.predict([[sepal_length_cm, sepal_width_cm, petal_length_cm, petal_width_cm]])
    return iris.target_names[class_predict][0]

functions = jupytab.Functions()
functions['predict'] = jupytab.Function('A predictor for the Iris DataSet', predictor)

predictor(0.5, 5, 4, 2)
[Output]: ‘virginica’
Do not forget to restart the kernel in Tableau through the Jupytab UI so as to apply the changes.
On the Tableau side, we will create a parameter with a range from 10 to 1000 in order to visualize the impact of the number of training iterations on our predictions:


We then add this parameter to our view:

We can now dynamically change the maximum number of training iterations of our MLP:

We also need to update the Iris Predictor calculation and add this new parameter:
SCRIPT_STR("JupytabSimulator.predict", AVG([Sepal Length Cm]), AVG([Sepal Width Cm]), AVG([Petal Length Cm]), AVG([Petal Width Cm]), [MLP Max Iteration])

Just after clicking Apply, the result is immediately updated with our default value of 10 iterations.
MLP Training 10 Iterations

MLP Training 100 Iterations

Conclusion
Tableau is a very powerful tool for data visualization, and using Jupyter notebooks to compute data and get almost immediate feedback is very convenient.
We have barely scratched the surface of Tableau’s features, and there is almost no limit to the kinds of visualizations and interactions that you may create in your dashboards.
The Jupytab project is open source and accessible on GitHub under the MIT license.
You can also peruse online many impressive dashboards done with Tableau.
Acknowledgments
This is the first project I have ever published on GitHub and it would probably never have left my computer without the strong support of CFM’s Open-Source Program and the help of people working there.
I would especially like to thank Jean-Sébastien Dieu, Eric Lebigot and Florent Zara for their support on various aspects of Jupytab and for helping me publish this article.