The quickest way to deploy your machine learning model

Using Python, Tableau, Machine Learning and TabPy

Photo by Subash Patel

Data science is all about presenting insights to end users in the simplest way possible. You work on a machine learning/deep learning model all the way from data cleaning to hyperparameter tuning, only to realize that the most important task, presenting it to the end users, has not even started yet. Here I discuss a quick and easy way to deploy ML models using Jupyter Notebook and Tableau.

We will use Scikit-Learn to process the data and build the model. Then we use TabPy to deploy the model and access it in Tableau. If you are looking for a way to deploy models on cloud platforms or distributed systems, you can stop reading now.

The Data

We will use the Titanic dataset available on Kaggle to build a Random Forest model. The goal of the project is to predict whether a passenger is likely to survive the Titanic disaster. We will use demographic variables such as age, gender, and sibling count, along with the passenger's ticket class, as independent variables.

Dashboard showing the final deployed model.

The dataset contains 891 rows, with 177 missing values in Age. We replace the missing ages with random numbers within one standard deviation of the mean age. Similarly, we replace the NaNs in Fare with 0.
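The imputation step can be sketched as follows. The tiny DataFrame below stands in for the Kaggle train.csv (same column names), and the random seed is an illustrative choice:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for train.csv; only Age and Fare are shown.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age":  [22.0, np.nan, 26.0, 35.0, np.nan],
    "Fare": [7.25, 71.28, np.nan, 8.05, 53.10],
})

# Fill missing ages with random draws within one standard deviation
# of the observed mean age.
mean, std = df["Age"].mean(), df["Age"].std()
missing = df["Age"].isna()
df.loc[missing, "Age"] = rng.uniform(mean - std, mean + std, size=missing.sum())

# Missing fares simply become 0.
df["Fare"] = df["Fare"].fillna(0)

print(df["Age"].isna().sum(), df["Fare"].isna().sum())  # 0 0
```

With the real dataset, the same two statements run unchanged after `df = pd.read_csv("train.csv")`.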

Modeling

We build a Random Forest classifier with the default parameters. We achieved an accuracy of ~94.5% with just six variables, which suggests the model may have overfitted the training data. However, our focus is to deploy the trained model quickly, so we will not chase the best evaluation metrics such as precision or recall. There are plenty of other resources on Kaggle that focus on that; here we keep a simple model without cross-validation or parameter tuning.
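The modelling step is minimal. A sketch with default hyperparameters, where random data stands in for the six encoded Titanic features (Pclass, Sex, Age, SibSp, Parch, Fare):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Random data purely for illustration; in the article the six columns
# come from the cleaned train.csv.
rng = np.random.default_rng(0)
X = rng.random((100, 6))
y = rng.integers(0, 2, size=100)

# All defaults: 100 trees, unbounded depth.
clf = RandomForestClassifier(random_state=0)
clf.fit(X, y)

train_acc = clf.score(X, y)  # training accuracy, prone to overfitting
```

The high training accuracy of an unconstrained Random Forest is exactly why the ~94.5% figure above should be read with caution.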

Requirements

We need Python 2.x or Python 3.x already installed on our machine to start the deployment; alternatively, you can use Anaconda to install Jupyter Notebook along with Python. We also need to install TabPy to launch a server that can host the deployed model. You can do that simply by following the steps here. If the installation command seems to take too long, close the command prompt and try again in a new one. I had to do it a couple of times to get it right.

TabPy (the Tableau Python Server) is an Analytics Extension implementation which expands Tableau’s capabilities by allowing users to execute Python scripts and saved functions via Tableau’s table calculations.

Steps to get the server up and running

  1. Once you have TabPy installed in your conda environment, open a new Anaconda command prompt and change to the TabPy server directory using "cd C:\Users\vipan\Anaconda3\pkgs\tabpy-server-0.2-py36_1\Lib\site-packages\tabpy_server". In place of vipan, use your PC username and the corresponding directory path.
  2. Run the command "startup.bat" and you should see the screen below. DO NOT CLOSE THIS WINDOW, as we need the server to stay online.
Enabling the server connection

Deployment

We need to create a function that takes the parameter values selected in Tableau as input and returns the probability of that passenger surviving the Titanic. We then deploy the function, along with the trained model, from Jupyter Notebook onto the TabPy server, where the model is saved as a pickle file. After this step, the trained model is hosted on the local TabPy server and can be accessed from Tableau!
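A sketch of that wrapper function and its deployment. The endpoint name `SurvivalPredictor` and the dummy model below are illustrative assumptions, not taken from the article's notebook; each `_argN` arrives from Tableau as a list of values:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Illustrative stand-in for the trained model; in the article, clf is
# fitted on the cleaned train.csv instead of random data.
FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((100, 6)), columns=FEATURES)
y = rng.integers(0, 2, size=100)
clf = RandomForestClassifier(random_state=0).fit(X, y)

def survival_probability(_arg1, _arg2, _arg3, _arg4, _arg5, _arg6):
    """Each _argN is a list of parameter values passed from Tableau."""
    row = pd.DataFrame(
        dict(zip(FEATURES, [_arg1, _arg2, _arg3, _arg4, _arg5, _arg6]))
    )
    # Probability of class 1 (survived) for each input row.
    return clf.predict_proba(row)[:, 1].tolist()

# Deployment needs the TabPy server started earlier; override=True
# replaces any previously deployed version of the same endpoint:
#
# import tabpy_client
# client = tabpy_client.Client("http://localhost:9004/")
# client.deploy("SurvivalPredictor", survival_probability,
#               "Titanic survival probability", override=True)

probs = survival_probability([3], [1], [22.0], [1], [0], [7.25])
```

The deployment lines are commented out because they require a live TabPy server on port 9004.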

If you get any errors saying "tabpy_client" was not found, run the "pip list" command in an Anaconda command prompt to see the list of installed libraries and install the ones that are missing. With "override = True", you can retrain and redeploy the same model, replacing the older version with the updated one. This is helpful when you need to train the model on new data every day. The _arg1, _arg2, etc. will be passed from Tableau.

Tableau Configuration

Once the server is running, in Tableau go to Help → Settings and Performance → Manage External Service Connection, and point the connection at localhost on port 9004 (TabPy's default).

Test the connection and you should see the below notification. Click OK.

Import the "train.csv" file into Tableau. To access the RF model, we have to pass parameters from Tableau, so let's create parameters with the required data types, as shown below.

Similarly, create the parameters for Class, Gender, Fare, # of siblings/spouses (SibSp), and # of children or parents (Parch) with data labels. (Tableau workbook for reference).

Create a calculated field with the following script to connect to the Tabpy server.

Script for the calculated field in Tableau.
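The calculated field wires the Tableau parameters to the deployed endpoint via SCRIPT_REAL. A sketch of what it might look like — the endpoint name and parameter names here are assumptions, not copied from the workbook:

```
SCRIPT_REAL("
    return tabpy.query('SurvivalPredictor',
                       _arg1, _arg2, _arg3, _arg4, _arg5, _arg6)['response']
",
[Class], [Gender], [Age], [SibSp], [Parch], [Fare])
```

Each bracketed parameter is forwarded to the deployed function as the matching `_argN` list, and the endpoint's `response` field carries the probability back to Tableau.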

We have to multiply the survival probability by 100 to get a percentage value. I also created a death probability with another calculated field (remember, you cannot reference one analytics-extension calculated field inside another). And voila! Now we can create visualizations using the parameters we created. _Here is the interactive dashboard I created_.

Unfortunately, Tableau Public doesn't allow dashboards with analytics extensions to be published, but you can publish them to your respective work servers.

The data and code for the Jupyter Notebook, along with the Tableau workbook, can be found on GitHub.

  1. https://github.com/shlok6368/App-Rating-Predictor-Model-and-Tabpy-Integration/blob/master/Model%20Building.ipynb
  2. https://medium.com/@shlok6368/end-to-end-data-science-pipeline-using-python-and-tabpy-data-scraping-data-cleaning-model-41b2dcb63667
  3. https://towardsdatascience.com/predicting-the-survival-of-titanic-passengers-30870ccc7e8
  4. https://www.theinformationlab.co.uk/2019/04/09/how-to-set-up-tabpy-in-tableau/
