
Extending KNIME Python Integration with Plotly Express and Kaleido

An alternative way to generate powerful interactive plots and high-quality static images with minimal coding

Image by Author: Screenshot - KNIME Analytics Platform workflow example - Desktop view

A key strength of the KNIME Analytics Platform is the flexibility to install a variety of extensions that enhance its vanilla "out of the box" capabilities. Naturally, the decision whether or not to install a particular KNIME extension depends on the needs of your individual project and its environment. Sometimes it may simply be driven by your own personal preference. One such area is data visualisation. In this article I describe how I configured KNIME to employ Plotly Express and Kaleido as my preferred image generation engine for Python views, opening up a wealth of additional powerful graphing libraries for use within my KNIME workflows.


Table of Contents

Installation

  • Base Environment
  • KNIME Integration with Python
  • Install Plotly and Plotly Express
  • Install Kaleido
  • Final Conda Environment Configuration

Example KNIME Workflow

  • Overview
  • Read Source Data
  • Transform Data
  • Generate Plotly Express and Kaleido Images
  • Example Images
  • Comparison with KNIME Stacked Area Chart View
  • KNIME Conda Environment Propagation

Takeaways

References


Installation

Base Environment

In brief my base installation environment is as follows:

  • Local machine: Google Pixelbook running Chrome OS.
  • Linux VM hosted on Google Cloud Platform (Compute Engine). Operating system: Ubuntu. Version: 16.04 LTS.
  • Google Chrome Remote Desktop configured to access an Xfce desktop on the Linux VM. Installation instructions here [1].
  • KNIME Analytics Platform for Linux (Version 4.3.1). Installation instructions here [2].

In addition I also make use of a Google Cloud Storage Bucket in which I store source data to feed into the KNIME workflow.

KNIME Integration with Python

Next I installed the KNIME Python Integration to be used with KNIME Analytics Platform. For KNIME version 4.3 the installation instructions are here [3]. Essentially there are three steps:

  • Install the KNIME Python Integration Extension.
  • Install Anaconda Python (Individual Edition).
  • Create a Conda environment for KNIME Python Integration.

Note that the recommendation is to create the Conda environment automatically from within KNIME Analytics Platform (instead of manually with a YAML configuration file).

I set the default version as Python 3 and created a new environment called "py36_knime_mjh_update". The Python version being used is 3.6.12.

Image by Author: Screenshot - KNIME Preferences - Python

Install Plotly and Plotly Express

Having set up the core KNIME Python Integration environment, next I installed the Plotly and Plotly Express modules. For this I used Anaconda Navigator to perform a conda install into the KNIME conda environment created above.

Image by Author: Screenshot - Plotly and Plotly Express installation using Anaconda Navigator

This installed the following packages and dependencies:

Image by Author: Screenshot - Selected Plotly and Plotly Express packages

Install Kaleido

The final step was to install Kaleido. It took several attempts to get this working.

For the first attempt I tried a conda install of Kaleido using Anaconda Navigator, as above. However, when I tried to execute the KNIME workflow to generate static images (see later), an exception was thrown. For some reason it could not find the Kaleido executable folder in:

"../anaconda3/envs/py36_knime_mjh_update/lib/python3.6/site-packages/kaleido/"

After some further playing around trying to work out what was wrong, I decided instead to hit the thing with a hammer and do a pip install of Kaleido from a terminal window.

First I activated the KNIME conda environment:

$ conda activate py36_knime_mjh_update

Then I installed Kaleido:

$ pip install kaleido

A quick check revealed that the Kaleido executable folder was now where it was expected to be, and this cured the problem.

Image by Author: Screenshot - Kaleido Executable folder location
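The same check can be scripted from inside the activated conda environment. The following is only a minimal sketch: the helper names find_package_dir and has_subfolder are hypothetical, and the subfolder name "executable" is an assumption about how the pip wheel of Kaleido lays out its files, so adjust it to whatever your own installation shows.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_dir(package_name: str) -> Optional[Path]:
    """Return the folder a package would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its parent is the package folder.
    return Path(spec.origin).parent


def has_subfolder(package_name: str, subfolder: str) -> bool:
    """Report whether an installed package contains the given subfolder."""
    package_dir = find_package_dir(package_name)
    return package_dir is not None and (package_dir / subfolder).is_dir()


if __name__ == "__main__":
    # "executable" is an assumed folder name, not something confirmed by this article.
    print("kaleido package folder:", find_package_dir("kaleido"))
    print("executable subfolder present:", has_subfolder("kaleido", "executable"))
```

Run it with the Python interpreter of the KNIME conda environment, so that it inspects the same site-packages folder that KNIME uses.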

Final Conda Environment Configuration

The final configuration of the KNIME conda environment is provided as a YAML file in this GitHub gist [4].


Example KNIME Workflow

Overview

My example workflow is shown below.

Image by Author: Screenshot - KNIME Workflow Example

Essentially here’s what it does:

  • Connects to a Google Cloud Storage Bucket
  • Reads the source data from an Excel file
  • Applies some transformations to the data to prepare it for viewing (and dumps the resulting table to files)
  • Generates a Stacked Area Chart in several formats (for demo purposes)

I’ll now describe the various elements in a bit more detail.

Read Source Data

Image by Author: Screenshot - KNIME Workflow - Read Source Data

The example data relates to deaths involving COVID-19 in Scotland for the year 2020 and the first few weeks of 2021. It is publicly accessible from the National Records of Scotland here [5]. The data is provided in "long form" in an Excel spreadsheet.

Image by Author: Screenshot - Source Data (long form)

Transform Data

Image by Author: Screenshot - KNIME Workflow - Transform Data

Here the source data is simply filtered and pivoted in order to generate a data table in "wide format" as input for the stacked area chart. I also dumped the output table into a KNIME table file and an Excel file (for later reference should I need it).

Image by Author: Screenshot - Transformed Data (wide form)
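In the workflow this step is done with KNIME nodes rather than code, but for readers who think in pandas the same filter-and-pivot idea can be sketched as follows. The column names (week, location, cause, deaths) and the numbers are made up purely for illustration; they are not the National Records of Scotland data or its real column headings.

```python
import pandas as pd

# Hypothetical long-form table: one row per (week, location, cause) combination.
long_df = pd.DataFrame(
    {
        "week": ["W1", "W1", "W1", "W2", "W2"],
        "location": ["Hospital", "Care Home", "Hospital", "Hospital", "Care Home"],
        "cause": ["COVID-19", "COVID-19", "Other", "COVID-19", "COVID-19"],
        "deaths": [10, 5, 3, 12, 7],
    }
)

# Filter: keep only the rows of interest.
filtered = long_df[long_df["cause"] == "COVID-19"]

# Pivot: one row per week, one column per location (wide form).
wide_df = filtered.pivot_table(
    index="week", columns="location", values="deaths", aggfunc="sum"
).reset_index()
wide_df.columns.name = None  # drop the leftover "location" label on the column axis

print(wide_df)
```

The result has the x-axis column first and one column per category after it, which is the layout a stacked area chart expects.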

Generate Plotly Express and Kaleido Images

Now we come to generating the stacked area plot for our data. This is achieved using the KNIME Python View node.

Image by Author: Screenshot - KNIME Workflow - Python View

To configure this node, right-click → Configure:

Image by Author: Screenshot - Python View configuration

In the left panel we see the input variables. This is our "input table", which the Python View node passes to the script as a pandas DataFrame. Accordingly, there is no need to load the pandas module explicitly.

The central panel contains the script being used for this example.

Here’s what it does first:

  • Imports plotly.express and the plot function from plotly.offline
  • Applies a short name to the input_table dataframe
  • Creates a list from the input_table column names.
  • Grabs the first column name from the list to assign as our X axis
  • Grabs the remaining column names to assign as our categories for the Y axis.
  • Defines the plot. In Plotly Express the stacked area chart is generated using the px.area function.

Then for demo purposes it generates the stacked area chart in a variety of formats:

  • Automatically launches an interactive plot in the default browser
  • Saves the interactive plot as an HTML file (together with the required JavaScript)
  • Uses Kaleido to save the stacked area plot as a static image in png, jpeg, svg and pdf formats.

Finally it converts the figure to a static image, held as a bytes string, using the plotly.io module. It assigns this to the KNIME "output_image" variable for the Python View node.
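The steps described above can be sketched roughly as follows. This is a reconstruction from the description rather than the original script, so treat it as an illustration only: the function names split_axis_columns, build_area_figure and export_figure and the file stem "stacked_area" are hypothetical. It assumes a Plotly version whose px.area accepts wide-form data (a list of column names for y), and that input_table and output_image are the variables KNIME provides to the Python View node.

```python
def split_axis_columns(columns):
    """Treat the first column as the x axis and the rest as stacked y categories."""
    names = list(columns)
    if len(names) < 2:
        raise ValueError("need one x column and at least one y column")
    return names[0], names[1:]


def build_area_figure(df):
    """Build a stacked area chart from a wide-form DataFrame."""
    # Imported here so the helper above can be used without Plotly installed.
    import plotly.express as px

    x_col, y_cols = split_axis_columns(df.columns)
    return px.area(df, x=x_col, y=y_cols)


def export_figure(fig, stem, formats=("png", "jpeg", "svg", "pdf")):
    """Save an interactive HTML file plus one static image per format."""
    fig.write_html(stem + ".html")  # plotly.js is embedded by default
    for fmt in formats:
        # Static export needs an image engine such as Kaleido to be installed.
        fig.write_image("{}.{}".format(stem, fmt))


# Inside a KNIME Python View node, where KNIME supplies `input_table`:
#
#   import plotly.io as pio
#   from plotly.offline import plot
#
#   fig = build_area_figure(input_table)
#   plot(fig, filename="stacked_area_interactive.html", auto_open=True)  # opens the browser
#   export_figure(fig, "stacked_area")
#   output_image = pio.to_image(fig, format="png")
```

The KNIME-specific part is left as comments because input_table only exists inside the node; outside KNIME you can pass any wide-form DataFrame to build_area_figure instead.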

Example Images

The interactive HTML plot generated by Plotly Express is automatically launched in the default browser, shown here with the "compare data on hover" toggle enabled:

Image by Author: Screenshot - Plotly Express - Stacked Area Chart - Interactive HTML plot

This is the static image (png format) generated and saved using the Kaleido engine:

Image by Author: Static PNG Image generated by Kaleido

And this is the output image shown by right-clicking on the KNIME Python View node and selecting Image:

Image by Author: Screenshot - KNIME Python View node - Right click menu
Image by Author: Screenshot - KNIME Python View node - Output Image

Comparison with KNIME Stacked Area Chart View

It should be noted that the "out of the box" base version of KNIME Analytics Platform already provides various "canned" JavaScript View nodes for generating different types of plots. These include the Stacked Area Chart node.

Image by Author: Screenshot

For a quick comparison I simply hooked up this node in a similar workflow.

Image by Author: Screenshot - KNIME Workflow with Stacked Area Chart (JavaScript) node

Here’s the static image that was generated:

Image by Author: Screenshot - KNIME Stacked Area Chart node - Static Image

An interactive JavaScript chart may also be viewed:

Image by Author: Screenshot - KNIME Stacked Area Chart node - Interactive Chart

The advantage of using the canned KNIME views is that in this case no code is required at all. Simply plug it into the workflow, configure it and then press the button. And that may be fine depending on your own needs.

It largely comes down to personal preference. My own choice is for the greater flexibility and range of plots offered by Plotly Express (and its "parent" Plotly). And in this example the Python code was kept to a bare minimum (just a few lines of elegant code needed). I can certainly live with that!

KNIME Conda Environment Propagation

Finally, you will notice that I also included the KNIME Conda Environment Propagation node in this example workflow (although for our purposes it was not really necessary here).

Image by Author: Screenshot - KNIME Workflow - Conda Environment Propagation

This feature allows you to preserve the specific KNIME conda environment with the workflow. It ensures that the required Python modules and versions are specified, making the workflow more portable, for example when you wish to share the workflow with colleagues or upload it to KNIME Server. For more information read this [6].


Takeaways

  • KNIME Analytics Platform provides the flexibility to extend its "out of the box" capabilities based on particular needs and personal preferences.
  • The KNIME Python Integration opens up the power of Python and allows you to integrate it into your workflows.
  • The Plotly Express and Kaleido Python modules bring a wealth of graphics libraries into the mix and provide for publication quality interactive plots and static images.
  • In particular Plotly Express allows you to keep coding down to a minimum. Just a few lines of elegant code do the trick!
  • When setting up the KNIME conda environment it appears you need to perform a pip install of Kaleido (as opposed to conda install) to get it to behave within KNIME.

In closing I would be keen to hear from anyone regarding the last point above about the Kaleido install.

For those of you wishing to get a copy of the KNIME workflow (including the source data used) there is a modified version available on the KNIME Hub. Simply search for "Python View using Plotly Express and Kaleido".


References

[1] Setting up Chrome Remote Desktop for Linux on Compute Engine

[2] KNIME Analytics Platform Installation Guide

[3] KNIME Python Integration Guide

[4] KNIME Python Conda Environment YAML file on GitHub Gist

[5] National Records of Scotland – Deaths involving coronavirus (COVID-19)

[6] KNIME Conda Environment Propagation

