Polynote: the new Jupyter?

There is a lot of buzz surrounding Polynote, Netflix’s multi-language notebook. But is it the Jupyter-killer, or is it all hype?

Tom Titcombe
Towards Data Science

Source: https://polynote.org/

If you’ve done any kind of analytical work with Python in the last few years, there is a good chance you have come across Jupyter notebooks. The ability to execute new code without needing to re-run the entire script has made notebooks a ubiquitous part of the analytical workflow, but Jupyter notebooks are not without issues. For example, editing aids such as code hints and linting are not supported; this can lead to messy, difficult-to-debug code and cause some developers to avoid the tool entirely. In 2019, tech behemoth Netflix open-sourced its notebook tool, Polynote, which many people are touting as the Jupyter-killer.

Like Jupyter, Polynote has multi-language support; however, Scala is treated as a “first-class language” in Polynote, with additional support for Python (+ Spark), SQL and Vega. While Jupyter allows you to create a notebook in one of its supported languages (JUlia, PYThon, R), Polynote supports several languages in a single notebook and even allows for interoperability between code executed in different languages.

Installation

Unlike Jupyter, Polynote is built with JVM-based languages at its core, which makes the installation process slightly more involved than pip install polynote. Problematically, Polynote is currently only available on Linux and OSX; there is some discussion about adding Windows support, but it is not an aim of the development team. Given how open-source Windows ports of popular tools like Swift have failed to bloom, it would be foolish to expect a seamless experience of Polynote on Windows anytime soon.

That being said, let us assume you have a Linux or OSX machine. I will outline the steps I took to get Polynote working on Ubuntu 18.04.

  1. Go to the releases page and download polynote-dist.tar.gz under “Assets” of the latest release
  2. Extract polynote with tar -zxvpf polynote-dist.tar.gz
  3. Install Java: sudo apt install default-jre and sudo apt install default-jdk
  4. Set JAVA_HOME with export JAVA_HOME=/usr/lib/jvm/default-java/
  5. pip3 install jep jedi virtualenv
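
Before launching, it can be worth confirming that steps 4 and 5 actually took effect. The snippet below is a minimal sketch, not part of Polynote: the function name check_prerequisites is my own invention, and it only checks that JAVA_HOME points at an existing directory and that the three pip packages are importable by the interpreter you run it with.

```python
import importlib.util
import os

# Python packages installed in step 5.
REQUIRED_PACKAGES = ("jep", "jedi", "virtualenv")


def check_prerequisites(env=None, packages=REQUIRED_PACKAGES):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical helper for illustration only; Polynote does not ship this.
    """
    env = os.environ if env is None else env
    problems = []

    # Step 4: JAVA_HOME should be set and point at an existing directory.
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append(f"JAVA_HOME points to a missing directory: {java_home}")

    # Step 5: each package should be importable by this interpreter.
    for name in packages:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package not found: {name}")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Run it with the same python3 that pip3 installed into; if it prints nothing, the environment variable and packages are in place. It does not verify the Java installation itself.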

Additional installation is required for Spark support. You can read more about installing Polynote here.

Go into the polynote directory and execute polynote.py, then navigate in a browser to the local address provided (I’ve tested on Firefox and Chrome without problems).

Polyglot notebooks

The eponymous feature of Polynote is language interoperability: if you define a variable in one language, you can use it in another. This clearly has limitations, as not all languages share the same concepts of types and objects, but the most important types, including numpy arrays and pandas dataframes, are supported.

I must confess to having only a tangential knowledge of Scala so I am uncertain of the specific use-cases which would require it to work in tandem with Python, but it is evident that removing the need to convert data between the two languages would make such a project run far more smoothly.

The majority of projects I have worked on have only required one language (mostly Python, some R) so it is easy to believe that, outside of large projects, language mixing would be a superfluous tool for most data scientists. Of course, as an open-source project, it is likely that the community will extend the suite of supported languages (there is already interest in R, Julia, and Java), which would make Polynote a more appealing tool to a great number of developers; personally, I would love to be able to run statistical tests on Python data in R, rather than spending time trying to find a valid, maintained Python port of the test.

Visualisation

Data visualisation is a crucial part of data science, yet it is a skill often neglected. Polynote emphasises high-quality visualisation by generating Vega specifications from Scala data.

Vega is a visualization grammar, a declarative language for creating, saving, and sharing interactive visualization designs. With Vega, you can describe the visual appearance and interactive behavior of a visualization in a JSON format, and generate web-based views using Canvas or SVG. — https://vega.github.io/vega/

Vega can easily generate aesthetic, complex visualisations. The popular Python visualisation library Altair is built upon Vega-Lite, the statistical visualisation part of Vega (you can read about generating interactive maps in Altair here).

Unfortunately, there is no direct support for visualisations of Python data, so there is little benefit for the Python natives. You can still enter the schema directly of course, but in most cases it would be far easier to use a tool like Altair.
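
To give a sense of what writing a specification by hand involves, here is a minimal sketch that builds a Vega-Lite bar chart spec as a plain Python dictionary and prints it as JSON. The data values and the helper name bar_chart_spec are made up for illustration, and I am assuming the v5 Vega-Lite schema; how the resulting JSON is fed into a Polynote cell is not covered here.

```python
import json

# Made-up sample data, purely for illustration.
rows = [
    {"language": "Python", "notebooks": 12},
    {"language": "Scala", "notebooks": 5},
    {"language": "SQL", "notebooks": 3},
]


def bar_chart_spec(values, x_field, y_field):
    """Build a minimal Vega-Lite bar chart specification as a plain dict.

    Hypothetical helper: the name and signature are not from any library.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},  # inline data embedded in the spec
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


spec = bar_chart_spec(rows, "language", "notebooks")
print(json.dumps(spec, indent=2))
```

Even for a chart this simple the JSON is fairly verbose, which is why a wrapper such as Altair tends to be the more comfortable route from Python.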

An interactive bar chart? Try producing that in matplotlib

Code Editing

A major disadvantage of Jupyter notebooks is how opaque the code is: there are no visual cues for incorrect code, as you would find in any decent IDE, nor code completion for functions (which sends me to the API documentation far more often). Polynote, however, bridges the gap between script and notebook.

With Polynote, there is no need to memorise swathes of functions

Type errors in Scala will be underlined; however, Polynote does not care about Python type hints (much like the Python community).
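
For context, Python itself does not enforce type hints at runtime either, so an editor that ignores them will let mismatches through silently. A small sketch in plain Python (nothing Polynote-specific; the function double is just an example):

```python
def double(x: int) -> int:
    """Annotated as int -> int, but nothing enforces that at runtime."""
    return x * 2


# A string slips through: Python ignores the annotation and repeats the text.
result = double("ab")
print(result)  # abab

# The hints are stored as metadata only.
print(double.__annotations__)
```

Catching this kind of mistake needs a separate static checker, which is the sort of feedback a Scala cell gets from the underlining described above.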

Polynote does not compete with the best code editors, but it is a vast improvement on Jupyter notebooks in this regard.

Summary

I was intrigued by the possibilities of Polynote. There was a lot of buzz in the data science community, and features like in-built Vega plot generation and inter-language communication sounded like game changers. However, I don’t believe Polynote offers enough to draw most data scientists away from Jupyter. Firstly, the lack of Windows support puts it out of reach of a great number of people; Windows is the most popular operating system and its usage is mandated by many businesses and organisations. Secondly, while Scala has its benefits, it is not a widely used language for everyday data science. Most analytics projects can be, and are, adequately carried out solely in Python, so language mixing and in-built Vega are of no benefit.

Polynote’s code editing is a great improvement upon Jupyter and I hope Jupyter improves in this regard. However, as the more established and more actively developed project, Jupyter is, in my view, the stronger offering of the two. Polynote is certainly one to watch. It just needs a little more time.
