JupyterLab — A Next Gen Python Data Science IDE

An article to nudge Python Data Scientists towards JupyterLab.

Rene Draschwandtner
Towards Data Science

--

While working on data science projects with Python, you have probably asked yourself which IDE best serves your needs for data exploration, cleaning, preparation, modeling, evaluation and deployment. You have probably also done some research on Google, scanned various pages with titles like ‘Top Python Data Science IDE’ and realized, with growing frustration, that none of the listed products offers a seamless look and feel across all the necessary implementation steps. Ultimately you went back to your well-known but disconnected set of tools. The good news is that a very promising project is about to be released as version 1.0, and it tackles many of our day-to-day data science needs. I am referring to JupyterLab.

A processed image of Jupiter from Juno’s ninth close flyby provided at [13]. NASA / SWRI / MSSS / GERALD EICHSTÄDT / SEÁN DORAN © CC NC SA

Why JupyterLab?

Python notebooks have received a lot of attention in recent years as a tool for showing code and results in an interactive, well laid out manner. They certainly lower the barrier to getting started with programming and help in education, because an input is presented together with its processed output instantly in a browser, an environment most users are very familiar with. Despite the popularity of Python notebooks, a classic Python IDE or text editor becomes more convenient the more coding needs to be done. Wouldn’t it be nice if there were a tool that takes the best of each and combines both worlds? JupyterLab is working towards that goal by enabling users to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner [1].

The “evolution of the Jupyter web interface” from Jupyter notebooks to JupyterLab is based on a user experience survey conducted in 2015, which highlighted the following three success factors [2]:

  1. Users like the notebook experience.
  2. Users want to combine and remix different Jupyter building blocks.
  3. Users need the ability to easily collaborate.

These factors are not surprising when I look at some Pros and Cons based on my own experience with Jupyter notebooks.

Pros:

+ Jupyter notebooks are particularly strong when it comes to visualization. For example, tools like Google Facets have been developed for use in Jupyter notebooks [3].

+ Interacting with plots is very convenient, e.g. by simply using %matplotlib notebook or ipywidgets [4].

+ One can add neat and concise documentation to a piece of code by changing a cell from Code to Markdown.

+ Jupyter notebooks are a pretty neat tool for data storytelling and presentation, as it’s possible to show documentation along with a code’s output.

Cons:

- The absence of a built-in variable inspector is one of the first things experienced users of standard IDEs miss in Jupyter notebooks. I want to highlight, though, that there is an extremely helpful community-contributed, unofficial extension that uses the notebook’s metadata [8].

- Jupyter notebooks do not provide a convenient file explorer view when developing code [5]. Thus, reading and writing files becomes clumsy.

- One needs to prefix terminal commands with an exclamation mark ! within a cell in order to interact with the operating system’s terminal, or use the terminal view provided as an add-on [5].

- Opening and exploring files is clunky as one needs to load the file first and choose an appropriate way to display it programmatically. This requires more effort than opening e.g. a jpg file with a double click within an IDE.

- Testing and modularity are difficult to handle within Jupyter notebooks.

- Seamless integration with a version control system is missing, although there is interesting progress with add-ons like nbdime, which make diffing and merging of notebooks easier [7].

- Convenient visual debugging and profiling functionality is absent, despite really promising developments like the PixieDebugger [10].

I want to highlight that this is not an exhaustive list of Pros and Cons. A statement listed in the Cons section does not mean that the functionality is impossible to achieve; it is also listed there if it is not intuitively available in Jupyter notebooks.

Let’s look at the currently available version of JupyterLab (0.35.6) in detail and see what is going to be covered when moving from Jupyter notebook to JupyterLab.
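To make the exclamation-mark point from the Cons list more concrete: inside a notebook cell, IPython hands a line starting with ! to the system shell. Outside of that shortcut, plain Python needs noticeably more ceremony. The following is only a minimal sketch using the standard library's subprocess module; the command being run and the variable names are assumptions chosen for illustration.

```python
import subprocess
import sys

# In a notebook cell you could simply write:  !python --version
# In plain Python the same kind of call goes through subprocess.
# The current interpreter (sys.executable) is used as the command so the
# sketch does not depend on any particular operating system.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the output to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)
shell_output = result.stdout.strip()
print(shell_output)
```

A dedicated terminal view avoids both the prefix and this kind of wrapper code.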

Python and Jupyter notebook files sharing a single kernel

JupyterLab lets you develop complex Python code as well as Jupyter notebooks, and makes it easy to connect them to the same kernel. I see this as a key feature for tackling the Cons.

In the following animation you see how to connect multiple Python files and notebooks in JupyterLab.

Creation of two Python files and one Jupyter notebook in JupyterLab. Subsequently, you see the selection of one common kernel for each of the files. At the end you can observe that all three files have access to the same kernel, as they use the variables a and b interactively.

Now look at the animation below, which shows how simple it is to load data into a dataframe and develop models separately while testing and visualizing them with the power of Jupyter notebooks, all in a seamless manner. All of this comes in addition to having one common variable inspector and file explorer. The example shown is a simple manual function approximation task.
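Conceptually, documents attached to the same kernel run their code in one shared Python namespace, which is why a variable defined in one file is visible in the others. The sketch below imitates this with a plain dictionary and exec. It is a deliberate simplification and not how JupyterLab or the kernel is implemented; the three source strings are hypothetical stand-ins for the two Python files and the notebook.

```python
# One dict stands in for the kernel's global namespace (a simplification).
kernel_namespace = {}

# Hypothetical contents of three documents attached to the same kernel.
file_one = "a = 2"
file_two = "b = 3"
notebook_cell = "total = a + b"

# Running each snippet against the same namespace mimics sending code
# from different editors to one shared kernel.
for source in (file_one, file_two, notebook_cell):
    exec(source, kernel_namespace)

print(kernel_namespace["total"])  # 5
```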

Exploration of the csv file and loading it into a dataframe in a kernel which is shared among the open files. The dataframe is visible in the variable inspector. First the given x and y vectors are plotted in blue. Afterwards, the function approximator plotted in orange is iteratively improved by manually adjusting the function fun in the file model.py. At the end, the approximator fully covers the given input data, so only the orange line remains visible.

Effectively, this decouples extraction, modeling and visualization without having to write and read files to share the dataframes. This is a massive time saver in your daily work, as it reduces the risk of mistakes when loading files and makes it much faster to set up your EDA along with trials in the early stages of a project. Furthermore, it helps to reduce the number of lines of code if you add as many asserts to your data pipeline as I do.
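The manual approximation loop described above can be sketched roughly as follows. Everything in this example is an assumption made for illustration: the sample data, the parameters of the candidate function fun (named after the one in model.py) and the error measure. In JupyterLab, fun would live in model.py and the check would run in the notebook, both attached to the same kernel, so no intermediate files are needed.

```python
# Illustrative data only: y follows 2 * x + 1 exactly.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]


def fun(value, slope=2.0, intercept=1.0):
    """Candidate approximator; slope and intercept are tuned by hand."""
    return slope * value + intercept


def mean_squared_error(targets, predictions):
    """Average squared difference between targets and predictions."""
    # A pipeline-style assert: fail early if the vectors do not line up.
    assert len(targets) == len(predictions), "vectors must have equal length"
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


# A first, poor guess followed by the manually improved version.
first_guess = mean_squared_error(y, [fun(v, slope=1.0, intercept=0.0) for v in x])
final_guess = mean_squared_error(y, [fun(v) for v in x])
print(first_guess, final_guess)
```

Once the error reaches zero, the approximation lies exactly on the data, which corresponds to the moment in the animation when only the orange line remains visible.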

If you quickly need a terminal within the same context of your project, you can simply open the launcher and create a new Terminal view. This is particularly useful if you want to check the resources needed by your model or algorithm, as shown in the following animation.

JupyterLab talk by Ian Rose (UC Berkeley) and Chris Colbert (Project Jupyter): at 14:30 they show how to open a terminal within JupyterLab [9].

Opening a data file is also pretty neat with JupyterLab. The file is rendered nicely, e.g. in tabular form for csv files, and lazy loading makes this fast while supporting enormous file sizes. The next animation shows opening the IRIS data set from a csv file.

JupyterLab talk by Ian Rose (UC Berkeley) and Chris Colbert (Project Jupyter): at 19:15 they show the IRIS data set in a csv file being opened with a simple click [9].

You can also open image files with just a click, which comes in pretty handy when working on computer vision tasks. In the following animation you see how JupyterLab renders an image of the Hubble telescope in a panel separate from the last used one.

JupyterLab talk by Ian Rose (UC Berkeley) and Chris Colbert (Project Jupyter): at 17:58 they show an image being rendered by clicking on it in the built-in file explorer [9].
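The idea behind lazy loading can be illustrated in plain Python. This sketch is not how JupyterLab's csv viewer is implemented; it only shows the general principle of reading rows on demand instead of loading the whole file up front. The sample data and the helper name first_rows are made up for the example.

```python
import csv
import io
from itertools import islice

# Stand-in for a large csv file on disk (illustrative data).
csv_text = "sepal_length,species\n5.1,setosa\n7.0,versicolor\n6.3,virginica\n"


def first_rows(handle, count):
    """Return the header and at most `count` data rows, reading lazily."""
    reader = csv.reader(handle)
    header = next(reader)
    # islice stops after `count` rows, so the rest of the file is never parsed.
    return header, list(islice(reader, count))


header, rows = first_rows(io.StringIO(csv_text), 2)
print(header, rows)
```

Because only the requested rows are parsed, the cost of showing the first screen of data does not grow with the size of the file.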

Furthermore, you can navigate and utilize Git with JupyterLab’s Git extension as shown below.

Parul Pandey’s gif showing the navigation in the Git extension provided in [6].

There is no visual debugging and profiling functionality available in JupyterLab at the time of writing this article. It is currently planned for a future release [11]. Hence, development will start after version 1.0 has been released at the earliest. Despite these plans, work is being done to enable PixieDebugger for notebooks in JupyterLab [12].

Conclusion

JupyterLab adds a complete IDE around Jupyter notebooks, which makes it a strong evolution of Jupyter notebooks. It integrates so well into data scientists’ daily work that it can also be seen as a Next Gen tool. The ease of decoupling data extraction, transformation, modeling, visualization and testing is already really powerful.

With this in mind, I hope to see the 1.0 release soon. If you got excited about the JupyterLab project and want to try it yourself, just follow the instructions in Parul Pandey’s article [6].

[1] Project Jupyter, JupyterLab Overview (2018), https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html

[2] N. Tache, JupyterLab: The evolution of the Jupyter web interface (2017), https://www.oreilly.com/ideas/jupyterlab-the-evolution-of-the-jupyter-web-interface

[3] J. Wexler, Facets: An Open Source Visualization Tool for Machine Learning Training Data (2017), https://ai.googleblog.com/2017/07/facets-open-source-visualization-tool.html

[4] 5agado, Interactive Visualizations In Jupyter Notebook (2017)

[5] I. Rose and G. Nestor, JupyterLab: The Evolution of the Jupyter Notebook (2018), https://www.youtube.com/watch?v=NSiPeoDpwuI&feature=youtu.be&t=254

[6] P. Pandey, Jupyter Lab: Evolution of the Jupyter Notebook (2019), https://towardsdatascience.com/jupyter-lab-evolution-of-the-jupyter-notebook-5297cacde6b

[7] Project Jupyter, Jupyter Notebook Diff and Merge tools (2019), https://github.com/jupyter/nbdime

[8] Jupyter Contrib Team, Variable Inspector (2019), https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/varInspector/README.html

[9] I. Rose and C. Colbert, JupyterLab Next-generation user-interface for Project Jupyter (2018), https://www.youtube.com/watch?v=t5q19rz_FNw&feature=youtu.be

[10] D. Taieb, The Visual Python Debugger for Jupyter Notebooks You’ve Always Wanted (2018), https://medium.com/codait/the-visual-python-debugger-for-jupyter-notebooks-youve-always-wanted-761713babc62

[11] Project Jupyter, JupyterLab (2019), https://github.com/jupyter/roadmap/blob/master/jupyterlab.md

[12] Project Jupyter, JupyterLab (2017), https://github.com/jupyterlab/jupyterlab/issues/3049

[13] M. Bartels, NASA Releases Treasure Trove Of Incredible New Images Of Jupiter From Its Juno Mission (2017), https://www.newsweek.com/nasa-releases-treasure-trove-incredible-new-images-jupiter-its-juno-mission-705210

This article aims to explain why JupyterLab might be the IDE of choice for a Python data scientist, by combining the author’s practical experience with thorough literature research. It is not meant as an installation guide, nor as a list or comparison of features.
