Jupytext — Diff your Jupyter notebook as you want

Nok Chan
Towards Data Science
3 min read · Mar 28, 2019


The Jupyter notebook has become the go-to tool for many data scientists, and it is widely used in education because you can run cells one by one. However, a notebook is one large JSON file, which makes it hard to work with tools like git. nbdime is a great alternative if you also want to see the outputs of a Jupyter notebook in the diff; I wrote a post about nbdime about six months ago.

nbdime is great, but it does not work well with IDEs like VS Code and PyCharm. What if there were a tool that lets you work with notebooks but diff them like scripts? There is a simple solution: Jupytext. The magic of Jupytext is that it pairs your notebook with a .py script automatically (you can also pair it with other text formats such as Markdown, .md). It is a simple but effective solution that lets you enjoy the benefits of both sides. This notebook-script pair unlocks some cool things:

  1. Open a .py script like a notebook
  2. Version control is easy
  3. Full IDE refactor capability with your Jupyter Notebook

1. Open a .py script like a notebook

You can install Jupytext with the command below; it installs Jupytext together with the notebook extension that comes with it.

pip install jupytext
You can open a .py script like an .ipynb file in Jupyter Notebook

If you pay attention to the file name, you will see it is actually a .py file, yet you can treat it like a notebook. You can work with the file interactively in every way, except that, since it is still a .py file, the outputs are not saved in the script; they exist only in your browser.

Pair your notebook with the script

With the notebook extension, pairing your notebook takes just one click. Here I have paired my notebook with a light format script. If you do not care which format you get, just go with the light script. According to the Jupytext documentation, these are the currently supported formats:

Markdown and R Markdown documents, Julia, Python, R, Bash, Scheme, Clojure, Matlab, Octave, C++ and q/kdb+ scripts.

File → Jupytext → Pair Notebook with script
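As far as I understand, the menu entry simply records the pairing in the notebook's metadata under a `jupytext` key. The sketch below sets that metadata on a bare notebook dictionary using only the standard library; the helper name `pair_with_script` and the empty notebook are hypothetical, and the `formats` value is my assumption of what the light-script pairing writes.

```python
import json


def pair_with_script(notebook, formats="ipynb,py:light"):
    """Return a copy of a notebook dict with Jupytext pairing metadata set."""
    paired = json.loads(json.dumps(notebook))  # deep copy via a JSON round trip
    paired.setdefault("metadata", {}).setdefault("jupytext", {})["formats"] = formats
    return paired


# A bare nbformat-4 style notebook, used here only as a stand-in.
notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
paired = pair_with_script(notebook)
print(json.dumps(paired["metadata"], indent=1))
```

If you prefer the terminal, `jupytext --set-formats ipynb,py:light notebook.ipynb` should have the same effect, with `notebook.ipynb` standing in for your own file.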

2. Version Control is easy

Because Jupytext pairs your notebook with a script, you can simply version control the .py file. The notebook no longer needs to be added to the repository, unless you want to show the outputs.

Simply diff the .py file instead of the .ipynb file!
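To see why the script diffs more cleanly, here is a toy comparison using only the standard library. It makes the same one-line code change in a script and in a hand-built, minimal notebook dictionary, then counts the changed lines in each diff. The helper names and the cell contents are hypothetical, and a real notebook carries far more metadata than this one.

```python
import difflib
import json


def make_notebook(source_lines, output_text):
    """Build a minimal nbformat-4 style notebook dict with one code cell."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": 1,
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [output_text]}
                ],
                "source": source_lines,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }


def changed_lines(before, after):
    """Count the added and removed lines in a unified diff of two texts."""
    diff = list(
        difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    )
    # The first two entries are the "---" and "+++" file headers.
    return sum(1 for line in diff[2:] if line.startswith(("+", "-")))


script_before = "total = sum(range(10))\nprint(total)\n"
script_after = "total = sum(range(100))\nprint(total)\n"

nb_before = json.dumps(
    make_notebook(["total = sum(range(10))\n", "print(total)"], "45\n"), indent=1
)
nb_after = json.dumps(
    make_notebook(["total = sum(range(100))\n", "print(total)"], "4950\n"), indent=1
)

script_changes = changed_lines(script_before, script_after)
notebook_changes = changed_lines(nb_before, nb_after)
print("script diff lines:", script_changes)
print("notebook diff lines:", notebook_changes)
```

Even in this tiny case the notebook diff is larger, because the stored output changes along with the code; with plots or execution counts in the mix, the gap grows quickly.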

3. Full refactor capability with your notebook

Jupytext tries very hard to reconcile the changes in your paired notebook and script. If you make changes in your notebook, the script is updated as soon as you save the notebook.

Sync Notebook with the script!

Here is what I am doing:

  1. Do the refactoring in VS Code.
  2. Save the script file.
  3. Press F5 to refresh the notebook and pick up the new changes.

Refactor in IDE -> Refresh in Notebook and continue

Conclusion

This is my most common use case for Jupytext, but it can do a lot more. Do check out the project's GitHub page for more advanced usage. Marc Wouts, the author of Jupytext, has also written a blog post about the release of Jupytext 1.0 in which he explains the functionality in detail. Let's do more refactoring with notebooks.
