Jupytext — Diff your Jupyter notebook as you want
The Jupyter notebook has become the go-to tool for many data scientists, and it is widely used in education because you can run cells one by one. However, a notebook is one big JSON file, which makes it hard to work with tools like git. nbdime is a great alternative if you want to see the output of a Jupyter notebook in your diffs; I wrote a post about nbdime about six months ago.
nbdime is great, but it does not work well with IDEs like VS Code and PyCharm. What if there were a tool that lets you work with a notebook but diff it like a script? There is a simple solution: Jupytext. The magic of Jupytext is that it pairs your notebook with a .py script automatically (you can also pair it with other text formats such as .md or .Rmd). It is a simple but effective solution that lets you enjoy the benefits of both sides. This notebook-script pair unlocks some cool things:
- Open a .py script like a notebook
- Version control is easy
- Full IDE refactor capability with your Jupyter Notebook
1. Open a .py script like a notebook
You can install Jupytext with the command below, which also installs the notebook extension that comes with it.
pip install jupytext
Open the script from Jupyter and pay attention to the file name: it is actually a .py file, but you can treat it like a notebook. You can play with the file interactively in every way, except that it is still a .py file, so the output is not saved in the script file; it only exists in your browser.
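To give a feel for what such a file contains, here is a small sketch of a script in the light format. As I understand the format, a block of comments is read as a Markdown cell, code is split into cells at blank lines, and a cell that spans several paragraphs is wrapped in `# +` and `# -` markers. The names `radius` and `circle_area` are made up for this illustration.

```python
# # Circle area
#
# This comment block would be read as a Markdown cell.

import math

radius = 2.0

# +
# This cell holds more than one paragraph, so it is wrapped in explicit markers.
def circle_area(r):
    """Return the area of a circle of radius r."""
    return math.pi * r ** 2

area = circle_area(radius)
# -

print(f"area = {area:.2f}")
```

Because it is plain Python, the same file runs unchanged with the regular interpreter.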
Pair your notebook with the script
With the notebook extension, pairing your notebook takes just one click. Here I have paired my notebook with a light format script. If you do not care which format it is, just go with the light script. According to the Jupytext documentation, these are the currently supported formats:
Markdown and R Markdown documents, Julia, Python, R, Bash, Scheme, Clojure, Matlab, Octave, C++ and q/kdb+ scripts.
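If you are curious where the pairing lives, my understanding is that Jupytext records it in the notebook metadata as a comma-separated list of formats. The sketch below uses only the standard library and a hypothetical, stripped-down notebook dictionary; the helper `paired_formats` is my own, not part of Jupytext.

```python
# Hypothetical, stripped-down notebook; a real .ipynb has many more fields.
notebook = {
    "cells": [],
    "metadata": {
        # Assumption: pairing with a light script is stored under this key
        # as a comma-separated list of formats.
        "jupytext": {"formats": "ipynb,py:light"},
    },
    "nbformat": 4,
    "nbformat_minor": 4,
}


def paired_formats(nb):
    """Return the list of paired formats, or an empty list if unpaired."""
    formats = nb.get("metadata", {}).get("jupytext", {}).get("formats", "")
    return [fmt for fmt in formats.split(",") if fmt]


print(paired_formats(notebook))  # ['ipynb', 'py:light']
```

An unpaired notebook simply has no such entry, so the helper returns an empty list for it.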
2. Version Control is easy
Since Jupytext pairs your notebook with a script, you can simply version control the .py file. The notebook itself no longer has to be added to the repository, unless you want to show the output.
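To see why the script is nicer to diff, here is a rough sketch using only the standard library. It builds a minimal notebook-like JSON document by hand (a real .ipynb carries more metadata), applies the same one-line edit to the script and to the notebook, and counts the changed lines in each diff. The helper names are mine and the numbers are only illustrative.

```python
import difflib
import json


def notebook_json(source_lines, output_text, execution_count):
    """Build a minimal notebook-like JSON document with one executed code cell."""
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [
                    {
                        "name": "stdout",
                        "output_type": "stream",
                        "text": [output_text],
                    }
                ],
                "source": source_lines,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    return json.dumps(notebook, indent=1).splitlines()


def changed_lines(before, after):
    """Return only the added and removed lines of a unified diff."""
    diff = difflib.unified_diff(before, after, lineterm="")
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# One edit: the multiplier changes from 2 to 3, and the cell is run again.
script_before = ["total = 21 * 2", "print(total)"]
script_after = ["total = 21 * 3", "print(total)"]

nb_before = notebook_json(["total = 21 * 2\n", "print(total)"], "42\n", 1)
nb_after = notebook_json(["total = 21 * 3\n", "print(total)"], "63\n", 2)

script_diff = changed_lines(script_before, script_after)
notebook_diff = changed_lines(nb_before, nb_after)

print(len(script_diff), "changed lines in the script")
print(len(notebook_diff), "changed lines in the notebook JSON")
```

The script diff shows only the edited line, while the notebook diff also picks up the new output and the execution counter. With plots or large tables in the output, the gap grows quickly.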
3. Full refactor capability with your notebook
Jupytext tries very hard to reconcile the changes in your paired notebook and script. When you save changes in your notebook, the script is updated at the same time.
Here is what I am doing:
- Do the refactoring in VS Code.
- Save the script file.
- Press F5 to refresh the notebook and pick up the new changes.
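If Jupyter is not running when you edit the script, the pair can also be brought up to date from the command line with `jupytext --sync`, which updates whichever of the paired files is outdated. Below is a minimal sketch that wraps that command in Python. The helper names and the file name `analysis.ipynb` are hypothetical, and the function does nothing if the `jupytext` executable is not on your PATH.

```python
import shutil
import subprocess


def sync_command(notebook_path):
    """Build the command line that asks Jupytext to synchronise a paired notebook."""
    return ["jupytext", "--sync", notebook_path]


def sync_notebook(notebook_path):
    """Run the sync if the jupytext executable is on PATH; report whether it ran."""
    if shutil.which("jupytext") is None:
        return False
    subprocess.run(sync_command(notebook_path), check=True)
    return True


# "analysis.ipynb" is a hypothetical file name used only for illustration.
print(" ".join(sync_command("analysis.ipynb")))
```

Calling `sync_notebook("analysis.ipynb")` after saving the refactored script would then play the same role as the refresh step above.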
Conclusion
This is my most common use case with Jupytext, but it can do a lot more. Do check out the GitHub page for more advanced usage. Marc Wouts, the author of Jupytext, has also written a blog post about the release of Jupytext 1.0 in which he explains the functionality in detail. Let’s do more refactoring with notebooks.