Serious Software Development Using Jupyter Notebooks

Use nbdev to build Python packages using Jupyter notebooks

Dipam Vasani
Towards Data Science


Photo by Sean Lim on Unsplash

Introduction

The image below comes from the documentation page of a library called fastcore.

It has the source code hyperlinked so you can easily check the implementation, a short paragraph explaining what the function does, and tests so you can verify its behavior.

Compare this to a typical GitHub repository with partial or no documentation, no hyperlinks to jump around with, and tests that live in a separate folder, and you can already see which approach is better.

Yes, there are workarounds, but nothing compares to what I am about to show you.

Literate programming

Back in 1983, Donald Knuth envisioned a style of programming called literate programming. In this style, you no longer write programs in the manner and order imposed by the computer; instead, you develop them in the order demanded by the logic and flow of your thoughts.

source: https://en.wikipedia.org/wiki/Literate_programming

The best environment to capture this style of programming is Jupyter notebooks. You can work on small bits of code, spell out your thoughts and, when you are satisfied, put it all together. Notebooks are great for an iterative, experimental style of programming.

However, until recently, Jupyter notebooks lacked much of the additional functionality needed to build software: the surrounding components that, put together, make Jupyter a true literate programming environment.

Introducing nbdev.

nbdev

nbdev is a library developed by fastai that lets you develop Python packages using nothing but notebooks. I’m not going to summarize its features up front because I want you to read along and discover them in detail. I’m also not doing a code walkthrough, since fastai already has a really good tutorial on its website and it wouldn’t make sense to repeat the same steps.

The goal of this article is to introduce you to this world, one you may not have known existed, get you to love it, and get you to use it in your next project.

How does it work?

Let’s start by looking at how nbdev works. Since I’m not doing a code walkthrough, we will browse the source code of fastcore, a library already built with nbdev.

If you go to fastcore’s GitHub page, you will see a folder called nbs which, as the name suggests, contains a bunch of notebooks.

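Each notebook represents a module of the package we are building. For example, 00_test.ipynb is converted to a module called test.py. The target module is declared by a #default_exp test line at the top of the notebook; the numeric prefix just keeps the notebooks in order.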

Users of our package can then use it as follows:

from fastcore.test import *

The conversion from notebook to a .py file is done with the help of a function called notebook2script() which is executed at the end of each notebook.

If you open one of the notebooks, you’ll notice that it looks like any regular Jupyter notebook except for a few things.

The first thing you’ll notice is an #export comment at the top of certain cells.

#export is how nbdev determines which cells become part of the module. In the notebook shown above, cell [1] will go into test.py but cell [2] won’t.

Cell [2], however, will become part of the documentation for this module.
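To make the #export mechanism concrete, here is a toy exporter in the spirit of notebook2script(). It is not nbdev’s code. It walks a notebook’s JSON, reads the module name from a #default_exp line, and keeps only the code cells whose first line is #export. The sample notebook and its add function are made up for illustration.

```python
import json

def load_notebook(path):
    """Read an .ipynb file; notebooks are plain JSON (nbformat 4)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def cell_text(cell):
    """Join a cell's source, which nbformat stores as a list of lines or a string."""
    src = cell["source"]
    return "".join(src) if isinstance(src, list) else src

def export_notebook(nb):
    """Toy exporter: return (module_name, module_source) for a notebook."""
    module_name, chunks = None, []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue  # Markdown cells only feed the docs
        first_line, _, body = cell_text(cell).partition("\n")
        flag = first_line.strip()
        if flag.startswith("#default_exp"):
            module_name = flag.split()[1]  # "#default_exp test" -> "test"
        elif flag == "#export":
            chunks.append(body)  # keep the code, drop the flag line
    return module_name, "\n".join(chunks)

# Hypothetical notebook: the second cell is exported, the third is a test
# that stays in the notebook (and therefore in the docs).
NOTEBOOK = {"cells": [
    {"cell_type": "code", "source": ["#default_exp test\n"]},
    {"cell_type": "code", "source": ["#export\n", "def add(a, b):\n", "    return a + b\n"]},
    {"cell_type": "code", "source": ["assert add(1, 2) == 3\n"]},
]}

name, source = export_notebook(NOTEBOOK)
print(f"{name}.py:\n{source}")
```

Everything that is not exported stays behind in the notebook, which is exactly the material that ends up in the documentation.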

This is my favorite part. Whatever code and explanation you write in your notebook is automatically converted to docs, so you don’t need to spend any extra time on them. And the fact that we can write tests right below the source code, and that they become part of the documentation as well, is a huge bonus.

With traditional Python testing setups, the tests live in a separate folder. This makes it difficult to tell which tests relate to which functionality, and when you change the source code, it’s hard to update the relevant tests because they’re not right in front of your eyes.

nbdev solves both of these problems. Plus, anyone reading your code also sees how the tests are designed and gets a richer learning experience.

Finally, notice that the names written in backticks in our Markdown were looked up in the code base and, where found, hyperlinked as well (see test_eq_type in the image above). This is just incredible. Automating mundane tasks puts so much more creative power in your hands when developing software.
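The linking step can be sketched with a regular expression. The sketch finds each `name` in the Markdown and, if that name appears in an index of documented symbols, turns it into a link. The index and the example.com URLs are hypothetical. A real doc build would collect them from the exported modules, and this is not nbdev’s implementation.

```python
import re

# Hypothetical symbol index: name -> URL of its doc entry (hard-coded here).
SYMBOL_INDEX = {
    "test_eq": "https://example.com/test.html#test_eq",
    "test_eq_type": "https://example.com/test.html#test_eq_type",
}

# A backticked Python-style identifier, optionally dotted (e.g. `module.func`).
BACKTICK = re.compile(r"`([A-Za-z_][A-Za-z0-9_.]*)`")

def link_backticks(markdown, index=SYMBOL_INDEX):
    """Replace `name` with a Markdown link when name is a known symbol."""
    def repl(match):
        name = match.group(1)
        url = index.get(name)
        # Unknown names are left exactly as written.
        return f"[`{name}`]({url})" if url else match.group(0)
    return BACKTICK.sub(repl, markdown)

print(link_backticks("Like `test_eq`, but see also `test_eq_type` and `not_a_symbol`."))
```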

Features of nbdev

Let’s now go through nbdev’s feature list, with a brief explanation of each feature:

  • Automatically generate docs

This one I’ve already explained.

  • Utilities to automate publishing to PyPI and conda packages

Whenever you run pip install bla, the package bla has to exist on the Python Package Index (PyPI); the same goes for conda. Publishing to PyPI requires you to package your project in a certain way and create a bunch of files. nbdev makes this process simple by automating these tasks and providing utilities that help you publish without any hassle.

  • Two-way sync between notebooks and source code

IDEs are useful for many tasks; for example, they provide better debugging tools. With nbdev, you can also edit the source code (the .py file) directly, and the changes can be synced back to your notebook. This lets you use the best features of both mediums.
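One way to picture the sync from a .py file back to the notebook is to split the module at per-cell markers and copy each chunk into the matching #export cell. This sketch assumes a generated module made of a header followed by exported cells separated by "# Cell" marker lines. That layout resembles modules generated by nbdev v1, but treat it as an assumption. This is not nbdev’s implementation.

```python
MARKER = "\n# Cell\n"  # assumed separator between exported cells

def split_module(py_source):
    """Return the code of each exported cell, in order (the header is dropped)."""
    parts = py_source.split(MARKER)
    return [part.strip("\n") for part in parts[1:]]

def is_exported(cell):
    """True for code cells whose first line is exactly #export."""
    first_line = "".join(cell["source"]).split("\n", 1)[0].strip()
    return cell["cell_type"] == "code" and first_line == "#export"

def sync_back(nb, py_source):
    """Copy edited module code back into the notebook's #export cells, in order."""
    exported = [cell for cell in nb["cells"] if is_exported(cell)]
    new_code = split_module(py_source)
    if len(exported) != len(new_code):
        raise ValueError("notebook and module disagree on the number of exported cells")
    for cell, code in zip(exported, new_code):
        cell["source"] = "#export\n" + code  # nbformat also accepts a single string
    return nb

notebook = {"cells": [
    {"cell_type": "code", "source": ["#export\n", "def add(a, b):\n", "    return a + b\n"]},
]}
module = "# (generated header)\n\n# Cell\ndef add(a, b):\n    return b + a  # edited in the IDE\n"
print(sync_back(notebook, module)["cells"][0]["source"])
```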

  • Fine-grained control on hiding/showing cells

Entire Jupyter notebooks are converted to documentation, but you can hide certain cells if you want. Just like #export, nbdev provides a #hide flag: put it at the top of any cell and that cell won’t appear in the generated documentation.
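A filter like the following captures the idea behind #hide: when building docs, skip any cell whose first line is the flag. This is a simplified sketch, not nbdev’s doc builder, and the sample cells are made up.

```python
def doc_cells(nb):
    """Return the cells that should appear in the docs.

    Toy rule: skip any cell whose first line is exactly #hide.
    """
    shown = []
    for cell in nb["cells"]:
        first_line = "".join(cell["source"]).split("\n", 1)[0].strip()
        if first_line != "#hide":
            shown.append(cell)
    return shown

notebook = {"cells": [
    {"cell_type": "code", "source": ["#hide\n", "import os  # setup readers don't need to see\n"]},
    {"cell_type": "markdown", "source": ["This paragraph shows up in the docs."]},
]}
print([cell["cell_type"] for cell in doc_cells(notebook)])
```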

  • Write tests directly in notebooks

I mentioned you can do this, but did I mention that you can run these tests from the command line, just as you would with pytest? You can also group tests so you don’t have to run the long-running ones every time.
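nbdev ships its own command-line test runner, and the sketch below only illustrates the idea behind it. It executes a notebook’s code cells in order in a shared namespace. Cells tagged with an optional group flag are skipped unless that group is requested. The #slow flag name and the run_notebook_tests and main functions are assumptions made for this example, not nbdev’s API.

```python
import json
import sys
import traceback

# Cells whose first line is one of these flags form optional test groups.
# "slow" is an illustrative name, not a claim about nbdev's configuration.
OPTIONAL_GROUPS = {"slow"}

def run_notebook_tests(nb, flags=()):
    """Execute code cells top to bottom in one namespace; True if none raised."""
    namespace = {}
    for index, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue
        source = "".join(cell["source"])
        first_line = source.split("\n", 1)[0].strip()
        group = first_line.lstrip("#").strip() if first_line.startswith("#") else None
        if group in OPTIONAL_GROUPS and group not in flags:
            continue  # skip long-running groups unless explicitly requested
        try:
            exec(source, namespace)
        except Exception:
            print(f"cell {index} failed:", file=sys.stderr)
            traceback.print_exc()
            return False
    return True

def main(argv):
    """CLI-style entry point, e.g. main(["nbs/00_test.ipynb", "slow"])."""
    path, *flags = argv
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    return 0 if run_notebook_tests(notebook, flags=flags) else 1
```

Returning an exit code from main rather than calling sys.exit keeps the function easy to call from other Python code or from a small wrapper script.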

  • Merge/conflict resolution

One of the problems with Jupyter notebooks is that they don’t play well with version control. Git often reports merge conflicts, and GitHub’s visual diff shows changes to a notebook’s underlying JSON instead of the actual cell changes. This is annoying. Let nbdev handle it for you: it cleans up notebooks to avoid these conflicts and, if conflicts do occur, presents them in a human-readable format.
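The cleaning step can be approximated by resetting the fields that change on every run. This simplified sketch, which is not nbdev’s actual cleaner, clears execution counts and per-cell metadata while keeping outputs. It then serializes the notebook with sorted keys and fixed indentation, so that an unchanged notebook produces byte-identical files and Git sees no spurious diff.

```python
import json

def clean_notebook(nb):
    """Reset run-dependent fields in place: execution counts and cell metadata."""
    for cell in nb["cells"]:
        cell["metadata"] = {}
        if cell["cell_type"] == "code":
            cell["execution_count"] = None
            for output in cell.get("outputs", []):
                if "execution_count" in output:
                    output["execution_count"] = None  # outputs themselves are kept
    return nb

def dump_stable(nb):
    """Serialize with sorted keys and fixed indentation for stable diffs."""
    return json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"

notebook = {
    "cells": [{
        "cell_type": "code", "execution_count": 7, "metadata": {"scrolled": True},
        "outputs": [{"output_type": "execute_result", "execution_count": 7,
                     "data": {"text/plain": "3"}, "metadata": {}}],
        "source": ["1 + 2"],
    }],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
}
print(dump_stable(clean_notebook(notebook)))
```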

  • Continuous integration with GitHub Actions

GitHub Actions lets you automate workflows, such as what happens when you push a new change, open a new issue, or raise a PR. nbdev sets this up for you without any manual intervention.

  • All the features of Markdown

Since we’re using Markdown for documentation, we can easily include images, formatted text, and even math equations.

  • …and much more

At this point, I suggest going through the tutorial and learning more about the parts that excite you.

Why the title?

There is a very famous talk by Joel Grus called “I don’t like notebooks”. Joel is an excellent presenter and his talk is really funny, but it discourages people from using notebooks for serious development.

The talk is fairly old now, and some of the tooling that exists today didn’t exist back then. Still, there is a general consensus among professionals that Jupyter notebooks are just for experimentation, and that if you want to write serious software that will be deployed, you have to use an IDE. That’s no longer the case.

With this article, I hope to change that. I hope you give this other side a try, and that more people start doing serious software development using Jupyter notebooks.

~happy learning.
