The point of no return: Using nbdev for the past 6 months changed the way I code in Jupyter notebooks

With nbdev, Jupyter notebooks are now a literate programming environment. With nbdev, my productivity increased. My code is now more organized. The way I approach a new project changed for good.

Miguel Pinto
Towards Data Science


Photo by Kaitlyn Baker on Unsplash.

What is nbdev? If you are asking that, get ready, because what you are about to find will make your brain explode. Not literally, don’t worry. You will be fine. If you already use nbdev, you will probably relate to some aspects of my story. Let me know, I would love to hear about your experience.

“nbdev is a library that allows you to fully develop a library in Jupyter Notebooks, putting all your code, tests and documentation in one place. That is: you now have a true literate programming environment, as envisioned by Donald Knuth back in 1983!” (quoting the nbdev repository)

As a researcher and data science enthusiast I work with data on a daily basis. We humans can’t live without air, water and food. For data scientists, data should be part of that list.

I use Jupyter notebooks in my daily life; they are the environment where I code, solve problems and experiment with new ideas. However, Jupyter notebooks alone have an important limitation. When it’s time to move the code to production and create a Python package, all the code has to be copied into Python modules. Countless times I spent many hours restructuring and cleaning the code, a process characterized by a lot of copy-pasting and debugging.

I could live with that working routine. It was not ideal, but I didn’t know any better. This changed on my last birthday. On that same day, Jeremy Howard announced the new nbdev library. The timing was certainly a coincidence, but it felt like a birthday gift. After reading about what it promised, I immediately knew this was something big.

In the next three sections, I will cover the top nbdev features, how to get started with nbdev and how it changed the way I code over the past 6 months. I will then conclude with a few final remarks. Let’s dive in.

Top nbdev features

In my view, the most exciting features of nbdev are the following:

  • A Python package is automatically generated from the notebooks;
  • The documentation for the package is also built from the notebooks, including the code examples with graphics and images that you may want to include in your notebook;
  • The auto-generated documentation can be viewed on GitHub Pages in a nice-looking format by default;
  • When you push your code to GitHub, a series of checks runs automatically to ensure that the notebooks are in a clean format, and the tests you include in your package are run as well;
  • Your package can be uploaded to PyPI with a single command;
  • This whole development process scales to large projects. If you want an example, look no further than the fastai2 library, which is generated from almost 70 Jupyter notebooks.

How to get started with nbdev

Getting started with nbdev is very straightforward. After I read the description on the GitHub page, I started using it right away. Any questions are usually answered in the documentation of the package or by checking the fastai2 notebooks as an example. The basic steps are:

  • Create a GitHub repository from the template they provide;
  • Clone the repository and run the nbdev_install_git_hooks command to configure it for your project;
  • Edit and complete the settings.ini file with basic information such as the name of the package, your GitHub username, and so on;
  • Start developing your code in Jupyter notebooks.
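
To make the settings.ini step more concrete, here is a minimal sketch that parses a hypothetical settings file with Python's standard configparser module. The key names (lib_name, user, author, version) and all the values are assumptions for illustration only, so check them against the settings.ini that ships with your own copy of the template.

```python
from configparser import ConfigParser

# Hypothetical contents of settings.ini. The key names and values are
# assumptions for illustration; compare them with the file in your repository.
EXAMPLE_SETTINGS = """
[DEFAULT]
lib_name = my_package
user = my_github_username
author = Jane Doe
version = 0.0.1
"""

config = ConfigParser()
config.read_string(EXAMPLE_SETTINGS)
settings = config["DEFAULT"]

# Show the basic information that describes the package.
print(f"{settings['lib_name']} v{settings['version']} by {settings['author']}")
```

The point is only that the file is a plain INI file holding the package metadata in one place, so it is easy to inspect or edit by hand.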

The first cell of the notebook should include #default_exp module_name to tell nbdev what the Python module auto-generated from that notebook should be called. Each notebook becomes a different Python module. To tell nbdev how to handle each notebook cell, there are a few comment tags that you write at the top of the cell. Mostly you will use:

  • #export: for cells containing the code that should be exported to the Python module;
  • #hide: to hide cells that you don’t want to show in the documentation.
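
As an illustration of these tags, the sketch below lays out the cells of a small notebook one after another, with a comment marking where each cell starts. The module name core, the function say_hello and the variable scratch are hypothetical names chosen for this example.

```python
# --- Cell 1: names the module this notebook is exported to ---
#default_exp core

# --- Cell 2: exported to the Python module ---
#export
def say_hello(to):
    "Say hello to somebody."
    return f"Hello {to}!"

# --- Cell 3: no tag, so it stays in the notebook as a usage example and check ---
assert say_hello("nbdev") == "Hello nbdev!"

# --- Cell 4: hidden from the documentation ---
#hide
scratch = say_hello("debugging")
```

Because the tags are ordinary Python comments, the cells still run normally inside Jupyter; the tags only matter when nbdev processes the notebook.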

There are several commands you can run on the command line. They all start with nbdev_, so you can use tab completion to find the existing commands, but most of the time you will need nbdev_build_lib to build the library and nbdev_build_docs to build the documentation.
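
To give a feel for what building the library involves, here is a conceptual sketch that collects the exported cells of a notebook into module source code. It is not nbdev's actual implementation: the function export_cells and its simplified tag handling are assumptions made for illustration, and the real nbdev_build_lib does considerably more. The sketch relies only on the fact that a notebook file is JSON containing a list of cells.

```python
from typing import Optional, Tuple


def export_cells(notebook: dict) -> Tuple[Optional[str], str]:
    """Return (module_name, module_source) from cells tagged with #export.

    Hypothetical helper for illustration; not part of nbdev.
    """
    module_name = None
    exported = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown cells never end up in the module
        lines = "".join(cell["source"]).splitlines()
        if not lines:
            continue
        first = lines[0].strip()
        if first.startswith("#default_exp"):
            module_name = first.split()[1]  # e.g. "#default_exp core" -> "core"
        elif first == "#export":
            exported.append("\n".join(lines[1:]))  # drop the tag line itself
    return module_name, "\n\n".join(exported) + "\n"


# A tiny in-memory notebook, shaped like the dict json.load returns for an .ipynb file.
notebook = {
    "cells": [
        {"cell_type": "code", "source": ["#default_exp core"]},
        {"cell_type": "markdown", "source": ["# My module"]},
        {"cell_type": "code", "source": ["#export\n", "def add(a, b):\n", "    return a + b"]},
        {"cell_type": "code", "source": ["assert add(1, 2) == 3"]},
    ]
}

name, source = export_cells(notebook)
print(f"{name}.py would contain:\n{source}")
```

Only the cell tagged with #export reaches the module; the markdown cell and the untagged check stay in the notebook.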

It is worth mentioning that nbdev also makes it very easy to create console scripts for your package!

Once everything is ready, the package can be installed locally by running pip install -e . in the repository folder, but you may also want to upload it to PyPI. In that case, you simply run make pypi and nbdev will take care of everything, provided that you have followed the configuration steps.

How nbdev changed the way I code over the past 6 months

Using nbdev on most of my projects over the last 6 months made a significant difference in the way I code. Not only because of nbdev functionality itself but also because it made me think more about how I should structure the code. The code needs to have some organization in order to get the most out of nbdev.

Looking back in time

  • My code was very unorganized, usually consisting of countless Jupyter notebooks with different versions of the same process and uninformative filenames;
  • Most of the time I would write the code for myself;
  • If I had to look back at my own code months later I would struggle to understand it;
  • I never dared to create a PyPI package; it all looked like too much effort for a simple project.

Looking now at the present time

  • Since I started using nbdev my code gradually became more structured and simpler to share and reuse;
  • Now I create my notebooks thinking that they are to be understood by humans;
  • When I have to go back to a project I haven’t worked on for a while, I can easily read through the notebooks to refresh my memory;
  • With nbdev, creating a package is just a natural consequence of the structure and organization of the code in the notebooks. Since then I have created a few packages. I covered a recent example in another story named Split overlapping bounding boxes in Python.

Final remarks

Writing code for humans is a concept that needs to be embraced in today’s world. Jupyter notebooks can be a great way to share ideas and knowledge. Ideas and knowledge are the building blocks for a brighter future. If we can share the building blocks more efficiently, we will be accelerating progress. It takes time to polish ideas into a simple and elegant form, but that effort is important: otherwise, everyone trying to build upon those ideas has to polish them individually for their own understanding. In some way, nbdev makes it easier and less time-consuming to share ideas in a simple and elegant form. nbdev takes care of what can be automated, giving the developer more time to focus on the presentation of the notebook.

I hope this story was useful and inspiring to you!


PhD student (Remote sensing, Meteorology), ML/DL enthusiast, fastai student, competition master at Kaggle, pianist/composer