
Split Your Jupyter Notebooks (In 2 Lines of Code)

Is Your Notebook Too Large?

Written by: Amal Hasni & Dhia Hmila

AI-Generated with Stable Diffusion

IPython Notebooks are really useful when you want to test a new idea in a quick and dirty way or explore data for the first time. Either way, notebooks tend to get large and bulky quickly, and they usually need cleaning and refactoring once you’re done exploring, at least if you want to share them with your manager, coworker, or future self.

One operation that is surprisingly common is splitting a notebook into multiple sub-notebooks, usually based on headings/titles. If you want to do this in Jupyter, you have to duplicate the notebook several times and then, in each copy, delete the cells that don’t belong there.

What if there was a faster way to do this systematically? Well, that is what you’ll learn in this article, using nbmanips, a Python package / CLI that I created to easily manage notebooks.


Table of Contents:
· Installing nbmanips
· 1 – Splitting Notebooks
· 2 – Splitting a Notebook based on the Table of Contents
· 3 – Splitting Notebooks using a Python script
· Bonus: Concatenate multiple notebooks

Installing nbmanips

Installing nbmanips is very simple if you use pip:

pip install nbmanips

Make sure the CLI works and that you have at least version 1.3.0 installed, by running the following command:
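If you prefer to check the installed version from Python, the standard library can read it from the package metadata. This is only a minimal sketch: the helper names parse_version and meets_minimum are hypothetical (they are not part of nbmanips), and the comparison only handles plain numeric versions such as 1.3.0.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.3.0' into (1, 3, 0), keeping only the leading digits of each part."""
    numbers = []
    for chunk in text.split("."):
        match = re.match(r"\d+", chunk)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(package, minimum="1.3.0"):
    """Return True or False, or None when the package is not installed."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return None
    return parse_version(installed) >= parse_version(minimum)


# Prints None if nbmanips is missing, otherwise whether it is recent enough.
print(meets_minimum("nbmanips"))
```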

You can try the library on your own notebook files if you want, but in case you need some, here’s a great Git repository with over 30 machine-learning-related notebooks.

1 – Splitting Notebooks

Once you’ve installed nbmanips, you can use the CLI to easily split a notebook. It’s up to you to tell the package at which level you wish to perform the split. Say you need a split every time there is a Markdown cell with a title (h1 HTML tag); all you need to do is specify that in the command as follows:

nb select has_html_tag h1 | nb split -s nb.ipynb
  • The first part of the command (nb select has_html_tag h1) tells nbmanips on which cells to perform the split.
  • The second part (nb split -s nb.ipynb) splits the notebook based on the piped selection. The -s flag tells nbmanips to use the selection instead of a cell index.
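To make the idea of "split on every selected cell" concrete, here is a rough sketch in plain Python that works directly on the notebook's JSON structure. It is not how nbmanips is implemented: the heading test is a simplified regex on Markdown text rather than a real HTML tag check, the function names are made up for this illustration, and putting any cells that come before the first heading into their own part is an assumption.

```python
import copy
import re


def cell_text(cell):
    """Cell sources are stored either as a string or as a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def is_heading(cell, level=1):
    """True for a Markdown cell with a line like '# Title' at the given level."""
    if cell.get("cell_type") != "markdown":
        return False
    pattern = r"^#{%d}[ \t]+\S" % level
    return re.search(pattern, cell_text(cell), flags=re.MULTILINE) is not None


def split_on_headings(notebook, level=1):
    """Start a new part at every heading cell; keep notebook-level metadata."""
    groups, current = [], []
    for cell in notebook["cells"]:
        if is_heading(cell, level) and current:
            groups.append(current)
            current = []
        current.append(cell)
    if current:
        groups.append(current)

    parts = []
    for cells in groups:
        part = {k: copy.deepcopy(v) for k, v in notebook.items() if k != "cells"}
        part["cells"] = copy.deepcopy(cells)
        parts.append(part)
    return parts


def md(text):
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code(text):
    return {"cell_type": "code", "metadata": {}, "source": text,
            "outputs": [], "execution_count": None}


demo = {
    "cells": [md("# Load"), code("import json"), md("# Plot"),
              md("## Details"), code("print(1)")],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
}
parts = split_on_headings(demo)
print([len(part["cells"]) for part in parts])  # → [2, 3]
```

The level 2 heading stays inside the second part because only level 1 headings trigger a split here.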

In the example, the selection is performed on Markdown cells that have a level 1 heading, but you can customize that to your liking. For example, you can split on both level 1 and level 2 headings:

nb select has_html_tag h1,h2 | nb split -s nb.ipynb

If you want to learn about other selectors or other use cases, feel free to check this other article:

Rapidly Explore Jupyter Notebooks (Right in Your Terminal)

By default, the resulting notebooks are named nb-%d.ipynb, where %d is a placeholder for the number of each part, but you can customize that with the --output/-o option:

nb select has_html_tag h1 | nb split -s nb.ipynb -o 'custom-name-%d.ipynb'

2 – Splitting a Notebook based on the Table of Contents

A simpler way to split a notebook is to pass the cell indexes directly. This can be helpful if you want to split on a specific title or code cell, for example:

nb split nb.ipynb 5,9
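As a mental model for index-based splitting, here is a small sketch on a plain Python list. It assumes that every index you pass marks the first cell of a new part, which is our reading of the command rather than something verified against the nbmanips source; split_at is a hypothetical helper name.

```python
def split_at(cells, indexes):
    """Cut a list of cells so that every given index starts a new part."""
    inner = sorted({i for i in indexes if 0 < i < len(cells)})
    bounds = [0] + inner + [len(cells)]
    return [cells[start:stop] for start, stop in zip(bounds, bounds[1:])]


cells = [f"cell {i}" for i in range(12)]
parts = split_at(cells, [5, 9])
print([len(part) for part in parts])  # → [5, 4, 3]
print(parts[1][0])                    # → cell 5
```

Under that assumption, two indexes give you three notebooks: everything before cell 5, cells 5 to 8, and cell 9 onwards.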

The downside is that finding the cell index can be tedious in a large Notebook. Thankfully, there are easier ways to find the index.

For example, you can display the table of contents with the following command:

nb toc nb.ipynb
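If you are curious what a table of contents boils down to, the sketch below collects Markdown headings together with the index of the cell they live in, using only the standard library. The output layout is invented for this example and is not meant to match what nb toc prints; table_of_contents is a hypothetical helper.

```python
import re

# One to six '#' characters, some spaces, then the title text.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def table_of_contents(notebook):
    """Yield (cell_index, level, title) for every Markdown heading found."""
    for index, cell in enumerate(notebook["cells"]):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        for hashes, title in HEADING.findall(text):
            yield index, len(hashes), title


demo = {"cells": [
    {"cell_type": "markdown", "source": ["# Load data\n", "Some text"]},
    {"cell_type": "code", "source": "# not a heading\nimport os"},
    {"cell_type": "markdown", "source": "## Clean"},
]}

for index, level, title in table_of_contents(demo):
    # Indent sub-headings and show the cell index you could pass to a split.
    print(f"{'  ' * (level - 1)}{title}  [cell {index}]")
```

The cell index next to each title is exactly the number you would then pass to an index-based split.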

A less obvious example is finding the index of a cell that contains an import statement and is among the last 10 cells of the notebook:
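The same kind of search can be sketched in plain Python on the notebook's cells. This is an illustration under simple assumptions, not the nbmanips selectors: an "import statement" is detected with a basic regex on code cells, and import_cells_in_tail is a hypothetical helper name.

```python
import re

# A line that starts with 'import x' or 'from x ...'.
IMPORT_LINE = re.compile(r"^\s*(import|from)\s+\w", re.MULTILINE)


def import_cells_in_tail(cells, tail=10):
    """Indexes of code cells among the last `tail` cells that contain an import."""
    start = max(len(cells) - tail, 0)
    hits = []
    for index in range(start, len(cells)):
        cell = cells[index]
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code" and IMPORT_LINE.search(text):
            hits.append(index)
    return hits


cells = [{"cell_type": "code", "source": "x = 1"} for _ in range(15)]
cells[2]["source"] = "import os"
cells[12]["source"] = "from pathlib import Path"

print(import_cells_in_tail(cells))  # → [12]
```

Cell 2 also contains an import, but it is not among the last 10 cells, so only index 12 is reported.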

3 – Splitting Notebooks using a Python script

nbmanips is a Python package, which means you can use it inside a Python script. This is useful if you want to do something more complex or automate the processing of a whole batch of files.
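To give a feel for what automating a batch of files looks like, here is a library-free sketch that splits every notebook in a folder on level 1 headings. It deliberately uses only json and pathlib instead of the nbmanips API; the helper names, the simplified heading check, and the output names numbered from 0 are all assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def is_h1(cell):
    """Simplified check: a Markdown cell with a line starting with '# '."""
    if cell.get("cell_type") != "markdown":
        return False
    source = cell.get("source", "")
    text = "".join(source) if isinstance(source, list) else source
    return any(line.startswith("# ") for line in text.splitlines())


def split_file(path, out_dir):
    """Split one .ipynb file on level 1 headings; return the files written."""
    path = Path(path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    groups = []
    for cell in notebook["cells"]:
        if is_h1(cell) or not groups:
            groups.append([])
        groups[-1].append(cell)
    written = []
    for number, cells in enumerate(groups):
        target = Path(out_dir) / f"{path.stem}-{number}.ipynb"
        part = {**notebook, "cells": cells}
        target.write_text(json.dumps(part, indent=1), encoding="utf-8")
        written.append(target)
    return written


def split_folder(folder, out_dir):
    """Apply split_file to every notebook directly inside `folder`."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    notebooks = sorted(Path(folder).glob("*.ipynb"))
    return {path.name: split_file(path, out_dir) for path in notebooks}


# Demo on a throwaway folder so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    source_dir, out_dir = Path(tmp) / "in", Path(tmp) / "out"
    source_dir.mkdir()
    demo = {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": "# A"},
            {"cell_type": "markdown", "metadata": {}, "source": "# B"},
        ],
        "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
    }
    (source_dir / "nb.ipynb").write_text(json.dumps(demo), encoding="utf-8")
    result = split_folder(source_dir, out_dir)
    print({name: [p.name for p in paths] for name, paths in result.items()})
```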

Before any processing, you first have to read the notebook:

Now that you have the notebook, you can split it using a selection, as we saw in the first example:

Or, as in the previous example, using cell indexes found in the table of contents:

Bonus: Concatenate multiple notebooks

You can concatenate multiple notebooks using the following command:

nb cat nb1.ipynb nb2.ipynb -o result.ipynb
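At the file level, concatenation mostly means appending one notebook's list of cells to another's. The sketch below shows that idea on plain dictionaries; it is not the nbmanips implementation, concatenate is a hypothetical helper, and taking the notebook-level metadata (kernel, language) from the first notebook is an assumption.

```python
import copy


def concatenate(*notebooks):
    """Return a new notebook dict holding the cells of all inputs, in order."""
    if not notebooks:
        raise ValueError("need at least one notebook")
    # Assumption: notebook-level metadata comes from the first input.
    merged = copy.deepcopy(notebooks[0])
    for other in notebooks[1:]:
        merged["cells"].extend(copy.deepcopy(other["cells"]))
    return merged


def code(text):
    return {"cell_type": "code", "metadata": {}, "source": text,
            "outputs": [], "execution_count": None}


nb1 = {"cells": [code("a = 1"), code("b = 2")],
       "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
nb2 = {"cells": [code("print(a + b)")],
       "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

result = concatenate(nb1, nb2)
print(len(result["cells"]))  # → 3
```

The inputs are deep-copied, so nb1 and nb2 stay untouched after the merge.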

Or, if you’re using a Python script:

Final Thoughts

nbmanips tries to be a Swiss Army knife for Jupyter Notebooks, so you can easily split, merge, and explore notebooks without having to think about it.

I think it’s a nice tool to have in your pocket. It won’t necessarily be useful every day, but when you need it, you’ll be thankful to have it.

Another use case you might have is concatenating multiple notebooks. You can check out our other article, which not only shows how to do that but also goes into detail about the structure of a Jupyter Notebook file, in case you are interested:

How To Easily Merge Multiple Jupyter Notebooks Into One

If you have questions, don’t hesitate to leave them in the response section and we’ll be more than happy to answer.

Thank you for sticking around this far. Stay safe, and we will see you in our next article! 😊


