
A Jupyter notebook extension for graphical publication figure layout


Communication is key to science, and in many fields communication means presenting data in a visual format. In some fields, such as neuroscience, it’s not uncommon to spend years editing the figures that go into a paper. This is partly due to the complexity of the data, and partly due to the difficulty of quickly making publication-standard plots with tools such as matplotlib. Subplots of different sizes, intricate inset plots and complex color schemes often drive scientists toward graphical tools such as Photoshop or Inkscape.

This post describes the development of a pair of tools which may extend the figure complexity easily achievable with Python and matplotlib. The main idea is to define the subplots within a figure graphically. This leverages the fact that Jupyter notebooks run in a browser, so an extension to the notebook can inject HTML/javascript drawing widgets into the notebook. The user can then define the subplot layout with a mouse rather than through matplotlib’s more cumbersome numerical way of defining axes. Once the rough plot is done, the various components can be resized algorithmically to fit within the allotted canvas space.

Part 1: the drawing widget

Setting up the extension skeleton

The widget is built on top of the jupyter-contrib-nbextensions package, which provides a nice infrastructure for creating compartmentalized extensions that can be enabled or disabled independently. Making your own extension involves a bit of cobbling together functions from existing extensions. This link is a good starting point.

The nbextensions package keeps each extension in its own folder in a known directory. Once you have installed the nbextensions package, this code snippet will help you find the directory:

from jupyter_core.paths import jupyter_data_dir
import os
nbext_path = os.path.join(jupyter_data_dir(), 'nbextensions')

nbext_path is where the code for your extension should ultimately end up. However, it is not the most convenient place to develop the code, and more importantly, we need some way of "installing" code there automatically if we want to distribute our extension to others without having it included in the main nbextensions repository. (There are several reasons to do this, including "beta testing" new extensions and the fact that, as of this writing, the last commit to the master branch of the nbextensions repository was nearly 1 year ago.)

A better approach than developing directly in nbext_path is to make a symbolic link to a more accessible coding location. Including this python script in your code directory will serve as an install script. Executing python install.py will make an appropriately named symlink from the current directory to nbext_path.
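The original install script is not reproduced here, so the following is only a minimal sketch of what such a script might look like. The function names install_symlink and main are hypothetical, and the sketch assumes the extension folder should be linked under its own directory name.

```python
import os
from pathlib import Path


def install_symlink(source_dir, nbext_path, name=None):
    """Symlink source_dir into nbext_path and return the path of the link."""
    source = Path(source_dir).resolve()
    target_dir = Path(nbext_path)
    target_dir.mkdir(parents=True, exist_ok=True)
    link = target_dir / (name or source.name)
    if link.is_symlink():
        link.unlink()  # replace a stale link left by a previous install
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(source, link, target_is_directory=True)
    return link


def main():
    # Imported lazily so install_symlink can be reused without Jupyter installed.
    from jupyter_core.paths import jupyter_data_dir

    nbext_path = os.path.join(jupyter_data_dir(), "nbextensions")
    link = install_symlink(Path(__file__).parent, nbext_path)
    print(f"Linked {link}")
```

Saved as install.py with a final call to main(), this could be run as python install.py from the extension folder. On Windows, creating symlinks may require elevated privileges or developer mode.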

Now distribute away your extensions!

Creating the extension

User flow

Let’s briefly discuss the user flow of the extension before getting into implementation.

Begin with an empty notebook cell and press the icon on the far right which looks like two desktop windows.

You can use your mouse to create an initial subplot:

When you’re satisfied with your layout, press the "Generate python cell" button to create a cell with equivalent python/matplotlib code.

The main challenges are injecting the HTML canvas when the toolbar button is pressed, and then automatically creating the python cell when the layout is ready. Once those are done, the rest of the implementation is just like every other javascript project.

Implementation

The main.js file is where most of the coding will happen. Below is the outline of the empty extension:

define([
  'base/js/namespace',
  'base/js/events'
], function(Jupyter, events) {
  // add a button to the main toolbar
  var addButton = function() {
    Jupyter.toolbar.add_buttons_group([
      Jupyter.keyboard_manager.actions.register({
        'help': 'Add figure layout generator',
        'icon': 'fa-window-restore',
        'handler': inject_figure_widget
      }, 'add-default-cell', 'Default cell')
    ])
  }
  // This function is run when the notebook is first loaded
  function load_ipython_extension() {
    addButton();
  }
  return {
    load_ipython_extension: load_ipython_extension
  };
});

This skeleton code runs a ‘startup’ function when the notebook is loaded. That ‘startup’ function creates the toolbar button and also registers a callback for the toolbar button press. That callback, inject_figure_widget, is the ‘main’ function of the extension, which will inject the HTML canvas into the notebook. To make main.js self-contained, you can define helper functions inside of the main function(Jupyter, events).

Figuring out the JS/HTML to inject a canvas into the output field is a bit of trial and error using the console and the element inspector. The rough outline is:

// execute the current cell to generate the output field; otherwise it won't be created
Jupyter.notebook.select();
Jupyter.notebook.execute_cell();
// get reference to the output area of the cell
var output_subarea = $("#notebook-container")
  .children('.selected')
  .children('.output_wrapper')
  .children('.output');
// add to DOM
let div = document.createElement("div");
output_subarea[0].appendChild(div);

Now the HTML elements of the widget can be added to div just like in any javascript-powered web page. Some special handling is needed for keyboard input elements, however. If you try to type numbers into input fields, you’ll find that doing so converts your cell to markdown and eliminates the output field. This is because of Jupyter notebook’s default keybindings. The fix is to disable Jupyter’s keyboard manager when one of your text fields gains focus, and re-enable it when the field loses focus:

function input_field_focus() {
 Jupyter.keyboard_manager.disable();
}
function input_field_blur() {
 Jupyter.keyboard_manager.enable();
}
$("#subplot_letter_input").focus(input_field_focus).blur(input_field_blur);

Other functionality

The implemented widget has a number of other functions for which I won’t describe the implementation, as it is all fairly standard javascript:

  • Splitting plots into gridded subplots
  • Resizing subplots with the mouse
  • Aligning horizontal/vertical edges of selected plot to other plots
  • Moving subplots by mouse
  • Moving subplots by keyboard arrows
  • Copy/paste, undo, delete
  • Creating labels
  • Code generation
  • Saving and reloading from within the notebook

See the README of the widget for illustration of functionality.

Part 2: programmatic resizing

The mouse-based layout tool is (hopefully) an easier way to define a complicated subplot layout. One difficulty in laying out a figure with multiple subplots in matplotlib is that sometimes text can overlap between subplots. Matplotlib is beginning to handle this issue with the tight layout feature, but that feature does not appear to be compatible with the generic way of defining subplot locations used here; it is meant to be used with the grid-based subplot layout definitions.

What we’d like as a user is to

  1. Create a rough layout graphically
  2. Fill in all the data and the labels
  3. Call a routine to automatically make everything "fit" in the available space.

Step 2 must happen before everything can be "made to fit". This is because it’s hard to account for the size of text-based elements beforehand. You might add or omit text labels, which occupies or frees space. Depending on your data range, the tick labels might have a different number of characters and so occupy different amounts of canvas area.

A very simple algorithm to make all the plot elements fit on the canvas is

  1. Calculate a bounding box around all subplot elements.
  2. For each pair of plots, determine if the plots overlap based on the bounding boxes.
  3. If there’s overlap, calculate a scale factor to reduce the width and height of the leftmost/topmost plot. Assume that the top left corner of each subplot is anchored. When this scale factor is applied, there should be no overlap for this pair of plots. (Sidenote: if two plots are overlapping assuming zero area allocated for text, they will not be resized; the assumption then is that the overlap is intentional such as for inset plots).
  4. Apply the smallest pairwise scale factor globally.
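The steps above can be sketched roughly as follows. This is one possible reading of the algorithm, not the code from the module itself. The names Box, overlaps, pair_scale, global_scale and shrink are hypothetical, boxes are assumed to be in figure-fraction coordinates with the origin at the bottom left, and each subplot is assumed to come with two boxes: plot (the axes area alone) and full (the axes plus all text).

```python
from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    """Rectangle in figure-fraction coordinates, origin at the bottom left."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a, b):
    """True if the two rectangles share any interior area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def pair_scale(plot_a, full_a, plot_b, full_b):
    """Scale factor (at most 1) that removes the text overlap for one pair."""
    # Plot areas that overlap even without text are treated as intentional (insets).
    if overlaps(plot_a, plot_b) or not overlaps(full_a, full_b):
        return 1.0
    candidates = []
    pairs = ((plot_a, full_a, plot_b, full_b), (plot_b, full_b, plot_a, full_a))
    for p, f, q, q_full in pairs:
        if p.x1 <= q.x0:  # p is the left plot: its right text border must clear q
            border = f.x1 - p.x1
            candidates.append((q_full.x0 - border - p.x0) / (p.x1 - p.x0))
        if p.y0 >= q.y1:  # p is the top plot: its bottom text border must clear q
            border = p.y0 - f.y0
            candidates.append((p.y1 - border - q_full.y1) / (p.y1 - p.y0))
    # If either direction would fix the overlap, prefer the one that shrinks less.
    return min(1.0, max(candidates))


def global_scale(plots, fulls):
    """Smallest pairwise scale factor over all pairs of subplots."""
    pairs = combinations(range(len(plots)), 2)
    return min(
        (pair_scale(plots[i], fulls[i], plots[j], fulls[j]) for i, j in pairs),
        default=1.0,
    )


def shrink(plot, scale):
    """Scale width and height while keeping the top-left corner fixed."""
    width = (plot.x1 - plot.x0) * scale
    height = (plot.y1 - plot.y0) * scale
    return Box(plot.x0, plot.y1 - height, plot.x0 + width, plot.y1)
```

Because the text borders have a fixed size and every top-left corner stays anchored, applying the smallest factor to all plots only moves right and bottom edges inward, so a pair that was fixed by its own factor stays fixed under a smaller one.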

This is by no means the best data visualization algorithm, but it should always produce an overlap-free plot. This algorithm is implemented in this simple Python module.

Axis bounding box

Finding the bounding box of various elements in matplotlib takes some trial and error. The data structures representing plot elements are quite flexible, which can make it hard to figure out how to get the size of elements on the canvas if you’re not familiar with the API (I am firmly in the "not familiar" camp). Below is a simple search which iterates through all the children of an axis and tries to get the size of different recognized elements. I could not figure out a more uniform approach than the one below.

import matplotlib.axis
import matplotlib.text
import numpy as np

def get_axis_bounds(fig, ax, scaled=False):
    """Return [xmin, ymin, xmax, ymax] of an axis including its text.

    Values are in pixels, or in figure fractions if scaled is True.
    The figure must already have been rendered (see the note below).
    """
    children = ax.get_children()
    # initial estimate based on the axis itself
    p0, p1 = ax.bbox.get_points()
    xmax, ymax = p1
    xmin, ymin = p0
    for child in children:
        if isinstance(child, matplotlib.axis.XAxis):
            # x axis text (tick labels, axis label) can extend below the axis
            text_obj = [c for c in child.get_children() if isinstance(c, matplotlib.text.Text)]
            text_obj_y = [x.get_window_extent(renderer=fig.canvas.renderer).p0[1] for x in text_obj]
            ymin_label = np.min(text_obj_y)
            if ymin_label < ymin:
                ymin = ymin_label
        elif isinstance(child, matplotlib.axis.YAxis):
            # y axis text can extend to the left of the axis
            text_obj = [c for c in child.get_children() if isinstance(c, matplotlib.text.Text)]
            text_obj_x = [x.get_window_extent(renderer=fig.canvas.renderer).p0[0] for x in text_obj]
            xmin_label = np.min(text_obj_x)
            if xmin_label < xmin:
                xmin = xmin_label
        elif hasattr(child, 'get_window_extent'):
            # any other element that can report its extent (title, spines, ...)
            bb = child.get_window_extent(renderer=fig.canvas.renderer)
            if xmax < bb.p1[0]:
                xmax = bb.p1[0]
            if xmin > bb.p0[0]:
                xmin = bb.p0[0]
            if ymin > bb.p0[1]:
                ymin = bb.p0[1]
            if ymax < bb.p1[1]:
                ymax = bb.p1[1]
    if scaled:
        # convert from pixels to fractions of the figure size
        rect_bounds = np.array([xmin, ymin, xmax, ymax])
        fig_size_x, fig_size_y = fig.get_size_inches() * fig.dpi
        rect_bounds /= np.array([fig_size_x, fig_size_y, fig_size_x, fig_size_y])
        return rect_bounds
    else:
        return np.array([xmin, ymin, xmax, ymax])

There’s a small catch: this method requires matplotlib to first render the figure canvas. Before this rendering, matplotlib may not properly inform you how much space an element will take up. So you’ll have to use matplotlib in interactive mode. Presumably you’re in a jupyter environment if you’re using the widget from part 1. If you use the %matplotlib notebook style of figure generation which is interactive, this issue shouldn’t be a problem.
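As a rough illustration of the rendering requirement, the sketch below forces a draw before asking a text element for its size. It assumes an Agg-based canvas (selected explicitly here so the snippet runs outside a notebook; inside a notebook that line would be dropped), and the plotted data and label are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: Agg backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3), dpi=100)
ax.plot([0, 1], [0, 1000000])  # made-up data with wide tick labels
ax.set_ylabel("response")

# Force a full render so tick labels and text are laid out at their final size.
fig.canvas.draw()
renderer = fig.canvas.get_renderer()

# Extent of the y label in pixels, measured after the draw.
label_box = ax.yaxis.label.get_window_extent(renderer=renderer)
print(label_box.width, label_box.height)
```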

Getting the boundaries of the plot area is quite a bit simpler because that’s how you specify where to draw the axes. The information is stored on the bbox attribute of the axis.

fig_size_x, fig_size_y = fig.get_size_inches() * fig.dpi
plot_bounds = ax.bbox.get_points() / np.array([fig_size_x, fig_size_y])

Once the axis boundary and the plot boundary are known, the size of the border containing the text elements can be calculated on each side. The size of the border is fixed (unless the text changes), so the algorithm to calculate the rescaling factor on the plot is simply to scale it down by the fraction occupied by the border text.
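To make the border calculation concrete, here is a small sketch for the case of an axis whose text runs past the right edge of the canvas. The function names border_sizes and scale_to_fit_right are hypothetical, and the inputs are assumed to be in figure fractions, in the same layout as the scaled output of get_axis_bounds and the plot_bounds array above.

```python
import numpy as np


def border_sizes(axis_bounds, plot_bounds):
    """Text border on each side, as (left, bottom, right, top)."""
    xmin, ymin, xmax, ymax = axis_bounds
    (x0, y0), (x1, y1) = plot_bounds
    return np.array([x0 - xmin, y0 - ymin, xmax - x1, ymax - y1])


def scale_to_fit_right(plot_bounds, borders, limit=1.0):
    """Scale factor that keeps the right text border inside limit.

    The left edge of the plot is assumed to stay anchored, and the
    border keeps its size while the plot area shrinks.
    """
    (x0, _), (x1, _) = plot_bounds
    right_border = borders[2]
    return min(1.0, (limit - right_border - x0) / (x1 - x0))


# Made-up numbers: text sticks out 0.05 past the right edge of the canvas.
axis_bounds = np.array([0.02, 0.03, 1.05, 0.92])
plot_bounds = np.array([[0.1, 0.1], [1.0, 0.9]])
borders = border_sizes(axis_bounds, plot_bounds)
scale = scale_to_fit_right(plot_bounds, borders)
```

The same idea applies vertically by swapping in the bottom border and the plot height.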

Resizing examples

Below are a few examples of auto-scaling plots to accommodate errant space occupied by text.

Axis extending too far horizontally

Before:

After:

Axis extending too far vertically

Before:

After:

Axes overlapping horizontally

Before:

After:

Axes overlapping vertically

Before:

After:

Conclusion

Altogether, this approach may automate some of the more tedious data visualization tasks researchers may face when publishing. Dealing with the layout issues algorithmically may lend itself to developing more sophisticated algorithms for laying out figures to be more naturally readable.

