
Interactive Mass Spectra with Bokeh

Creating dynamic and informative visualizations using Python

Non-pythonic bokeh. Photo by Alex Iby on Unsplash

Mass spectrometry (MS) data is conceptually simple. In a nutshell, it is a table with obligatory mass-to-charge values (m/z) and optional accompanying columns such as signal intensity, charge state, etc. But a mass spectrometrist harbors the deep-rooted yearning to really look at her data, which in the context of the field means to visually inspect the distribution of m/z and intensity values. A mass spectrum is traditionally represented as a sequence of vertical lines for discrete spectra or as a smooth continuous line for unprocessed data, with m/z on the x-axis and intensity on the y-axis. The range of values on both axes is typically very wide, so a zoom feature comes in handy when inspecting the spectrum. The exact m/z values are often important to the viewer, which makes it convenient to have numerical labels.

Researchers often use proprietary vendor software to view mass spectra, but wouldn’t it be great to have simple and open-source options? Among what has been published in Python, the spectrum_utils [1] package provides the capability to display interactive mass spectra based on the Altair visualization library. In this post, I will share the recipe for plotting interactive MS visualizations using Bokeh, which, in my opinion, provides an amazing balance between simplicity and flexibility for creating interactive plots and dashboards.

Bokeh visualizations can have all the necessary data and interactive features embedded in an HTML document (standalone plots), or they can be connected to a running Python instance, giving access to virtually limitless custom data processing. Standalone Bokeh plots can be saved and viewed in a web browser, or embedded in a Jupyter notebook. If you have fresh (as of May 2021) versions of Bokeh (2.3.1) and JupyterLab (3.0.14), installing the jupyter_bokeh extension via pip or conda should suffice. With the extension in place, just call output_notebook(), and plots should appear in the notebook once you invoke the show() command. In addition, if you want to save the plot as an HTML file, add the command output_file('file name.html').
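As a minimal sketch of the standalone workflow described above, the snippet below writes a toy plot to an HTML file. The data points, the file name demo_plot.html and the temporary directory are my own placeholders, and I use save() rather than show() so that nothing tries to open a browser window.

```python
import os
import tempfile

from bokeh.io import output_file, save
from bokeh.plotting import figure

# Toy data, only to have something to draw (not from the article's data set).
p = figure(width=400, height=250, title="Demo plot")
p.line([100.0, 200.0, 300.0], [5.0, 7.0, 3.0])

# Hypothetical output location; any writable path would do.
html_path = os.path.join(tempfile.gettempdir(), "demo_plot.html")

# output_file() registers the target file; save() writes the standalone HTML
# and returns the path of the file it wrote.
output_file(html_path, title="Demo plot")
saved_path = save(p)
```

In a Jupyter notebook, you would instead call output_notebook() once at the top and then show(p) to render the plot inline; show(p) after output_file() writes the file and also opens it in a browser.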

Let’s start by loading libraries and opening the table with m/z and intensity values that correspond to a single peptide spectrum. The code and the data examples can be found in the GitHub repo.

(477, 2)

Since mass spectra are traditionally displayed as a bunch of vertical lines, why don’t we create a plot using the vbar plotting command? First, we will construct a ColumnDataSource, which is a powerful structure in Bokeh that enables interactivity and connectivity between visualizations. Second, we should specify the tooltips that will display intensity and m/z values with four digits after the decimal point each time the cursor hovers over a signal. Then we create the figure of the desired dimensions with the customized set of tools, which includes pan and wheel zoom that are restricted to the x-axis, and finally add the vertical bars. Below I have pasted the animated GIF images that showcase the interactive functionality of the resulting plots:

Image by author
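The steps above can be sketched roughly as follows. This is an illustration under assumptions rather than the exact code from the repo: the column names mz and intensity, the four-row stand-in table, the figure size and the bar width of 0.01 are all placeholders of mine.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical stand-in for the peptide spectrum table (the real one has 477 rows).
spectrum = pd.DataFrame(
    {
        "mz": [175.1190, 262.1510, 263.1543, 375.2350],
        "intensity": [1200.0, 5400.0, 800.0, 3100.0],
    }
)


def vbar_spectrum(df, bar_width=0.01):
    """Draw each signal as a vertical bar of fixed width (in m/z units)."""
    source = ColumnDataSource(df)
    # Tooltips show both values with four digits after the decimal point.
    hover = HoverTool(
        tooltips=[("m/z", "@mz{0.0000}"), ("intensity", "@intensity{0.0000}")]
    )
    # xpan and xwheel_zoom restrict panning and zooming to the x-axis.
    fig = figure(
        width=800,
        height=400,
        tools=["xpan", "xwheel_zoom", "reset", "save", hover],
    )
    fig.vbar(x="mz", top="intensity", width=bar_width, source=source)
    return fig


vbar_fig = vbar_spectrum(spectrum)
```

Passing vbar_fig to show() displays the plot. Note that width is expressed in data units, so the bars get thinner or thicker on screen as you zoom.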

Looks encouraging, but there are some noticeable deficiencies. Bars have a fixed width, and we want quite thin bars/lines to separate some of the very close but distinguishable m/z values. This leads to very thin lines that are hard to see on modern extra high-resolution screens. I think that a spectrum would benefit from having more substantial lines with constant thickness regardless of the zoom level. Besides, the hover tool doesn’t work properly because the bars are extremely thin, even though we have specified the tool correctly.

A line plot is another option, but we will need to modify the data slightly in order to get vertical bars instead of the shortest lines between the experimental data points. Let’s add points with zero intensity to every original signal; this will make each of them look like a vertical line:
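One way to do this, as a sketch: repeat every row three times and set the intensity of the first and third copy to zero, so that the line goes up from the baseline to the signal and back down at the same m/z. The function name verticalize and the column name intensity are my own choices here, not necessarily what the repo uses.

```python
import numpy as np
import pandas as pd


def verticalize(df, y="intensity"):
    """Flank every signal with two zero-intensity points at the same m/z."""
    # Repeat each row three times, keeping any extra (annotation) columns.
    out = df.iloc[np.repeat(np.arange(len(df)), 3)].reset_index(drop=True)
    # Keep the original intensity only in the middle copy of each triplet.
    flank = np.arange(len(out)) % 3 != 1
    out.loc[flank, y] = 0
    return out


example = pd.DataFrame({"mz": [100.0, 200.0], "intensity": [5.0, 7.0]})
vertical = verticalize(example)
# vertical["mz"]        -> 100, 100, 100, 200, 200, 200
# vertical["intensity"] ->   0,   5,   0,   0,   7,   0
```

Because the flanking points sit on the baseline, the segments connecting neighboring signals also run along zero intensity.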

We can now represent the spectrum as one continuous line. Let’s also introduce the axis labels and add a special line that shows the m/z and charge of the precursor ion (see my earlier blog post about precursors and fragments), which is an additional piece of information that a mass spectrometrist will appreciate:

Image by author
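A rough sketch of such a plot is shown below. The function name, the styling, and the way the precursor is drawn (a dashed vertical line whose legend entry carries the m/z and charge) are my assumptions for illustration; the original figure may have been built differently.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure


def line_spectrum(mz, intensity, precursor_mz, precursor_charge):
    """Plot a discrete spectrum as one continuous line plus a precursor marker."""
    # "Verticalize": every signal becomes baseline -> peak -> baseline.
    xs, ys = [], []
    for m, i in zip(mz, intensity):
        xs.extend([m, m, m])
        ys.extend([0.0, i, 0.0])
    source = ColumnDataSource({"mz": xs, "intensity": ys})

    hover = HoverTool(
        tooltips=[("m/z", "@mz{0.0000}"), ("intensity", "@intensity{0.0000}")]
    )
    fig = figure(
        width=800,
        height=400,
        tools=["xpan", "xwheel_zoom", "reset", "save", hover],
        x_axis_label="m/z",
        y_axis_label="Intensity",
    )
    # line_width is given in screen pixels, so it stays constant while zooming.
    fig.line("mz", "intensity", source=source, line_width=2)

    # Dashed vertical line at the precursor m/z; the legend shows m/z and charge.
    fig.line(
        [precursor_mz, precursor_mz],
        [0.0, max(intensity)],
        line_dash="dashed",
        line_color="gray",
        legend_label=f"Precursor {precursor_mz:.4f}, {precursor_charge}+",
    )
    return fig


# Toy values, not taken from the article's data set.
line_fig = line_spectrum([100.0, 200.0], [5.0, 7.0], 150.0, 2)
```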

This is much better! Note how the hover tool behaves as desired at all zoom levels, showing the annotations whenever we pass the cursor over the line.

If the signals have annotations, it would be cool to highlight the annotated categories with color. We can achieve that by creating a separate line for each category. Furthermore, interactive behavior can be specified with the figure.legend.click_policy attribute, so that the signals will mute or hide as a result of a click on the corresponding legend item. Let’s load the annotated table, transform it to "verticalize" the signals and then create the composite line plot:

array(['Unknown', 'Identified', 'Contaminant'], dtype=object)
Image by author
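The composite plot can be sketched along these lines. The three category names come from the output above; the column names (mz, intensity, category), the colors and the toy table are placeholders of mine.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical color assignment for the three annotation categories.
COLORS = {"Unknown": "gray", "Identified": "green", "Contaminant": "red"}


def annotated_spectrum(df, category_col="category"):
    """Draw one verticalized line per annotation category with a clickable legend."""
    hover = HoverTool(
        tooltips=[("m/z", "@mz{0.0000}"), ("intensity", "@intensity{0.0000}")]
    )
    fig = figure(
        width=800,
        height=400,
        tools=["xpan", "xwheel_zoom", "reset", "save", hover],
        x_axis_label="m/z",
        y_axis_label="Intensity",
    )
    for category, group in df.groupby(category_col, sort=False):
        # Flank each signal with zero-intensity points at the same m/z.
        xs = np.repeat(group["mz"].to_numpy(), 3)
        ys = np.zeros(len(xs))
        ys[1::3] = group["intensity"].to_numpy()
        fig.line(
            "mz",
            "intensity",
            source=ColumnDataSource({"mz": xs, "intensity": ys}),
            color=COLORS.get(category, "black"),
            line_width=2,
            muted_alpha=0.1,  # how faint a line becomes when muted
            legend_label=str(category),
        )
    # Clicking a legend item mutes that line; use "hide" to remove it instead.
    fig.legend.click_policy = "mute"
    return fig


# Toy annotated table, not the article's data.
annotated = pd.DataFrame(
    {
        "mz": [100.0, 150.0, 200.0, 250.0],
        "intensity": [5.0, 2.0, 7.0, 1.0],
        "category": ["Unknown", "Identified", "Identified", "Contaminant"],
    }
)
annotated_fig = annotated_spectrum(annotated)
```

Drawing each category as its own glyph is what lets the legend control them independently.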

Conclusions

We have created an interactive annotated mass spectrum using Bokeh. The visualization can be saved and shared as an HTML document or embedded in a Jupyter notebook. The code and the data examples are available in the GitHub repo.

References

[1] Wout Bittremieux. spectrum_utils: A Python package for mass spectrometry data processing and visualization. Analytical Chemistry (2020), 92 (1) 659–661.

