
Boosting your data science workflow with vim+tmux


Photo by SpaceX on Unsplash

Like most of my peers, I started my career in Data Science working with the Jupyter ecosystem. Jupyter is a great environment, easy to set up, which offers useful built-in features.

At some point, however, I felt that I needed to move beyond it. Some limitations inherent to notebooks started to kill my productivity. To name just a few:

  • Version control with notebooks is problematic. I constantly feared pushing notebook outputs containing client data online. In addition, it is pretty inconvenient to have git track changes in the outputs while the code itself is unchanged.
  • Even though progress has been made recently, editing capabilities, like advanced search-and-replace, are still very limited in the Jupyter environment.
  • When the project gets larger, it is common to have several notebooks open simultaneously. Navigating from notebook to notebook is a pain, especially if they lurk in an ocean of internet tabs. The browser is for googling, not for development for code’s sake!

I was therefore actively looking for alternatives and, luckily, I met a bunch of cool dudes who taught me the old-school way of developing code. It relies on the vim + tmux combo, which pairs a powerful terminal-based editor with a terminal multiplexer. Together they provide advanced editing capabilities as well as the interactivity required for data exploration. In addition, this workflow can be operated entirely from the keyboard, which saves a substantial amount of time since you no longer need to constantly switch between the keyboard and the mouse.

You may wonder why I didn’t consider using an IDE like PyCharm. Well, there are two main reasons for that. Firstly, IDEs are not really portable and, as a consultant, I tend to work in many different environments. Secondly, and more importantly, it looks so much cooler to work on a dark screen where you can execute code and move from pane to pane at (almost) the speed of thought.

This post first aims to guide you through the setup of a basic, but functional, data science environment based on vim + tmux. I will also showcase how such a setup can boost your productivity in your projects.

Disclaimer: You will need basic familiarity with vim to follow this post. If you are a complete novice, maybe first take a look at this article and then come back.

Tmux

Tmux is a command-line tool that lets you use multiple windows and panes within a single terminal window. Technically, it is called a terminal multiplexer. On Debian-based systems, installation is as simple as sudo apt-get install tmux. You can then create your first session with:

tmux new -s mycoolproject

Within a session, you control windows and panes by pressing the prefix ctrl+b followed by a specific key. For instance, ctrl+b " splits the current pane horizontally, into a top and a bottom pane. You can then navigate between panes using ctrl+b followed by an arrow key.

A big advantage of tmux is that it allows us to run multiple sessions in parallel. This is very convenient for quickly switching between different projects without the risk of mixing up your scripts. You can detach from your current session using ctrl+b d, list the existing sessions with tmux ls and attach to a different session with tmux a -t <session name>.
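
The same session handling can also be scripted. The sketch below wraps it in two small shell functions; the function names are hypothetical (they are not part of tmux), and it assumes tmux is installed and on your PATH.

```shell
# Hypothetical helpers (the names are illustrative, not part of tmux).

# Create a detached session with the given name unless it already exists.
start_session() {
    if ! tmux has-session -t "$1" 2>/dev/null; then
        tmux new-session -d -s "$1"
    fi
}

# Split the session's active pane into a top and a bottom pane,
# which is what ctrl+b " does interactively.
split_below() {
    tmux split-window -v -t "$1"
}

# Usage (assumes tmux is installed):
#   start_session mycoolproject
#   split_below mycoolproject
#   tmux attach -t mycoolproject
```

Creating the session detached (-d) lets you prepare the layout first and attach only once everything is in place.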

Quick overview of tmux capabilities.

It is fairly easy to customize tmux: you simply need to edit the config file .tmux.conf located in your home directory. For instance, many people like to rebind the prefix command to ctrl+a.
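
As a minimal sketch of such a customization, the shell function below appends a prefix rebinding to a tmux config file. The function name is hypothetical; the three tmux directives it writes are the usual way to move the prefix to ctrl+a, but treat the snippet as a starting point rather than a recommended setup.

```shell
# Hypothetical helper: append a prefix rebinding to a tmux config file.
# The target defaults to ~/.tmux.conf; pass another path to try it out safely.
rebind_prefix() {
    conf="${1:-$HOME/.tmux.conf}"
    cat >> "$conf" <<'EOF'
# Use ctrl+a instead of ctrl+b as the prefix
unbind C-b
set -g prefix C-a
bind C-a send-prefix
EOF
}

# Usage:
#   rebind_prefix                   # appends to ~/.tmux.conf
#   rebind_prefix /tmp/test.conf    # appends to a scratch file instead
```

The last directive lets you press ctrl+a twice to pass a literal ctrl+a to the program running in the pane. Changes take effect in new tmux servers, or after reloading the file with tmux source-file.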

I only aimed to provide a brief overview of tmux here, but if you want to learn more, there are plenty of great tutorials out there. It is also worth taking a look at this cheat sheet.

❤️ vim ❤️

I like Vim. Really. Vim is one of those things, like black coffee, Sunday morning jogging, or Godard movies, that can feel a bit harsh at first but become more and more enjoyable with time and practice.

Some say that the learning curve is steep. It’s true. But it is also extremely rewarding when you start to master new shortcuts or macros that considerably improve your productivity.

Vim is a highly customizable text editor that runs directly in the terminal. Vim, or at least a minimal build of it, is preinstalled on most Unix-like systems, so there is usually nothing to install. The basic configuration has limited capabilities, but you can quickly add features (like syntax highlighting, auto-completion, etc.) by tuning the .vimrc, the configuration file located in your home directory that is loaded when the editor starts, or by adding plugins to it.

I’ve made a simple .vimrc available in this repo. It will help you to replicate the steps described below. However, I strongly recommend setting up your own config file to get a better feel for the spirit of vim.

We will use three plugins in the course of this tutorial:

  • vimux, which enables vim to interact with tmux
  • vim-pyShell, a wrapper around vimux specifically designed to ease the use of ipython
  • vim-cellmode, which provides MATLAB-like code block execution for ipython

The easiest way to install plugins is through a plugin manager. I personally use vim-plug, but there are plenty of other good options.

Vim-plug is easy to install. It only requires a single bash command:

curl -fLo ~/.vim/autoload/plug.vim --create-dirs https://raw.github.com/junegunn/vim-plug/master/plug.vim

You then just need to specify the desired plugins in your .vimrc between call plug#begin() and call plug#end() as illustrated in the snapshot below.

First lines of .vimrc

To install the plugins, execute the command :PlugInstall with your .vimrc open. Then restart vim to source the config file.

Code execution

Once our plugins are up and running, we can start to send instructions from vim to the terminal.

Within a tmux session, open a python script with vim. In normal mode, you can fire up an ipython terminal by calling the dedicated function from the newly installed plugins with the command :call StartPyShell(). By default, this will create a pane at the bottom of the screen and start an ipython session.

Code can be executed either by:

  • sending instructions line by line. To do this, move your cursor to the desired line and run the command :call RunPyShellSendLine().
  • sending code blocks delimited with ##{/##}. In this case, go to the block and run the command :call RunTmuxPythonCell(0).
Sending commands directly from vim to the shell with vimux
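
To get a feel for what happens behind the scenes, note that tmux itself can type text into a pane through its send-keys command, and plugins of this kind typically build on that mechanism. The sketch below is only an illustration of the principle, not the plugins' actual code; the function name and the pane target are assumptions.

```shell
# Hypothetical helper: type one line of text into a tmux pane, then press Enter.
# The pane is addressed as session:window.pane, e.g. mycoolproject:0.1
# (the indices depend on your tmux configuration).
send_line() {
    pane="$1"
    shift
    # -l sends the text literally instead of interpreting key names;
    # -- protects lines that start with a dash.
    tmux send-keys -t "$pane" -l -- "$*"
    tmux send-keys -t "$pane" Enter
}

# Usage (assumes an ipython session is running in the target pane):
#   send_line mycoolproject:0.1 'print("hello")'
```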

This is already pretty cool, but it actually requires quite a bit of typing. Can we do better?

Boosting your productivity with the relevant mappings

Automating repetitive tasks is the secret to shortening development time and hence boosting your productivity. And the good news is that Vim is really good at that.

The main idea is to create mappings for the most common tasks. Let’s take a closer look at how to actually implement mappings. Again, this is done in the .vimrc. In the snippet below, lines 2 and 3 map the shortcuts ,ss and ,sk to ipython start and stop commands, respectively, while the second block defines the mappings for code execution.

It is well known that most of the time in data science is devoted to data preparation. This step heavily relies on dataframe manipulations. Hence, defining mappings associated with basic operations like:

  • printing the first rows of the dataframe: ,sdh
  • printing the dataframe info: ,sdi
  • plotting the content of the dataframe: ,spp
  • displaying the histograms: ,sph
  • showing the content of a variable: ,so
  • getting the length of an iterable: ,sl

will save you a lot of time. In addition, you are not polluting your script with numerous prints and outputs, since the inspection is performed by passing the variable/object under the cursor to a backend function. No additional typing needed.
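
To illustrate the principle, here is a rough sketch of what such a backend function boils down to: it takes the name under the cursor and wraps it in the right Python expression. The real plugin implements this in Vim script; the shell function below, its name, its action keywords and the exact expressions are all assumptions made for the sake of the example.

```shell
# Hypothetical helper: build the Python expression for a given inspection
# and variable name. The expressions are assumptions, not the plugin's code.
inspect_cmd() {
    action="$1"
    name="$2"
    case "$action" in
        head) printf '%s.head()\n' "$name" ;;   # first rows of a dataframe
        info) printf '%s.info()\n' "$name" ;;   # dataframe summary
        plot) printf '%s.plot()\n' "$name" ;;   # plot the content
        hist) printf '%s.hist()\n' "$name" ;;   # histograms
        show) printf 'print(%s)\n' "$name" ;;   # content of a variable
        len)  printf 'len(%s)\n' "$name" ;;     # length of an iterable
        *)    echo "unknown action: $action" >&2; return 1 ;;
    esac
}

# Usage:
#   inspect_cmd head df    # prints: df.head()
```

The resulting string is what ends up being typed into the ipython pane, so the script itself stays free of inspection code.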

Let’s see those mappings in action!

A few mappings were sufficient to really boost my productivity!

Concluding thoughts

Combining the advanced editing capabilities of vim with a few well-designed mappings has really enhanced my productivity. This workflow helps me to meet the tight deadlines inherent to my job. It is true that it requires a substantial initial investment, but I am convinced that the payoff is much higher, in terms of time saved but also in terms of working comfort.

What keeps amazing me about vim is its endless possibilities for customization. So be creative, start hacking your .vimrc and implement your own mappings!

Thanks for reading!

