SOFTWARE-DEVELOPMENT

The Good, the Bad and the DataSpell

An honest review of JetBrains’ Data Science IDE after a year of using it

Dor Meir
Towards Data Science
6 min read · Sep 17, 2022

“a code editor casting a spell”, at least according to DALL·E 2

Disclaimer (originated in here): This is not a sponsored article. I don’t have any affiliation with DataSpell or its creators. The article shows an unbiased overview of the IDE, intending to make data science tools accessible to the broader masses.

DataSpell was officially released last November, yet Towards Data Science’s diligent authors posted reviews of the preview edition as early as August and September. One might say all major DataSpell capabilities have already been revealed, so why another piece on the matter?

First, things have changed in the IDE since last year. But most importantly, I’ve been using DataSpell for Data Science research and for developing Machine Learning applications for a whole year now. Hence, my mileage with the tool is probably high enough to reduce its traits to the most useful and the most painful ones.

Let us begin with three DataSpell pains. I advise you, though, not to despair at the outset and to keep reading, as the tool’s three perks are invaluable.

DataSpell issues

1. High usage of resources

DataSpell is probably not even a close competitor in this respect to other IDEs such as Visual Studio. While JetBrains advises that 8 GB of RAM is enough to run the program, if you work with datasets of 1M rows or more you shouldn’t settle for anything less than 16 GB. CPU-wise, it would also be unwise to have an older generation than i7 (or other vendors’ equivalents), unless you have a lot of time to burn. The IDE uses a considerable amount of resources specifically on startup (while initializing the interpreter and scanning files and packages), or if you’ve attached a folder that auto-syncs to the cloud. Nevertheless, it reaches a pretty decent speed once startup finishes, or at least a speed similar to that of its predecessor, PyCharm.

Not the most agile IDE you can find… Image by Author

2. Bugs

The official release still has some bugs. Nonetheless, upgrading to newer versions solved more than a few of them for me. For instance, the embedded Jupyter server had a major bug where it wasn’t possible to load a saved notebook, but a later release fixed that issue. Another solved bug occurred while connecting to the license server through a proxy, which simply didn’t work in prior releases. One bug that is still unresolved is that the Jupyter debugger doesn’t always work, and when it does it’s quite slow. I cope with that one by simply copying the code to a regular script file and debugging it there. I do hope it’s only a matter of time until they fix that one, since Jupyter debugging can be useful. In any case, the bugs forum seems to be very active, so when you encounter a bug you can always ask for a fix and maybe even find a solution.

DataSpell has great features, but quite a few bugs as well. Photo by Justin Lauria on Unsplash
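The copy-to-a-script workaround can also be automated. Below is a minimal sketch, not anything DataSpell provides: it assumes the standard .ipynb JSON layout (a "cells" list whose entries carry "cell_type" and "source"), and the function name, file names and demo notebook are made up for illustration. Cells that contain IPython magics such as %matplotlib would still need manual cleanup before the script runs.

```python
import json
import tempfile
from pathlib import Path


def notebook_to_script(notebook_path: Path, script_path: Path) -> int:
    """Copy the code cells of a notebook into a plain .py file.

    Returns the number of code cells written.
    """
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # A cell's source is stored either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(f"# %% cell {len(chunks) + 1}\n{source.rstrip()}\n")
    script_path.write_text("\n".join(chunks), encoding="utf-8")
    return len(chunks)


# Tiny made-up notebook so the sketch runs on its own
demo_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
            "source": ["x = 2\n", "y = x * 21"],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
            "source": "print(y)",
        },
    ],
}

workdir = Path(tempfile.mkdtemp())
nb_path = workdir / "demo.ipynb"
py_path = workdir / "demo_debug.py"
nb_path.write_text(json.dumps(demo_notebook), encoding="utf-8")

cells_written = notebook_to_script(nb_path, py_path)
print(py_path.read_text(encoding="utf-8"))
```

The resulting file can then be opened and debugged like any other script. If Jupyter’s command-line tools are installed, running jupyter nbconvert --to script on the notebook is an alternative route to a similar file.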

3. It’s (sort of) not free

It costs $9 a month for personal use, but it’s free for students and teachers as part of the JetBrains educational package.

DataSpell perks

1. Comfortable Data view

Clicking Open in a new tab opens a wide and long view of the DataFrame. Not only does it let you sort or hide columns with a mere mouse click, but running a debug session while the data panel is open in a split tab also lets you observe how your DataFrame changes with every line of code (!). This turns out to be remarkably useful when troubleshooting why the code behaves unexpectedly for only some specific cells.

You can forget about sort_values() and pd.options.display.max_rows. Image by Author
The columns display options. Explaining your work has never been easier. Image by Author
Debug the data manipulation, cell by cell. Pretty cool. Image by Author
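For comparison, here is roughly what the same inspection looks like in code when there is no data panel. This is only an illustrative sketch: the DataFrame and its column names are invented, and it uses the standard pandas calls sort_values, option_context and drop.

```python
import pandas as pd

# Made-up data, only to illustrate the calls
df = pd.DataFrame(
    {
        "item": ["a", "b", "c", "d"],
        "price": [3.5, 1.0, 7.25, 2.0],
    }
)

# Sorting returns a new DataFrame; the original row order is untouched
by_price = df.sort_values("price", ascending=False)

# Temporarily raise the row limit so long frames are not truncated when printed
with pd.option_context("display.max_rows", 500):
    print(by_price)

# Hiding a column means dropping (or selecting) it explicitly
print(by_price.drop(columns=["item"]))
```

Each of these steps is a click in the data view, and none of them has to be undone afterwards.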

2. Enhanced Jupyter

DataSpell improves Jupyter significantly: with faster and more exhaustive code completion, with an embedded file explorer (no need for Jupyter Lab anymore), with a built-in table of contents (no need to install add-ons or figure out how to make the menu float), and with line numbering already enabled for all cells. To sum it up, once you get used to it, the overall Jupyter experience is simply better.

DataSpell has basically embedded PyCharm code completion into Jupyter notebook. Image by Author
Exploring files, viewing the table of contents, and having line numbering — without any setup. Image by Author
There’s no way I’m going back to this… Image by Author

3. One-stop shop

After using DataSpell for a year now, I almost forgot how long and painful it is to do the presumably elementary thing of opening a Jupyter notebook: you have to open Anaconda, run jupyter notebook, wait for the web browser to open and make sure the folder path is correct. In DataSpell, however, you directly open a notebook file inside the IDE, and that’s about it. Running the first block of code starts the Jupyter server in the background, and a few seconds later you can manipulate the notebook as if it were just another Python script.

It all came down to this… Amazing! Images by Author

Switching between notebooks and scripts is now outstandingly easy. Moreover, Research and Development in DataSpell are much more intertwined: instead of writing a load of functions in a notebook and copying them to a script at the end, DataSpell makes it immensely easier to write functions directly in a production script and import them back into the notebook for further research. Thus, DataSpell effectively lets us enjoy the true benefits of two tools: both the interactive and graphical outputs of Jupyter, and the fast and efficient code writing of PyCharm.
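As a rough illustration of that script-to-notebook loop, the sketch below writes a small module, imports it the way a notebook cell would, then edits the module and reloads it. Everything named here (cleaning_utils, normalize, the temporary folder) is hypothetical and only stands in for a real project. The reload uses Python’s standard importlib, not a DataSpell feature.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip cached bytecode so the quick edit below is always picked up on reload
sys.dont_write_bytecode = True

# Stand-in for a project folder; in practice the module would live in your repo
project_dir = Path(tempfile.mkdtemp())
module_path = project_dir / "cleaning_utils.py"
module_path.write_text(
    "def normalize(values):\n"
    "    total = sum(values)\n"
    "    return [v / total for v in values]\n",
    encoding="utf-8",
)
sys.path.insert(0, str(project_dir))

# --- what the notebook cell would contain ---
import cleaning_utils

shares = cleaning_utils.normalize([1, 1, 2])
print(shares)

# Later the script is edited (here: guard against an all-zero input)...
module_path.write_text(
    "def normalize(values):\n"
    "    total = sum(values)\n"
    "    if total == 0:\n"
    "        return [0.0 for _ in values]\n"
    "    return [v / total for v in values]\n",
    encoding="utf-8",
)

# ...and the notebook reloads the module instead of restarting the kernel
importlib.reload(cleaning_utils)
safe_shares = cleaning_utils.normalize([0, 0])
print(safe_shares)
```

In an IPython-based notebook, the autoreload extension (%load_ext autoreload followed by %autoreload 2) is a common way to get the same effect without calling reload by hand.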

You just need to run one block of code to open Jupyter. Image by Author

So, it’s fairly straightforward to run Jupyter from DataSpell. More importantly, the IDE functions as a true one-stop shop: performing research (in a notebook), exploring the data (in the data panel), and developing and debugging production-ready code (in a script) are all done in the same place!

This DataSpell trait becomes crucial when working in a client’s environment that is highly secured and sensitive. You no longer have to explain the need for separate installations of PyCharm, Jupyter Notebook or Lab, or Microsoft Excel. You can get most of these tools’ benefits by installing just one exe file. Furthermore, if you’re concerned about updating the tool regularly and managing other JetBrains tools, try installing the JetBrains Toolbox, which lets you update and downgrade DataSpell (and other tools) in a matter of a mouse click.

JetBrains Toolbox can easily open, update, roll-back and uninstall DataSpell. Image by Author

Summary

DataSpell is JetBrains’ flagship IDE for Data Scientists. It’s not the most lightweight IDE, it still suffers from bugs, and chances are you’ll have to pay for it. Still, the benefits probably outweigh the pains: the interactive way of viewing data, the enhanced Jupyter experience, and having Research and Development intertwined in one place make it worthwhile.

Feel free to share your feedback and contact me on LinkedIn.

Thank you for reading, and good luck! 🍀

A Data Scientist & a data enthusiast, with Economics and DevOps prior experience, and education in Economics and Philosophy https://www.linkedin.com/in/dor-meir