JupyterLab for complex Python and Scala Spark projects
JupyterLab is an awesome piece of technology for prototyping and self-documenting research. But can you use it for projects that have a big codebase?
The case for an external library
The notebook workflow has been a big improvement for data scientists around the globe. Being able to see the result of each step directly, without rerunning the same program over and over, is a huge productivity boost. Moreover, because a notebook documents itself, it is easy to share with coworkers.
That said, there is a limit to what you can achieve in a notebook. It is best for interactive computing, but it is no longer interactive when each cell holds hundreds of lines of code. At this point, what you need is a real IDE like VS Code or PyCharm, and maybe some unit tests.
Because it is developed outside of your current project, a good library should be generic enough to help you and your coworkers on a wide range of projects. See it as an investment that will pay off many times in the future.
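To make this concrete, here is a minimal sketch of the kind of code that is better kept in a library than in a notebook cell: a small generic helper with its unit tests next to it. The function name normalize_column_name and the idea of a module called mylib are purely illustrative assumptions, not something from an existing package.

```python
# Hypothetical library code, e.g. the contents of mylib/text.py.
# The module and function names are illustrative assumptions.
import unittest


def normalize_column_name(name: str) -> str:
    """Turn an arbitrary column header into a snake_case identifier."""
    # Replace every non-alphanumeric character with a space ...
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name)
    # ... then lowercase and join the remaining words with underscores.
    return "_".join(cleaned.lower().split())


class NormalizeColumnNameTest(unittest.TestCase):
    """Tests that live with the library, not in the notebook."""

    def test_spaces_and_punctuation(self):
        self.assertEqual(normalize_column_name(" Total Price ($)"), "total_price")

    def test_already_clean(self):
        self.assertEqual(normalize_column_name("user_id"), "user_id")
```

The tests can be run from the IDE or with python -m unittest, so the notebook only has to call a function that is already known to work.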
Now, how can you push this library back into Jupyter?
The Python kernel
Suppose you want to add new functionalities to Spark objects, for instance a doSomething() method on Spark and a predict() method on a…