JupyterLab for complex Python and Scala Spark projects

Sébastien Derivaux
Towards Data Science

JupyterLab is an awesome piece of technology for prototyping and self-documenting research. But can you use it for projects that have a big codebase?

The case for an external library

The notebook workflow was a big improvement for data scientists around the globe. Being able to see the result of each step directly, instead of rerunning the same program over and over, was a huge productivity boost. Moreover, because notebooks are self-documenting, they are easy to share with coworkers.

That said, there is a limit to what you can achieve in a notebook. It is best suited to interactive computing, but the work stops being interactive once each cell holds hundreds of lines of code. At that point, what you need is a real IDE such as VS Code or PyCharm, and probably some unit tests.
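As a sketch of what moving code out of a notebook buys you, here is a hypothetical helper, `normalize_column_names` (not from the article), written as a plain function with unit tests next to it. Calling `unittest.main` with `exit=False` is an assumption made so the same tests can also be run from a notebook cell without stopping the kernel.

```python
import unittest


# Hypothetical helper: the kind of logic that outgrows a notebook cell
# and is easier to maintain, and test, in a library module.
def normalize_column_names(names):
    """Lower-case names, replace spaces with underscores, and make
    duplicates unique by appending the first free numeric suffix."""
    used = set()
    result = []
    for name in names:
        base = name.strip().lower().replace(" ", "_")
        candidate, suffix = base, 1
        # Keep incrementing the suffix until the name is not taken yet.
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result


class NormalizeColumnNamesTest(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(normalize_column_names(["First Name", "AGE"]),
                         ["first_name", "age"])

    def test_duplicates(self):
        self.assertEqual(normalize_column_names(["id", "ID", "Id"]),
                         ["id", "id_1", "id_2"])


if __name__ == "__main__":
    # exit=False keeps the interpreter (or a Jupyter kernel) alive after the run.
    unittest.main(argv=["ignored"], exit=False)
```

From a terminal or CI job, the same tests run with `python -m unittest`, which is where a library pays off: the checks no longer depend on someone re-executing the right cells in the right order.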

Because it is developed outside your current project, a good library should be generic enough to help you and your coworkers on a wide range of projects. See it as an investment that will pay for itself many times over.

Now, how can you push this library back into Jupyter?
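One common way to do this in the Python kernel, which may or may not be what the article goes on to describe, is to put the library on the kernel's import path and reload it after each edit. The sketch below uses a hypothetical throwaway module, `mylib`, written to a temporary directory so it runs anywhere.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Don't write .pyc files: two edits within the same second could otherwise
# leave a stale cached bytecode file that reload() would pick up.
sys.dont_write_bytecode = True

# Hypothetical stand-in for your library, placed on the import path.
lib_dir = Path(tempfile.mkdtemp())
(lib_dir / "mylib.py").write_text("def greet():\n    return 'v1'\n")
sys.path.insert(0, str(lib_dir))

import mylib
print(mylib.greet())  # v1

# Simulate editing the library in your IDE while the kernel keeps running.
(lib_dir / "mylib.py").write_text("def greet():\n    return 'v2'\n")

# Re-execute the module's source in the running interpreter, no restart needed.
mylib = importlib.reload(mylib)
print(mylib.greet())  # v2
```

In practice you would install the library with `pip install -e .` so it is importable, and IPython's `autoreload` extension (`%load_ext autoreload` followed by `%autoreload 2`) can perform the reload automatically before each cell executes.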

The Python kernel

Suppose you want to add new functionality to Spark objects, for instance a doSomething() method on Spark and a predict() method on a…
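A minimal sketch of the idea, assuming the methods are attached by monkey-patching (the article's own approach is not shown here): a small hypothetical `add_method` decorator sets a function as an attribute of a class, so every instance gains the method. To keep the sketch runnable without Spark, `FakeSparkSession` stands in for `pyspark.sql.SparkSession`, and the body of `doSomething()` is invented for illustration.

```python
# Hypothetical decorator: attach the decorated function to `cls` as a method.
def add_method(cls):
    def decorator(func):
        # A plain function stored on a class becomes a bound method on instances.
        setattr(cls, func.__name__, func)
        return func
    return decorator


# Stand-in for pyspark.sql.SparkSession so the sketch runs without Spark.
class FakeSparkSession:
    def __init__(self, app_name):
        self.app_name = app_name


@add_method(FakeSparkSession)
def doSomething(self, n):
    # Invented behavior: report which session handled the call.
    return f"{self.app_name} did something {n} times"


spark = FakeSparkSession("demo")
print(spark.doSomething(3))  # demo did something 3 times
```

With PySpark installed, the same decorator could target `SparkSession` (imported from `pyspark.sql`) instead of the stand-in. The patch is global: every session in the process sees the new method, which is convenient in a notebook and is a good reason to keep such patches inside the library rather than scattered across cells.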
