
Dealing with Long Running Jupyter Notebooks

It's pretty frustrating when your browser disconnects from a Jupyter Notebook!

Photo by Nathan Dumlao on Unsplash

At Saturn Cloud, we manage a data science platform that provides Jupyter notebooks, Dask clusters, and ways to deploy models, dashboards, and jobs. As a result, we often help customers troubleshoot their notebooks, and network disconnects are a common issue.


We’ve had a number of customers struggling with long-running Jupyter notebooks, ones that take several hours or more to execute. Often they would come to us because these long-running notebooks would at some point lose connectivity between the server and the browser, as is common with cloud services. Normally cloud services reconnect gracefully and there are no issues, but in the case of Jupyter, if the connection is lost then Jupyter stops saving any output. Jupyter notebooks store all their state in the browser, meaning that if there is a connectivity issue between the server running the code and the browser viewing it, the state of the notebook is lost.

If our customer’s long-running code has an error in it and the connection ever cuts out, the user has no way to see the output of the code or the error messages it created. Trying to debug these models without output is an exercise in futility. This isn’t an issue when using Jupyter locally, because a computer’s connection to itself is infinitely stable, but it’s a real problem when working in the cloud.

Background

Jupyter notebooks store all their state in the browser and thus require constant network connectivity. This is a well-known design issue with many implications. While network issues won’t cause the code in a notebook to stop executing, they will affect how the output gets saved to your notebook.

The flow of a Jupyter notebook is:

  • the server pushes output to your browser.
  • your browser adds it to the notebook object (and renders it to the screen).
  • your browser saves the notebook back to the server.

When the network cuts out, this flow breaks and no output is saved. The long-term solution is for Jupyter itself to be modified to handle intermittent connections, which is a pretty active area of discussion. There is no current timeline for this to be added to open source Jupyter.

However, there is a short-term strategy.

Solution

We can adjust Jupyter with just a pinch of code so that it saves the output directly to a file on the server. By doing so, even if the network connectivity cuts out, the server will still have the output stored on it. It’s not perfect: in an ideal world this output would still show up in the notebook itself, but it’s an improvement to have it stored somewhere instead of lost. Put this code at the top of your long-running notebook:
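A sketch of that snippet, based on the mechanism described below (the append mode, the exact handler setup, and the `INFO` log level are assumptions; the `echo` attribute is ipykernel’s):

```python
import logging
import sys

# Mirror everything written to stdout/stderr into a flat file on the
# server. In a notebook, sys.stdout and sys.stderr are
# ipykernel.iostream.OutStream objects whose `echo` attribute (None by
# default) receives a copy of every write.
log_file = open("data.log", "a")
sys.stdout.echo = log_file
sys.stderr.echo = log_file

# Exception tracebacks go through Python's logging system, which by
# default isn't writing to stdout/stderr, so point it at stderr and set
# the log level.
handler = logging.StreamHandler(sys.stderr)
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)
```

From a terminal on the server you can then watch the output live with `tail -f data.log`.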

Execute that at the top of your notebook. TADA! Now when you’re running the notebook, all output will be mirrored in the data.log flat file.

How it works: in a Jupyter notebook, the normal stdout and stderr file objects are replaced with ipykernel.iostream.OutStream objects (that’s how output gets displayed in the browser). This object has an echo attribute, which defaults to None; when set, everything written to the stream is also propagated to the echo. So the first set of lines sticks a Python file object in place of the echo, and all your normal stdout and stderr output is now also being copied to disk. Exceptions are handled by the Python logging system, which in the default configuration isn’t outputting to stdout or stderr, so the second set of lines patches it to do so and sets the log level.
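The echo mechanism can be illustrated with a toy stand-in for OutStream (the class below is for illustration only, not ipykernel’s implementation):

```python
import io

class EchoStream:
    """Toy model of ipykernel's OutStream: every write goes to the
    primary destination and, when `echo` is set, is copied to it too."""
    def __init__(self, primary, echo=None):
        self.primary = primary
        self.echo = echo  # None by default, just like OutStream

    def write(self, text):
        self.primary.write(text)
        if self.echo is not None:
            self.echo.write(text)  # the mirrored copy survives a disconnect
            self.echo.flush()

browser = io.StringIO()  # stands in for output pushed to the browser
disk = io.StringIO()     # stands in for the data.log file
stream = EchoStream(browser, echo=disk)
stream.write("training epoch 1 done\n")
# browser and disk now hold identical copies of the output
```

If the browser side is lost mid-run, the disk side still holds everything written so far, which is exactly the property the workaround relies on.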

Conclusion

With this workaround, the worst pain of having long-running Jupyter notebooks is gone. That said, at Saturn we generally recommend making use of better hardware (GPUs) or parallelization (Dask) to avoid having to wait 10 hours for your notebook to run. If your problem isn’t parallelizable, this is a reasonable workaround. But if you don’t know how to parallelize it and wish you did, you should talk to us! We’re really good at it!


Disclaimer: I’m the CTO of Saturn Cloud. We make it easy to connect your team with cloud resources. Want to use Jupyter and Dask? Deploy models, dashboards or jobs? Work from your laptop, or a 4 TB Jupyter instance? Get complete transparency into who is consuming what cloud resources? We do all of that, and more.

Originally published at https://saturncloud.io on July 15, 2021.

