How to use Jupyter on a Google Cloud VM

Recipes for AI Platform Notebooks on Google Cloud Platform

Lak Lakshmanan
Towards Data Science


Note: The recipes in this article will still work, but I recommend that you use the Notebooks API now. To see what's available, do:

gcloud beta notebooks --help
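For example, here is a minimal sketch of creating an instance through the Notebooks API (the instance name, machine type, and zone below are placeholders; change them to suit):

# hedged sketch: create a notebook instance via the Notebooks API
# my-notebook, n1-standard-4, and us-central1-a are placeholder values
gcloud beta notebooks instances create my-notebook \
--vm-image-project=deeplearning-platform-release \
--vm-image-family=tf-latest-cpu \
--machine-type=n1-standard-4 \
--location=us-central1-a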

The simplest way to launch a notebook on GCP is to go through the workflow from the GCP console. Go to AI Platform and click on Notebook Instances. You can create a new instance from the user interface:

Create a new notebook instance from the UI

Once the instance is launched, you can click on a link to open JupyterLab:

Click on the blue link to open JupyterLab. Once you are done working for the day, stop the VM; restart it when you come back. I tend to have different notebook instances for different projects.

When the instance is launched, it has a persistent disk. That disk will hold your notebooks. You can stop and restart the VM (from the GCP web console) without losing those notebooks.
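You can also stop and restart from the command line; for example (using the INSTANCE_NAME from the scripts below):

# stop the VM at the end of the day; the persistent disk (and your notebooks) survive
gcloud compute instances stop ${INSTANCE_NAME}
# restart it when you come back
gcloud compute instances start ${INSTANCE_NAME}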

Note that you can attach a GPU to a notebook instance from the user interface:

Attaching a GPU to a notebook instance

Enjoy!

This article is a collection of a few of my “recipes” for working with Notebook instances.

A. How to script out the creation of a Notebook instance

The notebook instance is a Compute Engine VM, so if you want to script things out, customize the machine, change its firewall rules, etc., you can use Compute Engine capabilities. The instance is created from a Deep Learning VM image, a family of images that provides a convenient way to launch a virtual machine, with or without a GPU, on Google Cloud. JupyterLab is already installed, and you can access it without needing proxies or ssh tunnels.

A1. Launch Deep Learning VM using gcloud

The simplest approach is to specify an image family (see the docs for which image families are available). For example, you can get the latest image in the tf-latest-gpu family, with a P100 GPU attached, using:

IMAGE=--image-family=tf-latest-gpu
INSTANCE_NAME=dlvm
GCP_LOGIN_NAME=google-cloud-customer@gmail.com # CHANGE THIS
gcloud config set compute/zone us-central1-a # CHANGE THIS
gcloud compute instances create ${INSTANCE_NAME} \
--machine-type=n1-standard-8 \
--scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/userinfo.email \
--min-cpu-platform="Intel Skylake" \
${IMAGE} \
--image-project=deeplearning-platform-release \
--boot-disk-size=100GB \
--boot-disk-type=pd-ssd \
--accelerator=type=nvidia-tesla-p100,count=1 \
--boot-disk-device-name=${INSTANCE_NAME} \
--maintenance-policy=TERMINATE --restart-on-failure \
--metadata="proxy-user-mail=${GCP_LOGIN_NAME},install-nvidia-driver=True"

A2. Get the URL for Jupyter Lab

The URL to access Jupyter Lab is part of the metadata of the VM that you just launched. You can get it using:

gcloud compute instances describe ${INSTANCE_NAME} | grep dot-datalab-vm

Here’s a script that does steps A1 and A2, waiting until the Jupyter notebook server has started:

#!/bin/bash
IMAGE=--image-family=tf-latest-cpu
INSTANCE_NAME=dlvm
GCP_LOGIN_NAME=google-cloud-customer@gmail.com # CHANGE THIS
gcloud config set compute/zone us-central1-a # CHANGE THIS
echo "Launching $INSTANCE_NAME"
gcloud compute instances create ${INSTANCE_NAME} \
--machine-type=n1-standard-2 \
--scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/userinfo.email \
${IMAGE} \
--image-project=deeplearning-platform-release \
--boot-disk-device-name=${INSTANCE_NAME} \
--metadata="proxy-user-mail=${GCP_LOGIN_NAME}"
echo "Looking for Jupyter URL on $INSTANCE_NAME"
while true; do
proxy=$(gcloud compute instances describe ${INSTANCE_NAME} 2> /dev/null | grep dot-datalab-vm)
if [ -z "$proxy" ]
then
echo -n "."
sleep 1
else
echo "done!"
echo "$proxy"
break
fi
done

A3. Visit URL in web browser

Simply navigate to that URL and you’ll be in JupyterLab.

B. How to work with Git on a Notebook instance

B1. Git clone a repository interactively

Click on the last icon in the ribbon of icons in the left-hand pane and you will be able to git clone a repository. Use the one for my book:

https://github.com/GoogleCloudPlatform/data-science-on-gcp

Running a Jupyter notebook on a cloud VM without any ssh tunnels or proxies

Navigate to updates/cloudml and open flights_model.ipynb. You should be able to run through the notebook.

You can also open up a Terminal and use git clone, git checkout, git push, etc. I tend to find it easier than using the built-in Git UI. But your mileage may vary!
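For example, a typical terminal session might look like this (the branch name is just an illustration):

git clone https://github.com/GoogleCloudPlatform/data-science-on-gcp
cd data-science-on-gcp
git checkout -b my-feature   # create a working branch
# ... edit notebooks in JupyterLab ...
git add -A
git commit -m "Try out a new model"
git push origin my-feature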

C. How to specify a startup script

You can specify a set of operations to run after Jupyter launches. These will be run as root.

IMAGE=--image-family=tf-latest-gpu
INSTANCE_NAME=dlvm
GCP_LOGIN_NAME=google-cloud-customer@gmail.com # CHANGE THIS
STARTUP_SCRIPT="git clone https://github.com/GoogleCloudPlatform/data-science-on-gcp"
gcloud config set compute/zone us-central1-a # CHANGE THIS
gcloud compute instances create ${INSTANCE_NAME} \
--machine-type=n1-standard-8 \
--scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/userinfo.email \
--min-cpu-platform="Intel Skylake" \
${IMAGE} \
--image-project=deeplearning-platform-release \
--boot-disk-size=100GB \
--boot-disk-type=pd-ssd \
--accelerator=type=nvidia-tesla-p100,count=1 \
--boot-disk-device-name=${INSTANCE_NAME} \
--maintenance-policy=TERMINATE --restart-on-failure \
--metadata="proxy-user-mail=${GCP_LOGIN_NAME},install-nvidia-driver=True,startup-script=${STARTUP_SCRIPT}"

D. How to schedule Notebooks

D1. When moving to production, use image, not image family

In general, use the image-family approach for development (so that you are always developing with the latest of everything), but pin to a specific image once you move things to production. The reason to pin to a specific image in production is that you want to run on a version you have actually tested your code with.

Get the list of images and find the one you were using (the latest in the image family you specified above):

gcloud compute images list \
--project deeplearning-platform-release \
--no-standard-images
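If you only care about one family, you can narrow the list down; for example, for the tf-latest-cpu family:

# list only the images in a single family
gcloud compute images list \
--project deeplearning-platform-release \
--no-standard-images \
--filter="family:tf-latest-cpu" \
--format="value(name)"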

Then, specify it when creating the Deep Learning VM (the lines you might want to change are marked # CHANGE):

IMAGE=--image=tf-latest-cpu-20190125b2 # CHANGE
INSTANCE_NAME=dlvm
GCP_LOGIN_NAME=google-cloud-customer@gmail.com # CHANGE THIS
ZONE="us-central1-b" # CHANGE
gcloud compute instances create ${INSTANCE_NAME} \
--machine-type=n1-standard-8 \
--zone=$ZONE \
--scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/userinfo.email \
--min-cpu-platform="Intel Skylake" \
${IMAGE} \
--image-project=deeplearning-platform-release \
--boot-disk-size=100GB \
--boot-disk-type=pd-ssd \
--boot-disk-device-name=${INSTANCE_NAME} \
--metadata="proxy-user-mail=${GCP_LOGIN_NAME}"

D2. Submit a notebook for scheduled execution using papermill

The key aspect here is to launch papermill from a startup script, and to create the VM with a TERMINATE maintenance policy and without --restart-on-failure, so that the VM stops once papermill is done. Then, delete the VM. See this blog post for more details.

# Compute Engine Instance parameters
IMAGE="tf-latest-gpu-20190125b2" # CHANGE
INSTANCE_NAME=dlvm
ZONE="us-central1-b" # CHANGE
INSTANCE_TYPE="n1-standard-4" # CHANGE
# Notebook parameters
GCS_INPUT_NOTEBOOK="gs://my-bucket/input.ipynb"
GCS_OUTPUT_NOTEBOOK="gs://my-bucket/output.ipynb"
GCS_INPUT_PARAMS="gs://my-bucket/params.yaml" # Optional
export STARTUP_SCRIPT="https://raw.githubusercontent.com/GoogleCloudPlatform/ml-on-gcp/master/dlvm/tools/scripts/notebook_executor.sh"

gcloud compute instances create $INSTANCE_NAME \
--zone=$ZONE \
--image=$IMAGE \
--image-project=deeplearning-platform-release \
--maintenance-policy=TERMINATE \
--accelerator='type=nvidia-tesla-t4,count=2' \
--machine-type=$INSTANCE_TYPE \
--boot-disk-size=100GB \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--metadata="input_notebook_path=${GCS_INPUT_NOTEBOOK},output_notebook_path=${GCS_OUTPUT_NOTEBOOK},parameters_file=${GCS_INPUT_PARAMS},startup-script-url=$LAUNCHER_SCRIPT,startup-script=${STARTUP_SCRIPT}"

# once the notebook has finished executing and the VM has stopped, delete it
gcloud --quiet compute instances delete $INSTANCE_NAME --zone $ZONE
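For reference, the executor boils down to something like the following (a rough sketch under my assumptions, not the actual contents of notebook_executor.sh):

# rough sketch of what the executor does on the VM
gsutil cp "${GCS_INPUT_NOTEBOOK}" input.ipynb
gsutil cp "${GCS_INPUT_PARAMS}" params.yaml   # optional
papermill input.ipynb output.ipynb -f params.yaml
gsutil cp output.ipynb "${GCS_OUTPUT_NOTEBOOK}"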

E. How to use a TPU from Jupyter

To create a Deep Learning VM attached to a TPU, first create a Deep Learning VM and then create a TPU with the same TensorFlow version:

INSTANCE_NAME=laktpu   # CHANGE THIS
GCP_LOGIN_NAME=google-cloud-customer@gmail.com # CHANGE THIS
gcloud config set compute/zone us-central1-a # CHANGE THIS
TPU_NAME=$INSTANCE_NAME
gcloud compute instances create $INSTANCE_NAME \
--machine-type n1-standard-8 \
--image-project deeplearning-platform-release \
--image-family tf-1-12-cpu \
--scopes cloud-platform \
--metadata proxy-user-mail="${GCP_LOGIN_NAME}",\
startup-script="echo export TPU_NAME=$TPU_NAME > /etc/profile.d/tpu-env.sh"
gcloud compute tpus create $TPU_NAME \
--network default \
--range 10.240.1.0 \
--version 1.12

The only difference when creating the Deep Learning VM is that you are specifying the TPU_NAME in the startup script.
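You can check that both halves came up before opening JupyterLab; for example:

# check that the TPU came up (look for state: READY in the output)
gcloud compute tpus describe $TPU_NAME
# and, in a JupyterLab terminal on the VM, check that the name is visible:
echo $TPU_NAME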

F. How to use end-user credentials in Notebooks

If you create a Deep Learning VM and specify a GCP login name (all my examples above, except the production one, did so), then only you (and project admins) will be able to ssh into the VM.

All Jupyter notebooks will run under a service account. For the most part, this will be fine, but if you need to run operations that the service account doesn’t have permission to do, you can have code in Jupyter run as you by doing the following:

  • In the Launcher menu, open a Terminal
  • In the Terminal, type:
gcloud auth application-default login
  • Follow the prompts to carry out OAuth2
  • Restart the Jupyter kernel if necessary

Note: Do not use end-user credentials unless you started the machine in ‘single user mode’.
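To verify which credentials your notebook code will now pick up, you can, for example:

# confirm that application-default credentials were written
ls ~/.config/gcloud/application_default_credentials.json
# and that a token can be minted from them
gcloud auth application-default print-access-token > /dev/null && echo "ADC ok"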

G. How to create a TF-nightly VM

Creating an instance from the tf-latest image family gives you the latest stable TensorFlow version. To work with TF-nightly (e.g., this is how to get the TensorFlow 2.0 alpha), use:

INSTANCE_NAME=tfnightly   # CHANGE THIS
GCP_LOGIN_NAME=google-cloud-customer@gmail.com # CHANGE THIS
ZONE="us-west1-b" # CHANGE THIS
INSTANCE_TYPE="n1-standard-4" # CHANGE THIS
gcloud compute instances create ${INSTANCE_NAME} \
--machine-type=$INSTANCE_TYPE \
--zone=$ZONE \
--scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/userinfo.email \
--min-cpu-platform="Intel Skylake" \
--image-family="tf-latest-gpu-experimental" \
--image-project=deeplearning-platform-release \
--boot-disk-size=100GB \
--boot-disk-type=pd-ssd \
--accelerator=type=nvidia-tesla-p100,count=1 \
--boot-disk-device-name=${INSTANCE_NAME} \
--maintenance-policy=TERMINATE --restart-on-failure \
--metadata="proxy-user-mail=${GCP_LOGIN_NAME},install-nvidia-driver=True"

H. Troubleshooting Jupyter

Restarting Jupyter: Usually, all you need to do is restart the kernel by clicking on the icon in the notebook menu. But once in a long while, you might completely hose the environment and want to restart Jupyter itself. To do that, go to the Compute Engine section of the GCP console and click on the SSH button corresponding to your notebook instance. In the SSH window, type:

sudo service jupyter restart
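If the restart doesn't seem to take, check the service status and recent logs first (a sketch, assuming the image manages Jupyter with systemd, as recent Deep Learning VM images do):

sudo service jupyter status
sudo journalctl -u jupyter --no-pager | tail -50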

Startup logs: If Jupyter failed to start, or you don’t get a notebook link, you might want to look at the complete logs (including startup logs). Do that using:

gcloud compute instances \
get-serial-port-output --zone $ZONE $INSTANCE_NAME

I. Using conda

The TensorFlow images use pip, but the PyTorch images use conda. So, if you want to use conda, the PyTorch images are a better starting point.
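On a PyTorch image, for example, you might create a dedicated environment and register it as a Jupyter kernel; a sketch (the environment name and Python version are placeholders):

# myenv is a placeholder environment name
conda create -y -n myenv python=3.7 ipykernel
source activate myenv   # older conda; newer versions use: conda activate myenv
# register the environment so it shows up as a kernel in JupyterLab
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"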

J. How to run Notebook instance locally

If you want to develop using the Deep Learning VM container image on your local machine, you can do that using Docker:

IMAGE_NAME="gcr.io/deeplearning-platform-release/tf-latest-cpu"
docker pull "${IMAGE_NAME}"
docker run -p 127.0.0.1:8080:8080/tcp -v "${HOME}:/home" \
"${IMAGE_NAME}"

If you have a GPU on your local machine, change the image name from tf-latest-cpu to tf-latest-cu100.
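To actually expose the local GPU to the container, you also need the NVIDIA container runtime on your machine; a sketch, assuming nvidia-docker2 is installed:

IMAGE_NAME="gcr.io/deeplearning-platform-release/tf-latest-cu100"
docker pull "${IMAGE_NAME}"
# --runtime=nvidia comes from the nvidia-docker2 package
docker run --runtime=nvidia -p 127.0.0.1:8080:8080/tcp -v "${HOME}:/home" \
"${IMAGE_NAME}"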

__________________________________________________

I’m interested in expanding on these recipes. Contact me if you have a suggestion on a question/answer that I should add. For your convenience, here’s a gist with all the code.
