How to easily install private Python packages in Google Colab

An elegant alternative to zipping projects

Yousef Nami
Towards Data Science


TLDR: ensure that the .json file has the keys “username” (the name of the account that owns the repository) and “access_token” (your GitHub access token). Please note that the package is an Early Access Release and will likely improve in the future. I will try to keep this article up to date, but in case it is not, please visit the repository here.

!pip install colab-dev-tools
import os
from colabtools.utils import mount_drive, install_private_library
drive_path = mount_drive()
git_access_token_json_path = os.path.join(drive_path, PATH_TO_JSON)
install_private_library(git_access_token_json_path, PROJECT_NAME)
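
If you have not created the JSON file yet, a one-off snippet along these lines would produce it. This is only a sketch: the helper name write_token_file, the file name, and the placeholder values are assumptions for illustration; only the two key names come from the description above. On Colab you would point the path at a location inside your mounted Drive rather than a temporary directory.

```python
import json
import os
import tempfile


def write_token_file(path, username, access_token):
    """Write the two keys described in the TLDR to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"username": username, "access_token": access_token}, f)
    return path


# Placeholder values only: replace them with the repository owner's
# account name and your own GitHub access token.
demo_path = os.path.join(tempfile.gettempdir(), "git_access_token.json")
write_token_file(demo_path, "your-github-username", "ghp_placeholder")
```

Run this once from a trusted machine or session, then delete the cell so the token never stays in a notebook.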

Colab is a great tool for quickly testing Deep Learning models using the free GPU. However, one of its greatest shortcomings compared with Jupyter Notebooks is that you can’t install a custom kernel. This means that you need to reinstall all the packages you need every time you start a new session. In most cases this is only a minor inconvenience: all you need to do is use pip install within the notebook.

However, all this fails if you are working with private packages, that is, packages that you are developing in a private repository hosted on a versioning service (e.g. GitHub). A plain pip install no longer works because you need to authenticate, which has become more difficult since GitHub’s decision to deprecate password authentication for Git operations.

This article details a reliable and quick way to install private packages using a package I’ve developed, colab-dev-tools.

Working with Private Packages

Previously I had 2 methods for using private code on Colab:

  1. Copy-paste all the code into Colab: This only works for small projects (e.g. with 1 or 2 small files). It is NOT recommended because it makes the notebook long and messy, makes versioning really difficult, and means that almost any change requires a complete refactoring of the base code.
  2. Zip the package and unzip on Colab: While this works great for a single user, it becomes very difficult to maintain when working in a team. The zip file can easily get misplaced or misnamed, and versioning is almost impossible. This makes it difficult to reproduce results and debug code should something go wrong.

Since both methods are inadequate, one must consider authenticating with GitHub. There are two main ways of doing this: SSH or an access token.

The SSH method is great when you have a fixed device (e.g. your computer) because you only need to generate the key once. However, since Colab is session-based, using SSH is a pain as you need to generate the key each time. The access-token method is better, but it requires copy-pasting because it’s really hard to memorise the token…

Copy-pasting the token is problematic because:

  1. You risk exposing the token each time you copy-paste it
  2. You risk leaving it in the notebook and thus exposing it when you push the notebook
  3. It is just annoying…

The solution I propose is therefore to store the access token on your personal Drive and read it from there each time, with the reading abstracted away by some code. This way, you never explicitly read the token in the notebook, which removes risk 2). You also never need to copy-paste anything, which removes risk 1). Finally, the process is streamlined enough that you won’t get frustrated by 3). This is shown in the diagram below.

Solution

# install the package
!pip install colab-dev-tools
# imports
import os
from colabtools.utils import mount_drive, install_private_library
# get path to Drive root, e.g. drive/MyDrive/
drive_path = mount_drive()
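# get path to the access token
# (PATH_TO_JSON is a placeholder: the JSON file's path relative to the Drive root)
git_access_token_json_path = os.path.join(drive_path, PATH_TO_JSON)
# install using pip install (PROJECT_NAME is a placeholder for the repository name):
# git+https://{access_token}@github.com/{username}/{repo_name}.git
install_private_library(git_access_token_json_path, PROJECT_NAME)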
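
To make the abstraction less of a black box, here is a rough sketch of what such an install step could look like. It is not the actual source of colab-dev-tools: the function names load_credentials, build_install_url and install_private_repo are hypothetical, and the only details taken from this article are the two JSON keys and the URL pattern in the comment above.

```python
import json
import subprocess
import sys


def load_credentials(token_json_path):
    """Read the JSON file and check that both expected keys are present."""
    with open(token_json_path, encoding="utf-8") as f:
        creds = json.load(f)
    missing = {"username", "access_token"} - set(creds)
    if missing:
        raise KeyError(f"missing keys in token file: {sorted(missing)}")
    return creds


def build_install_url(creds, repo_name):
    """Fill in the git+https pattern quoted in the article."""
    return (
        f"git+https://{creds['access_token']}@github.com/"
        f"{creds['username']}/{repo_name}.git"
    )


def install_private_repo(token_json_path, repo_name):
    """Run pip against the authenticated URL without printing the token."""
    creds = load_credentials(token_json_path)
    url = build_install_url(creds, repo_name)
    # Capture pip's output so the token-bearing URL is not echoed in the cell.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", url],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Redact the token before surfacing pip's error text.
        raise RuntimeError(result.stderr.replace(creds["access_token"], "***"))
```

The point of the design is that the token only ever lives in local variables inside the function, so it never shows up in a cell or in cell output.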

Concluding Remarks

  • Make sure your Drive is only accessible to you; this decreases the chances of your access token getting leaked.
  • Note that the functions in the package only abstract away the code. They don’t encrypt your access token in any way.
  • One disadvantage of this method is that the path to your access token will be visible in the notebook. This means that if an attacker does get access to your Drive, they will be able to easily locate your access token JSON. As such, it might be a good idea to remove the path each time you push your notebook to GitHub.
  • The package has other tools that are useful when using Colab, such as measuring GPU Utilisation, sending objects to GPU, etc…
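
Since removing the path by hand is easy to forget, a small check before pushing can help. The sketch below is my own illustration and not part of colab-dev-tools: find_leaks is a hypothetical helper that loads a saved .ipynb file (which is JSON) and reports which of the given strings, such as the token path or the token itself, still appear anywhere in it.

```python
import json


def iter_strings(node):
    """Yield every string stored anywhere inside a parsed JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def find_leaks(notebook_path, secrets):
    """Return the entries of `secrets` that occur in the notebook's cells or outputs."""
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)
    text = "\n".join(iter_strings(notebook))
    # Empty strings are skipped because they would match everything.
    return [s for s in secrets if s and s in text]
```

An empty list means none of the strings were found; anything else is worth cleaning up before you push.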

All images by author unless indicated otherwise
