
Keeping credentials safe in Jupyter Notebooks

Jupyter Notebooks are widely used in Data Science. But how should we store credentials safely and practically when using them?

Jupyter Notebooks are widely used in Data Science for quick prototyping of models and interactive demonstrations. But how should we store passwords, certificates and keys securely when using them?

I will give a quick tour of the available solutions to help you keep your credentials safe.

Photo by Markus Winkler on Unsplash

Why shouldn’t you "just type in" credentials?

I use Jupyter Notebooks a lot in my personal projects, and I recently hit a fairly common situation that could end very badly if one hasn't woken up on the security-conscious side of the bed. I needed to connect to the Bing API, a paid service offered through Microsoft Azure, to scrape images. You connect with a secret authentication key that belongs to you, and you are charged based on your usage of the API.

If you just type the key into one of the cells and run it, even if you intend to delete it once the connection is live, you might forget about it and later check it into GitHub or send the notebook to someone.

But what's wrong with uploading credentials to GitHub?

The problem is that malicious bots constantly scrape repositories for accidentally pushed secrets. Even if you fix it straight after pushing, you're not safe: the key may already have been scraped, and the commit history may still contain your mistake. Once your credentials are scraped, they can be used to steal your data, gain access to other services, or rack up immense bills overnight on cloud computing services. Big companies make this mistake too: in a breach disclosed in 2017, attackers got into a private GitHub repository used by Uber engineers, found plaintext credentials stored there, and used them to access data on 57 million users.


Note from Towards Data Science‘s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.


What is special about Jupyter notebooks?

In normal scripts, the situation is a bit simpler. You still shouldn't hardcode your credentials, but you can separate them from the code more easily. The easiest and most convenient solution is to create a config file in the project's root directory that stores these values locally, and to add it to your .gitignore file so that you don't accidentally upload it. For a Python project, this config file can be in any format that is convenient for data storage:

  • A Python script called config.py or similar
  • A JSON file
  • A YAML file
  • An INI file
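The JSON option can be sketched with the standard library alone. The file name config.json and the helper load_config are assumptions made for this example, not a fixed convention:

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Read settings from a local JSON file that is kept out of version control."""
    config_path = Path(path)
    if not config_path.exists():
        # Fail loudly rather than falling back to a hard-coded default secret.
        raise FileNotFoundError(f"{config_path} not found; create it locally and keep it in .gitignore")
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)
```

In a script you would then write api_key = load_config()["bing_api_key"], where bing_api_key is a made-up key name.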

I recommend this article on how to use the above config file types.

Of course, this is for personal or small projects. In a big company, please use a professional key vault service.

The config file approach becomes very cumbersome for Jupyter Notebooks.

The point of notebooks is to have portable code that you can quickly set up, run, change, share and carry around. With a config file, you would also need to carry the file around and keep a folder structure in which all your imports work. This is even worse in a browser environment like an online Jupyter Notebook or Google Colab, where navigating folders is awkward.

Photo by Paulius Dragunas on Unsplash

Solutions for Jupyter Notebooks

There is no perfect solution. Each option below is a trade-off, and the right one depends on your situation.

1. Config files

If you don't mind storing config files locally and carrying them around when needed, the most pain-free option is to use JupyterLab instead of Jupyter Notebook. JupyterLab has a file browser in the sidebar, so you can see and open files like your config file with ease.

Remember to add your config files to your .gitignore file to avoid checking them in!
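You can also do this from code. The following ensure_gitignored helper is hypothetical and not part of any library. It appends file names to .gitignore only when they are missing. It compares plain lines, so it will not recognise an equivalent glob pattern:

```python
from pathlib import Path


def ensure_gitignored(patterns, gitignore=".gitignore"):
    """Append any patterns not already listed in .gitignore and return the ones added.

    Matching is exact, line by line, so an equivalent glob (e.g. "*.cfg")
    is not recognised as covering "notebook.cfg".
    """
    path = Path(gitignore)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    # dict.fromkeys drops duplicates while keeping the caller's order.
    missing = [p for p in dict.fromkeys(patterns) if p not in existing]
    if missing:
        with path.open("a", encoding="utf-8") as f:
            if text and not text.endswith("\n"):
                f.write("\n")  # do not glue the first new entry onto the last line
            f.write("\n".join(missing) + "\n")
    return missing
```

Run it once from the project root, e.g. ensure_gitignored(["notebook.cfg", ".env"]).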

You can create any YAML or JSON config file as explained above for non-notebook scripts, but my favourite methods are ConfigParser and Dotenv.

With ConfigParser, you can create a file called, for example, notebook.cfg with this syntax:

[my_api]
auth_key: shjdkh2648979379egkfgkss

You then load the credentials as:
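Here is a minimal sketch with Python's standard-library configparser module. It assumes the file is saved as notebook.cfg next to the notebook, and read_credential is a hypothetical helper name:

```python
import configparser


def read_credential(section, key, path="notebook.cfg"):
    """Return one value from an INI-style file such as notebook.cfg."""
    # interpolation=None keeps "%" characters in secrets from being treated specially.
    config = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files and returns the list it actually parsed.
    if not config.read(path, encoding="utf-8"):
        raise FileNotFoundError(f"{path} not found next to the notebook")
    return config[section][key]
```

With the example file above, auth_key = read_credential("my_api", "auth_key") returns the key. ConfigParser accepts both ":" and "=" as separators.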

Credentials with ConfigParser

With Dotenv (the python-dotenv package), you create a .env file in your project's root directory, and dotenv loads its entries into environment variables for you. The syntax is similar to Bash.

# Development settings
DOMAIN=example.org
ADMIN_EMAIL=admin@${DOMAIN}
ROOT_URL=${DOMAIN}/app
ADMIN_PW=kdhkr4rb344r4

You would load this as follows:

Credentials with Dotenv

I like the Dotenv approach better because it is more easily shareable: if you send the notebook to someone else, all they have to do is set their own environment variables.
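A notebook can fail with a clear message when a variable is missing, instead of silently receiving None from os.getenv. This makes handing it to someone else smoother. Here is a minimal standard-library sketch, where require_env is a hypothetical helper name:

```python
import os


def require_env(name):
    """Return a required environment variable, or explain how to provide it."""
    value = os.environ.get(name)
    if not value:  # treat missing and empty the same way
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your "
            "shell before starting Jupyter."
        )
    return value
```

For example, admin_pw = require_env("ADMIN_PW") works the same whether ADMIN_PW came from your .env file or was exported by a colleague in their own shell.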

2. GetPass – Interactive input

The getpass module from Python's standard library lets you type your password into an interactive prompt designed for secrets. Your input is hidden while you type, isn't printed anywhere, and isn't saved either. You will have to retype your secret every time you re-run the cell, so this is most useful if you have very few secrets, or ones simple enough to remember and type quickly.

Credentials with Getpass

3. JupyterLab’s Credential Store extension

The Credential Store extension lets you enter key-value pairs into a credential store inside JupyterLab. They are saved in a file called .credentialstore, which is AES-encrypted, so you can only access the credentials while logged in to JupyterLab. Even so, make sure to add this file to your .gitignore as well.

You can use it as:

import kernel_connector as kc
kc.get_credential("my_secret")

For more information, see this post by the extension author.

The JupyterLab Credential Store. This JupyterLab extension keeps your... | by Frank Zickert | Towards Data Science

Note that in newer versions of JupyterLab the original extension is broken, but a fixed version is available if you search for @ccorbin/credentialstore.

4. Keyring – using your system’s keychain

Keyring integrates with your OS keychain manager, so you can access passwords stored in your macOS Keychain, Windows Credential Locker or other third-party services. This decouples your credentials completely from your code: there is no config file to accidentally check in.

You would use it as:

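import keyring
print(keyring.get_keyring())  # shows which backend (e.g. macOS Keychain) keyring picked
keyring.get_password("system", "username")  # service name, user name; None if not stored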
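The calls above only read a password. Here is a minimal sketch of the full round trip. It assumes the keyring package is installed and a supported OS backend is available. The service name bing_api and both helper functions are hypothetical:

```python
import getpass

import keyring

# Hypothetical service name that groups this project's secrets in the OS keychain.
SERVICE = "bing_api"


def store_key(username):
    """Prompt once (input hidden) and save the key in the OS keychain."""
    keyring.set_password(SERVICE, username, getpass.getpass("API key: "))


def load_key(username):
    """Fetch the key from the OS keychain inside the notebook."""
    key = keyring.get_password(SERVICE, username)
    if key is None:
        raise KeyError(f"No secret stored for {SERVICE!r}/{username!r}; run store_key first")
    return key
```

Run store_key("your-username") once, ideally from a terminal rather than the notebook. Then call load_key("your-username") in the notebook. The keyring package also installs a command-line tool, so keyring set bing_api your-username does the same job as store_key.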

5. Cloud-based key vaults

This is the most involved of all the solutions here, and the only one that will cost you money. Cloud services are not free, but you can usually get a free trial, and storing a couple of secrets won't cost you much. It is also the most professional and secure method, as nothing is stored locally. I don't really recommend it for small projects because of the overhead of authenticating, but if you collaborate a lot, it could be the solution you are looking for.

Azure Quickstart - Set and retrieve a secret from Key Vault using Azure portal | Microsoft Docs

That’s All!

I hope you found a method you like, and stay safe!
