Using Jupyter Notebook in Virtual Environments for Python Data Science Projects

Learn how to install a kernelspec to access your Python data science virtual environments within Jupyter Notebook

Image by Reimund Bertrams from Pixabay

In Creating Virtual Environments for Python Data Science Projects, I explained how to install Pyenv and Virtualenv to manage your Python versions and virtual environments on macOS Big Sur.

With that scaffolding in place, the next step is to create a project directory, activate a new environment, and install some popular data science packages, like Pandas, Matplotlib, Seaborn, and Jupyter. Then, we will install a kernelspec so that these libraries are available in Jupyter Notebook.

I. Create and activate a virtual environment with Pyenv and Virtualenv

First, we need to install the desired version of Python. Even though Python may already be installed on your computer, possibly in several locations, Pyenv needs its own copy of the version your project will use. Unless you have a reason to use an older version, the newest stable release is a good place to start.

% pyenv install 3.9.1

In the future, if you would like to use Python 3.9.1 for another environment, there is no need to reinstall it. You will only need to install each version of Python one time to use it.

Now that Pyenv has a copy of the Python version that you wish to use, you can create a virtual environment and assign it to that version.

% pyenv virtualenv 3.9.1 project_env

The syntax is as follows:

% pyenv virtualenv [python version] [environment name]

Next, your project will need a directory. You can create one by entering:

% mkdir project_dir

Enter your directory:

% cd project_dir

Then, assign the virtual environment as the local environment for that directory. Now the environment will open whenever you enter your project directory:

% pyenv local project_env
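`pyenv local` records this choice by writing the environment name to a `.python-version` file in the current directory, and Pyenv looks for that file in the current directory and then in each parent directory. The Python sketch below mimics that lookup so you can see which environment a given folder would pick up. The function name `find_local_version` is hypothetical, and the sketch deliberately ignores the `PYENV_VERSION` variable and the global fallback that Pyenv also consults.

```python
from pathlib import Path
from typing import Optional


def find_local_version(start: Path) -> Optional[str]:
    """Return the name stored in the nearest .python-version file, if any."""
    start = start.resolve()
    # Check the starting directory first, then walk up through its parents.
    for directory in [start, *start.parents]:
        candidate = directory / ".python-version"
        if candidate.is_file():
            entries = candidate.read_text().split()
            return entries[0] if entries else None
    return None


# Run from inside project_dir to see which environment name is recorded.
print(find_local_version(Path.cwd()))
```

Because the search walks upward, any subfolder you create inside the project directory resolves to the same environment.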

If you wish to activate this environment somewhere else, you can enter that directory in the terminal and use:

% pyenv activate project_env

II. Install Pandas, Jupyter, Matplotlib, Seaborn and other popular data science packages with Pip

The next task is to install popular data science packages with Pip into our virtual environment. This step is important because only the specific package versions that are installed to the environment will run in that environment. Every time you create a new environment, you will have to install all of the packages that you need, even if they have already been installed to another environment.

Right now we are installing:

  • Pandas – so we can manipulate data
  • Jupyter, Notebook and Ipykernel – so we can use Jupyter Notebooks to write, execute and annotate code
  • Matplotlib and Seaborn – so we can create data visualizations

Inside of your project directory, you can start the installation with Pip:

% pip install pandas jupyter notebook ipykernel matplotlib seaborn

Now, you will be able to use these packages within this environment.
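If you want to confirm what Pip actually installed, one option is to ask Python itself. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_versions` is hypothetical, and it should be run with the environment's own interpreter so that it reports on the right environment.

```python
from importlib.metadata import PackageNotFoundError, version

# The distributions installed in the step above.
PACKAGES = ["pandas", "jupyter", "notebook", "ipykernel", "matplotlib", "seaborn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Any package reported as "not installed" was either skipped by Pip or installed into a different environment.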

III. Create a kernelspec to start using virtual environments with Jupyter notebooks

Jupyter notebooks are an interactive environment where you can write and execute Python code, as well as add markdown cells to explain your methods and code. We are going to use Ipykernel to link our virtual environment to Jupyter so that we can easily use that environment in a notebook.

A kernelspec is a JSON file inside the ~/Library/Jupyter/kernels directory, which was created when you installed Jupyter. The kernels directory contains a folder for each virtual environment that you have registered, and inside each of those folders is a kernel.json file.

Inside the ~/Library/Jupyter/kernels folder

If you open it up, it looks like this:

A kernelspec JSON file for a Python virtual environment created by Ipykernel
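For readers without the screenshot, here is a rough Python sketch of the core fields a Python kernel.json usually carries. The interpreter path and the helper name `build_kernelspec` are hypothetical, chosen only for illustration, and the file Ipykernel writes on your machine may contain additional keys.

```python
import json
import tempfile
from pathlib import Path


def build_kernelspec(python_path: str, display_name: str) -> dict:
    """Return the core fields a Python kernel.json usually carries."""
    return {
        # Command Jupyter runs to start the kernel; {connection_file} is a
        # placeholder that Jupyter fills in at launch time.
        "argv": [python_path, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


# Hypothetical interpreter path for an environment managed by Pyenv.
spec = build_kernelspec(
    "/Users/myusername/.pyenv/versions/project_env/bin/python",
    "Python (myenv)",
)

# Write to a throwaway directory rather than the real kernels folder.
with tempfile.TemporaryDirectory() as tmp:
    kernel_file = Path(tmp) / "my_project_env" / "kernel.json"
    kernel_file.parent.mkdir(parents=True)
    kernel_file.write_text(json.dumps(spec, indent=1))
    print(kernel_file.read_text())
```

The important part is the first entry of `argv`: it points at the environment's own Python, which is how a notebook ends up running inside that environment.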

To create a kernelspec for your virtual environment, enter the following in your project folder and make sure the environment is activated when you do it:

% python -m ipykernel install --user --name my_project_env --display-name "Python (myenv)"

Let's break down the syntax:

  • python: indicates the command should be executed by Python
  • -m: an option that means run the module as a program
  • ipykernel: the module to run
  • install: instructs ipykernel to install a kernelspec
  • --user: indicates that it should be installed in the directory of the current user
  • --name my_project_env: assigns "my_project_env" as the name of the kernelspec directory
  • --display-name "Python (myenv)": assigns the name that will be displayed in Jupyter to represent the environment.

If the command succeeded, the terminal should display something like this:

Installed kernelspec my_project_env in /Users/myusername/Library/Jupyter/kernels/my_project_env
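To double-check the result, you can list what is registered in the kernels folder. The sketch below reads each kernel.json under a given directory and reports its display name. The helper name `list_kernelspecs` is hypothetical, and the default path assumes the per-user macOS location used in this tutorial.

```python
import json
from pathlib import Path


def list_kernelspecs(kernels_dir: Path) -> dict:
    """Map each kernelspec folder name to the display_name in its kernel.json."""
    specs = {}
    if not kernels_dir.is_dir():
        return specs
    for kernel_file in sorted(kernels_dir.glob("*/kernel.json")):
        data = json.loads(kernel_file.read_text())
        specs[kernel_file.parent.name] = data.get("display_name", "")
    return specs


# Per-user kernels folder on macOS, as described in this tutorial.
print(list_kernelspecs(Path.home() / "Library" / "Jupyter" / "kernels"))
```

The `jupyter kernelspec list` command reports similar information from the terminal and also covers kernels installed outside your user directory.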

IV. Launching Jupyter Notebook

With your kernelspec installed, we can open Jupyter Notebook and use our new environment.

% jupyter notebook

This launches Jupyter Notebook and displays your project directory in your browser. In the top right corner, you’ll see a drop-down menu labeled ‘New’. Click it to see a list of the virtual environments that have an installed kernelspec, then click the display name of your new environment to open a new notebook.

Jupyter Notebook directory

Let’s test the notebook out and make sure our preliminary package installations were successful by trying to import them into our notebook.

Type an import statement into the first cell and run it. Any package installed in the selected environment can be imported this way.

import pandas

If the notebook did not output an error message, the installation has been a success! Now you can start coding your project. If you did get an error message, I suggest carefully reading the first and second tutorials in this series, just to make sure that you did not miss a step.
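To go one step further than a single import, you can run a cell like the following. It prints the interpreter the kernel is using, which should sit inside your environment, and reports which modules that interpreter can find. The helper name `importable` is hypothetical; this is a quick check under those assumptions rather than a full diagnostic.

```python
import importlib.util
import sys

# The interpreter running this kernel; it should live inside your environment.
print(sys.executable)


def importable(names):
    """Report which top-level modules the current kernel can find."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print(importable(["pandas", "matplotlib", "seaborn"]))
```

If the printed path points somewhere other than your environment, the notebook is most likely running on a different kernel than the one you registered.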

V. What did we do?

  1. Used Pyenv and Virtualenv to create a new virtual environment that activates when we enter our project directory.
  2. Installed popular data science packages Pandas, Jupyter, Notebook, Ipykernel, Matplotlib, and Seaborn.
  3. Generated a kernelspec with Ipykernel to link our environment to our notebooks.
  4. Used Terminal to open a Jupyter Notebook directory in our browser.
  5. Created a new notebook that will use our new virtual environment.
  6. Used an import statement to load Pandas and check the installation.

👩🏻‍💻 About the Author

Hi. I’m Christine. A relentlessly curious data scientist with a degree in linguistics and an interest in natural language processing.

💡 To learn more about my work…

📅 Data in a Day 📰 christineegan42.medium.com/ 📫 How to reach me: 📧 [email protected]

