Get going with Neo4j and Jupyter Lab through Docker

Why work with one at a time when you can work with both?

CJ Sullivan
Towards Data Science



Docker has provided data scientists (and, really, everyone in software) with the ability to share code in a reproducible fashion. I remember when I was first exposed to it several years ago. I had been using old-school virtual desktops that were clunky and took up a lot of memory just by virtue of their architecture. The day I first learned about Docker I was so excited that I was awake most of the night thinking about it! Like a kid at Christmas, I collected a whole bunch of different containers, both because I could and because it made me giddy thinking about all of the great new environments I could develop (play) in.

So that brings me to today. Sure, I can do things in a virtual environment, but what if I screw up something somewhat major? (Think: messing around with CUDA drivers and the like.) I am as capable of screwing up my local system as the next person. Doing all of my development inside a container prevents me from doing any permanent damage. (Because how long does it take to get your Linux environment set up just the way you like it???) Plus, I can easily share that environment with other people and not have to worry about system and package differences. It just works. And this is great for data scientists, because we tend to have a lot of software dependencies on things like Python packages.

This blog post will be the first in a series on doing data science in Neo4j. Doing data science with graph data is really fun, but I will save the nitty gritty for future posts. To get started, it helps if anyone following along with those posts can reproduce my environment exactly. To that end, you can find the code for the Docker container in this GitHub repo.

Code walkthrough

In my role as a graph data scientist, I need to be able to interface with both the Neo4j database and a host of common data science tools and packages, like Jupyter notebooks. Ideally, Jupyter should talk to Neo4j without too much fuss. This is where docker-compose comes in.

Let’s take a look at some of the files in the repo that we will use to do this.

First is the obvious requirements.txt file that spells out any additional requirements beyond what is in the base containers:
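I won't reproduce the repo's exact file here, but a minimal sketch looks something like this (I am assuming the community driver is py2neo; check the repo for the exact package list and version pins):

neo4j
py2neo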

Note that I have two different Python packages for interfacing with the Neo4j database that we will create. The first is the official driver and the second is a community-written driver. Pick your favorite.

Next, I have the Dockerfile, which is just the standard one put out by our friends at Jupyter. It has a ton of good, common data science packages in it, but you can always inspect the installed packages via pip3 list. Anything you need that is not in that list should be put into requirements.txt.
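The repo's Dockerfile is the one to build from; a minimal sketch of that pattern (assuming the jupyter/datascience-notebook base image, one of the standard Jupyter docker-stacks images) looks like:

# Sketch only: start from a standard Jupyter docker-stacks image
FROM jupyter/datascience-notebook

# Add anything the base image is missing
COPY --chown=${NB_UID}:${NB_GID} requirements.txt /tmp/
RUN pip3 install --no-cache-dir -r /tmp/requirements.txt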

Lastly, we need to get a Neo4j instance and set up a network between the Neo4j and Jupyter containers. So here we are going to use docker-compose:
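Again, the repo's file is authoritative; a sketch of its shape, based on the description below (service names, volume paths, and memory values here are illustrative), looks like:

version: '3'

services:
  neo4j:
    image: neo4j:4.2.3
    ports:
      - "7474:7474"   # Neo4j browser
      - "7687:7687"   # bolt connector
    environment:
      - NEO4J_AUTH=neo4j/1234
      - NEO4JLABS_PLUGINS=["apoc", "graph-data-science"]
      # GDS is memory intensive; tune these for your machine, then uncomment.
      # - NEO4J_dbms_memory_pagecache_size=4G
      # - NEO4J_dbms_memory_heap_initial__size=4G
      # - NEO4J_dbms_memory_heap_max__size=8G
    networks:
      - neo_net

  jupyter:
    build: .
    ports:
      - "8888:8888"
    volumes:
      - ./notebooks:/home/jovyan/notebooks
    networks:
      - neo_net

networks:
  neo_net: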

A few things about this setup. First, I am using Neo4j 4.2.3, but obviously version numbers change, so you should consider using the latest version (although I note that :latest does not always get the absolute latest version). Next, we are opening and forwarding a few ports: 7474 is used by the Neo4j browser, while 7687 is the bolt connector, which will be used for the Python connection to the database. The NEO4J_AUTH setting creates the user neo4j with the password 1234 (be more creative than this!). The NEO4JLABS_PLUGINS setting imports two important libraries into Neo4j, namely Awesome Procedures on Cypher (APOC) and the Graph Data Science (GDS) library. Both of these are amazingly useful as we solve data science problems. Next, note that the memory settings are commented out. GDS is memory intensive, and you will want to add memory beyond the default configuration. The exact values depend on your local system, so adjust them accordingly and then uncomment those lines. Finally, we see that we are running the Jupyter container on port 8888. Both containers are attached to the neo_net network so they can talk to each other.

Running the container

As with any Docker container, the first step is to build it. To do this, we issue the usual command from the command line:

docker-compose build

This will go through and assemble the whole thing. Next, we run it via

docker-compose up

This fires up both the Neo4j and Jupyter containers. You will see many things scroll by, but one of them will be the link to Jupyter containing the token you will need to open it. Clicking that link should open Jupyter in your browser. Next, we can navigate to localhost:7474 to get to the Neo4j browser.

Now we should have Jupyter and Neo4j able to communicate with each other. To test that, you can run the notebook notebooks/test_db_connection.ipynb. If it runs without errors, you are good to go! You can now interact with Neo4j from within a Jupyter notebook as well as directly from the Neo4j browser.
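If you would rather test the connection by hand, a minimal sketch using the official driver looks like this (note that inside the compose network the database is reachable by its service name, not localhost, and the credentials must match NEO4J_AUTH):

from neo4j import GraphDatabase

# Inside the compose network, the Neo4j service is reachable by its
# service name; credentials must match NEO4J_AUTH in docker-compose.yml.
driver = GraphDatabase.driver("bolt://neo4j:7687", auth=("neo4j", "1234"))

with driver.session() as session:
    result = session.run("RETURN 1 AS ok")
    print(result.single()["ok"])  # prints 1 if the round trip works

driver.close()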

And, of course, once you are done you issue the command

docker-compose down

to cleanly shut down the containers. The great thing, though, is that because we have linked a few volumes like notebooks/, our work is saved on our local computer and can be reused any time we stand up the containers again.

Stay tuned for the next blog post walking through the data science journey with graphs! Thank you for reading!
