Intro to Neo4j: A Graph Database

Learn about the graph database Neo4j and write practice queries

Kruthi Krishnappa
Towards Data Science

Image by Author

What is Neo4j?

Neo4j is a graph database that stores data as nodes and relationships. This differs from a traditional relational database, which stores data in tables.

The benefits of using Neo4j are its native graph storage, a scalable architecture optimized for speed, and ACID compliance, which helps keep relationship-based queries predictable. When compared directly with MySQL, Neo4j is significantly faster at graph traversals.

Setup

To download Neo4j click here and follow the download instructions. Make sure to download the community edition.

Once downloaded, open the app and it will look like the image below. The Movie Database is preloaded and can be used to practice queries. For this tutorial click the Add Database button next to it and press create a local database. Then, change the name of the database to Practice, set any password, and press create. Finally, press start to start up the database, and then press open.

Image by Author

What is a Node?

A node is a structural type in Neo4j that represents an entity. Every node in Neo4j has an ID that is unique. Nodes also have labels and a map of properties. A node in Neo4j is represented with parentheses (). The examples below will discuss various types of nodes and how to create them.

Empty parentheses are used to create an empty node. Run the statement below to create an empty node in Neo4j.

CREATE ()

If the node was created successfully, the message shown below will appear.

Image by Author

Parentheses containing an alphanumeric name also represent an empty node. The name acts as a variable for that node. A variable is only usable within a single statement; it has no meaning in other statements.

CREATE (a)

Adding a colon after the variable allows for the addition of a node label. This can be thought of as the type for the node. In comparison to SQL, all nodes with the same label can be thought of as part of the same table. Also, all the nodes with the same label will be the same color in the graph.

CREATE (a:PEOPLE)

After this command is run, you will be able to see the current graph visualization. Click the top icon on the left sidebar (shown in green in the picture below). Then click the *(3) button, outlined in red below, to show the whole graph. Pressing the PEOPLE button next to it will show all the nodes with the label PEOPLE.

Image by Author

After a node is defined, properties can be added in the form of key-value pairs. In this example, the node has one property, name, and its value is John, Doe.

CREATE (a:PEOPLE{name:"John, Doe"})

Multiple properties can be added within the curly brackets of each node. To see the properties of each node hover over it and a sidebar will pop up to the right, as seen in the image below.

CREATE (a:PEOPLE{name:"John, Doe", age:21})
Image by Author
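The labels and properties described above can also be inspected with a query rather than by hovering. The following is a minimal sketch, assuming the PEOPLE nodes created above exist; labels() and properties() are built-in Cypher functions, and the column aliases are chosen here only for illustration.

```cypher
// Return the labels and the full property map of every PEOPLE node,
// plus the two individual properties used in this tutorial.
MATCH (n:PEOPLE)
RETURN labels(n) AS labels, properties(n) AS props, n.name AS name, n.age AS age;
```

Nodes that were created without an age property return null in the age column.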

What is a Relationship?

A relationship connects a pair of nodes. There are two types of relationships: undirected and directed. Undirected relationships are written with two dashes, --. Directed relationships are written with arrows, --> or <--.

Undirected relationships are used in MATCH queries; they cannot be used in a CREATE statement. They are used to find relationships between nodes when the direction of the relationship doesn't matter.
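As a rough sketch of an undirected pattern, the query below assumes that PEOPLE nodes connected by PARENT relationships (like those created later in this section) already exist in the database; on an empty graph it simply returns no rows.

```cypher
// Direction is ignored here: the pattern matches a PARENT relationship
// regardless of which of the two nodes it points to.
MATCH (a:PEOPLE)-[:PARENT]-(b:PEOPLE)
RETURN a.name, b.name;
```

Because the pattern has no direction, each relationship is matched twice, once with each of its nodes bound to a.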

A directed relationship can be as simple as an arrow connecting two nodes.

CREATE (a:PEOPLE{name:"John, Doe"})-->(b:PEOPLE{name:"Sally, Mae"})

or

CREATE (a:PEOPLE{name:"John, Doe"})<--(b:PEOPLE{name:"Sally, Mae"})

Brackets can be added to a relationship to include details. In the example below, the variable p is used the same way as variables in nodes.

CREATE (a:PEOPLE{name:"John, Doe"})-[p]->(b:PEOPLE{name:"Sally, Mae"})

A type can be added to a relationship, much like a label on a node. The relationship below shows that the two PEOPLE nodes are related through a PARENT relationship: John Doe is Sally Mae’s parent.

CREATE (a:PEOPLE{name:"John, Doe"})-[p:PARENT]->(b:PEOPLE{name:"Sally, Mae"})
Output of Query Above (Image by Author)

Properties can be added to relationships, and they work the same way as properties on nodes. This example shows that John Doe is Sally Mae’s father. To see the properties of a relationship, click on the relationship and look at the sidebar that comes up to the right.

CREATE (a:PEOPLE{name:'John, Doe'})-[p:PARENT{type:'Father'}]->(b:PEOPLE{name:'Sally, Mae'})
Image by Author

Multiple relationships can be created with one statement.

CREATE (a:PEOPLE{name:'Jane, Doe'})-[p:PARENT{type:'Mother'}]->(b:PEOPLE{name:'Freddy, Mae'})-[g:PARENT{type:'Father'}]-> (c:PEOPLE{name: 'Jill, Smith'})

The relationships above were made with CREATE statements because the nodes did not exist yet, so the nodes and the relationships had to be created at the same time. If the nodes already exist, a MATCH statement needs to be used instead.

MATCH Statement

A MATCH statement is used to search for a pattern. The first use is to create a relationship between existing nodes. In this case, it needs to be paired with a WHERE clause. The nodes to be matched first need to have their labels defined. In this example, the first node needs to be an Author and the second needs to be a Book. The specific attributes of each node then go in the WHERE clause. In the example below, the author node needs to have the name ‘Feser, Edward’ and the book node has to have the title ‘Aristotle on Method and Metaphysics’. When choosing the attribute that identifies a node, make sure it is unique to the node you want.

CREATE (a:Author{name:'Feser, Edward'}), (b:Book{title:'Aristotle on Method and Metaphysics'});
MATCH (a:Author),(b:Book) WHERE a.name = 'Feser, Edward' AND b.title = 'Aristotle on Method and Metaphysics' CREATE (a)-[:EDITS]->(b);

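In its simplest form, a MATCH statement can be used to retrieve a specific node. Every condition in the WHERE clause must hold, so the query below only returns a Book node that has both the given title and a year property of 2013.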

MATCH (n:Book) WHERE n.title='Aristotle on Method and Metaphysics' and n.year=2013 RETURN n;

MATCH statements can also be used to return all nodes with the same label.

MATCH (a:Author) RETURN a;

Building on the query above, a MATCH statement can also find a set of nodes connected by a specific relationship. Essentially, any pattern in the graph can be found with a MATCH statement.
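As a small sketch of matching on a relationship, the query below assumes the Author and Book nodes and the EDITS relationship created earlier in this section are present.

```cypher
// Find every Author node that has an outgoing EDITS relationship
// to a Book node, and return the two connected values.
MATCH (a:Author)-[:EDITS]->(b:Book)
RETURN a.name, b.title;
```

Only authors that actually have an EDITS relationship to a book appear in the result; unconnected Author or Book nodes are left out.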

Practice Queries

To practice the content discussed above, create a new database and run the statements linked here in Neo4j to create the database. The expected output is shown below each prompt.

1. For each author, list their name and article titles (do not include chapters or books).

2. List names of authors and the number of publications (articles, chapters, and books) each has.

3. List titles and number of pages for articles that have 10 pages or fewer. Note: the pp property is a list. You can access list elements using brackets (e.g., pp[0]).

4. List article title, journal title where the article is published, and author name(s) for articles cited twice or more.
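The bracket notation mentioned in prompt 3 can be tried on its own before writing the full query. This is only a sketch: the list [12, 20] is a made-up value standing in for a pp property that holds a first and last page, and the aliases are hypothetical.

```cypher
// Bind an illustrative list to pp, then index into it.
// pp[0] is the first element and pp[1] is the second.
WITH [12, 20] AS pp
RETURN pp[0] AS first_page, pp[1] AS last_page, pp[1] - pp[0] + 1 AS page_count;
```

With these values the query returns 12, 20, and a page count of 9, since both the first and last page are counted.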

Answers

1.

MATCH (n:Author)-[r:WRITES]->(x:Article) RETURN n.name, x.title;

2.

MATCH (n:Author)-[:WRITES]->(r) RETURN n.name, count(*) UNION MATCH (n:Author)-[:EDITS]->(r) RETURN n.name, count(*);
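The UNION query above can return two rows for an author who both writes and edits, one per relationship type. If a single total per author is wanted, one possible alternative is sketched below; it assumes WRITES and EDITS are the only relationship types that link an author to a publication in the practice data, and the publications alias is chosen here for illustration.

```cypher
// Match either relationship type in one pattern and count per author.
MATCH (n:Author)-[:WRITES|EDITS]->(r)
RETURN n.name, count(r) AS publications;
```

If an author could both write and edit the same publication, count(DISTINCT r) would avoid counting it twice.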

3.

MATCH (n:Article)-[a:IN]->(i:Issue) WHERE (a.pp[1] - a.pp[0]+1 ) <= 10 RETURN n.title, a.pp[1] - a.pp[0] +1;

4.

MATCH (a)-[:CITES]->(b) WITH b, count(b.title) AS publication_count WHERE publication_count >= 2 WITH b.title AS bt, publication_count MATCH (au:Author)-[:WRITES]->(a:Article)-[:IN]->(:Issue)-[:OF]->(j:Journal) WHERE bt = a.title RETURN j.title, a.title, au.name, publication_count;

Graph databases are becoming widely used for applications ranging from fraud detection to social networking. Neo4j is steadily gaining popularity among graph databases. It stands out because of its high performance, its large and active community, and its use in production applications.

For any further questions or additional practice queries regarding Neo4j please reach out. Additionally, check out the Neo4j website for more resources.
