Neo4j vs GRAKN Part I: Basics

An exhaustive comparison of two of the most popular knowledge bases

Duygu ALTINOK
Towards Data Science

--

Dear readers, in this series of articles I compare two popular knowledge bases: Neo4j and Grakn. I decided to write this comparison a long time ago upon your requests, but kismet had it that it comes only now 😊

This is a detailed comparison in 3 parts: the first part is devoted to technical details, and the second part dives into the details of semantics and modeling. The introductory sections give a quick overview of how Neo4j and Grakn work, as well as some details of what is new in each. If you already know these basics, you can dive directly into the semantic power part. The third part is devoted to a comparison of graph algorithms, the core of recommender systems and social networks. The series will continue with graph learning, chatbot NLU and more semantics with both platforms.

In this article, I will briefly compare how the Neo4j way of doing things differs from the Grakn way. As you follow the examples of how to create a schema and how to insert, delete and query, you will notice the paradigm difference. This part is not a battle, but rather a comparison.

I know you cannot wait for the content, let the comparison begin … here are some highlights:

How Grakn Works

“Grakn is the knowledge graph, Graql is the query language”, notes the Grakn homepage. Grakn is a knowledge graph, completely true; but the language Graql is data oriented and ontology-like. Graql is declarative: one uses it both to define data and to manipulate data. We will see more on Graql in the next sections.

Grakn: focused on knowledge representation, not much giving away being a graph

In my opinion Grakn is a knowledge base and Graql is a data-oriented query language; all of this is built on top of a graph data structure, but you never feel the graph is there. This is how it is described on their website:

When writing Graql queries, we simply describe what information we would like to retrieve, rather than how should it be obtained. Once we specify the target information to retrieve, the Graql query processor will take care of finding an optimal way to retrieve it.

When I first met Grakn, I thought it was a knowledge modeler, and I still feel the same. The best explanation of what Grakn actually is comes from its makers:

Grakn is a database in the form of a knowledge graph, that uses an intuitive ontology to model extremely complex datasets. It stores data in a way that allows machines to understand the meaning of information in the complete context of their relationships. Consequently, Grakn allows computers to process complex information more intelligently with less human intervention. Graql is a declarative, knowledge-oriented graph query language that uses machine reasoning for retrieving explicitly stored and implicitly derived knowledge from Grakn.

As I said before, even though they themselves say Grakn is a database, I still raise my objections and insist that Grakn is a knowledge base 😄

How Neo4j Works

Neo4j is a graph-looking knowledge graph 😄 Their home page mentions connected data only once, so one has to go through their documentation to see the semantics they actually bring:

Neo4j: Looks like only a graph database, but a knowledge graph in reality.

Although their front page says only graph database, I strongly disagree. Neo4j is definitely a knowledge graph. One can see relations, classes, instances and properties, i.e. the schema definition. This is semantics, that’s it. Neo4j is definitely not just a graph database; one can model knowledge with it.

How Grakn Wins over OWL

“OK, if we can already write some OWL, why should we use Grakn instead?”, one might think. Grakn explained this issue in detail in their post. For me, the first plus is definitely Graql, which is easy to read and write. OWL is usually created with Protégé or other similar frameworks, and the resulting XML is basically unreadable. Below, find the same descends relationship in OWL-XML and Graql:

descends relation in Graql and OWL-XML.

From the production point of view, Grakn

  • is scalable
  • is highly efficient
  • has Python, Java and Node.js clients
  • is decently wrapped up as a whole framework

… i.e. production ready. From the development point of view

  • queries are highly optimized
  • underlying data structure is flexible for semantic modeling
  • graph algorithms are also provided.

From the semantics point of view, Grakn has more power; the data model

  • is easy to update/add onto
  • allows abstract types (we will come to this)
  • has a huge plus over OWL: it guarantees logical integrity. OWL has an open world assumption, whereas Grakn offers a chic combination of open world and closed world assumptions. You can read more about this in the Logical Integrity section.
  • allows more abstraction details in general. For instance, the underlying hypergraph allows n-ary relations. It’s a bit painful to model n-ary relations in OWL; there is a whole description here.

What Neo4j Brings

Honestly I don’t know where to start. Neo4j shines in the NoSQL world by providing connected data, and shines among graph databases with its superior back end and high performance. Unlike many other graph databases, Neo4j offers

  • fast reads and writes
  • high availability
  • horizontal scaling. Horizontal scaling is provided through 2 types of clustering: high availability clustering and causal clustering.
  • cache sharding
  • multiclustering
  • no joins. Data is connected via edges, so complex joins are not necessary to retrieve connected/related data
  • granular security.

Personally I fell off my chair while reading about Neo4j’s unmatched scalability skills. I highly recommend visiting the corresponding page. Warning: you may fall in love as well ❤️

From data perspective Neo4j offers

  • temporal data support
  • 3D spatial data support
  • real-time analytics.

From top left: Finding valuable outliers, represent connectivity in different ways, querying and visualizing spatial data. All taken from Neo4j home page.

If you want to model a social network, build a recommendation system or model connected data for any other task; if you want to manipulate temporal or spatial data; if you want a scalable, high-performance, secure application, the choice is Neo4j. No more words here.

Getting Started

Getting started with Grakn is easy: first one installs Grakn, then runs grakn server start. After that, one can work with either the Grakn console or Grakn Workbase.

The same applies to Neo4j; download and installation are provided in a very professional way. If you need them, Docker configuration and framework configuration manuals are there as well. Afterwards, you can download the Neo4j browser and start playing, or you can explore the back end more. You can also experiment with Neo4j without downloading anything via their Sandbox, which I really, really liked. You can play around without any download hassle.

Neo4j and Grakn download pages

As a side note, I must say I totally fell in love with the Neo4j documentation; the level of effort and professionalism is huge.

Development Environment and Visualization

Both Grakn and Neo4j offer IDEs for easy use and visualization.

Grakn workbase offers 2 functionalities: visualization of your knowledge graph and an easy way to interact with your schema. You can perform match-get queries and make path queries within the workbase.

Queries with Grakn workbase, photo taken from their Github

Neo4j offers their browser for 2 purposes as well: easy interaction and visualization. One can also explore patterns, clusters, and traversals in their graph.

Querying inside the Neo4j browser, taken from their documentation

Moreover, Neo4j offers much more for visualization: Neo4j Bloom and other developer tools, mainly JS integration. With Neo4j Bloom one can discover clusters, patterns and more.

Clustering and link exploration with Bloom, images taken from their website and Medium developer blog

It does not end here: Neo4j has even more visualization tools for spatial data and 3D. Neo4j is a master of spatial data in general, but the visualization tools carry Neo4j to a different level:

Neo4j visualization for maps data, taken from their Medium developer blog

Both platforms offer great visualization, Neo4j offers more due to being older 😄

We covered the basics of both platforms; now we can move on to development details.

Underlying Data Structure

In short, Grakn is a hypergraph and Neo4j is a directed graph. If you are further interested in how Neo4j stores its nodes, relations and attributes, you can visit their developer manual or Stack Overflow questions.
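
To make the difference a bit more tangible, here is a minimal plain-Python sketch. It is not Neo4j or Grakn code; the class names, field names and the sample fact are all hypothetical and only illustrate the idea that a hypergraph relation can connect any number of role players, while a directed edge connects exactly two nodes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """Directed-graph style: one relationship connects exactly two nodes."""
    source: str
    rel_type: str
    target: str


@dataclass
class HyperRelation:
    """Hypergraph style: one relation connects any number of role players."""
    rel_type: str
    role_players: dict = field(default_factory=dict)  # role name -> player


# A ternary fact with made-up values: who works where, in which position.
hyper = HyperRelation(
    "employment",
    {"employee": "Duygu", "employer": "German Autolabs", "position": "engineer"},
)

# With binary edges only, the same fact needs an intermediate node
# plus one edge per participant.
edges = [
    Edge("employment-1", "EMPLOYEE", "Duygu"),
    Edge("employment-1", "EMPLOYER", "German Autolabs"),
    Edge("employment-1", "POSITION", "engineer"),
]

print(len(hyper.role_players), "role players in one relation")
print(len(edges), "edges around one intermediate node")
```

The intermediate node in the second half is one common workaround for n-ary facts in a plain directed graph; other modeling choices are possible.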

Data Modeling

Data modeling is the way we know from good old OWL.

Neo4j works with knowledge graph notions: nodes (instances), labels (classes), relationships, relationship types (attributes) and properties (data attributes).

Modeling knowledge with graph notions, image taken from Neo4j documentation

Grakn-style knowledge modeling is closer to the ontology way: declarative and more semantics oriented. The notions are:

  • entities (classes)
  • instances
  • attributes
  • roles
  • rules
  • type hierarchies
  • abstract types

Grakn modeling, entities-relations, subentities, ternary relations. Taken from their front page.

If you are more on the ontology side, the Grakn way of thinking is really similar. Modeling is easy, efficient and provides semantic power.

Query Language

Graql is declarative and more data oriented. Neo4j’s Cypher is also declarative, but its flavor is SQL. For me, Graql feels like an ontology, while Cypher feels like a database query. Compare the following simple queries in both languages:

//Cypher
MATCH (p:Person { name: "Duygu" })
RETURN p
//Graql
match
$p isa person, has name "Duygu";
get $p;

Personally I find Graql more semantics friendly.

Creating a Schema

Creating a schema in Grakn is easy, remember Graql is declarative. Basically you open a .gql file and start creating your schema, that’s it 😄. Here is an example from their front page:

define

person sub entity,
has name,
plays employee;

company sub entity,
has name,
plays employer;

employment sub relation,
relates employee,
relates employer;

name sub attribute,
datatype string;

The Neo4j way of creating entities and instances is CREATE. Once those are created, one can run a MATCH query to get the corresponding nodes and create a relationship between them. One creates the nodes, the edges and their attributes:

//First statement: create the nodes
CREATE (d:Person {name: "Duygu"})
CREATE (g:Company {name: "German Autolabs"});
//Second statement: match the nodes and create the relationship
MATCH (a:Person), (b:Company)
WHERE a.name = 'Duygu' AND b.name = 'German Autolabs'
CREATE (a)-[r:Employed {since: '2018'}]->(b)
RETURN type(r), r.since

Querying

Graql is declarative, querying is in ontology fashion again. Querying is done by match clauses. Here are some simple queries about customers of a bank:

match $p isa customer; get;
match $p isa customer, has first-name "Rebecca"; get;
match $p isa customer, has full-name $fn; { $fn contains "Rebecca"; } or { $fn contains "Michell"; }; get;

So far so good. Now we can get some insights about our customers. This is a query for the average debt of the customers who own a Mastercard and are younger than 25:

match
$person isa customer, has age < 25;
$card isa credit-card, has type "Mastercard";
(customer: $person, credit-card: $card) isa contract, has debt $debt;
get $debt; mean $debt;

Looks clean. How about Neo4j? Cypher querying is done with a MATCH clause as well, but with a completely different syntax; it rather has a database flavor, with a WHERE. When there is a WHERE, you can play some string method games as well 😄:

//All customers
MATCH (customer:Customer)
RETURN customer
//Customers whose first name is Rebecca
MATCH (customer:Customer)
WHERE customer.first_name = 'Rebecca'
RETURN customer
//or equivalently with a bit of syntactic sugar
MATCH (customer:Customer {first_name: "Rebecca"})
RETURN customer
//String matching with WHERE
MATCH (customer:Customer)
WHERE customer.first_name STARTS WITH 'Steph'
RETURN customer

Coming to relations, one needs to keep an eye on the edge direction via arrows. Here is a query for the bank customers who drive an Audi; notice the DRIVES relation is directed from the customer to their car:

MATCH (car:Car {brand: "Audi"})<-[:DRIVES]-(customers)
RETURN customers.first_name

Aggregation is similar to SQL as well. Here is the same query for the debt of the young customers who use a Mastercard:

MATCH (customer:Customer)-[:OWNS]->(card:CreditCard {type: "Mastercard"})
WHERE customer.age < 25
RETURN AVG(card.debt)
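
Both the Graql and the Cypher version express the same idea: filter first, then average. Here is a minimal plain-Python sketch of that computation, following the Graql version where the debt sits on the contract. The sample data and the function name mean_debt are made up for illustration; this is not how either engine evaluates the query.

```python
from statistics import mean

# Made-up sample data, for illustration only.
customers = {"c1": {"age": 22}, "c2": {"age": 24}, "c3": {"age": 40}}
cards = {
    "k1": {"type": "Mastercard"},
    "k2": {"type": "Visa"},
    "k3": {"type": "Mastercard"},
}
# Each contract links one customer and one card and carries the debt.
contracts = [
    {"customer": "c1", "card": "k1", "debt": 100.0},
    {"customer": "c2", "card": "k2", "debt": 900.0},
    {"customer": "c3", "card": "k3", "debt": 500.0},
    {"customer": "c2", "card": "k3", "debt": 300.0},
]


def mean_debt(max_age, card_type):
    """Average debt over contracts whose customer is younger than max_age
    and whose card has the given type."""
    debts = [
        c["debt"]
        for c in contracts
        if customers[c["customer"]]["age"] < max_age
        and cards[c["card"]]["type"] == card_type
    ]
    return mean(debts)


print(mean_debt(25, "Mastercard"))  # 200.0
```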

SQL habits carry over to Cypher queries in general; if you like writing SQL, you will definitely feel at home with Cypher. This query lists nodes ordered by the number of properties they have, so the node with the most properties comes first:

MATCH (n)
RETURN labels(n), keys(n), size(keys(n)), count(*)
ORDER BY size(keys(n)) DESC

Coming back to semantics, one can query how two entities/instances are related:

MATCH (:Person { name: "Oliver Stone" })-[r]->(movie)
RETURN type(r) //DIRECTED

What about “joins”, i.e. queries about related nodes? One usually handles such queries just as in the Grakn counterpart, only with a few more instances and more relation arrows:

//Names of the movies that Charlie Sheen acted in and their directors
MATCH (charlie { name: 'Charlie Sheen' })-[:ACTED_IN]->(movie)<-[:DIRECTED]-(director)
RETURN movie.title, director.name

From time to time I have emphasized that Neo4j is a graphy graph; now let’s see it in action … One can make path queries with Cypher, or the usual semantic queries with path restrictions.

Let’s say you have a social network and you would like to find all persons who are related to Alicia within a distance of 2, where who follows whom, and in which direction, is not important:

MATCH (p1:Person)-[:FOLLOWS*1..2]-(p2:Person)
WHERE p1.name = "Alicia"
RETURN p1, p2
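
What such a variable-length pattern asks for can be sketched in plain Python as a breadth-first walk that ignores edge direction and stops after 2 hops. The edge list and the function name within_hops are made up for illustration, and the sketch is a simplification: it returns the distinct people other than the start, not every matching path the way Cypher does.

```python
from collections import deque

# Made-up FOLLOWS edges (follower, followed), for illustration only.
follows = [("Alicia", "Bob"), ("Carol", "Alicia"), ("Bob", "Dave"), ("Dave", "Erin")]


def within_hops(edges, start, max_hops):
    """Return every node reachable from start in 1..max_hops steps,
    ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                queue.append((nxt, dist + 1))
    return found


print(sorted(within_hops(follows, "Alicia", 2)))  # ['Bob', 'Carol', 'Dave']
```

Erin is 3 hops away from Alicia in this toy network, so she is left out.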

Of course the shortest path is a classic in social networks; you might want to know how closely Alicia and Amerie are socially linked:

MATCH p = shortestPath((p1:Person)-[*]-(p2:Person))
WHERE p1.name = "Alicia" AND p2.name = 'Amerie'
RETURN p

This way we look for the shortest path via any relation type. If we want, we can replace [*] with [:FOLLOWS*] to specify that we only want this relation. (There might be other relation types, such as attending the same college, living in the same city…)
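
For readers who want to see the idea behind a shortest path query, here is a minimal breadth-first search in plain Python. It is only a sketch of the classic technique on an unweighted, undirected graph; the link data and the function name shortest_path are made up, and I am not claiming this is how Neo4j implements shortestPath internally.

```python
from collections import deque

# Made-up undirected links of any relation type, for illustration only.
links = [
    ("Alicia", "Bob"),
    ("Bob", "Amerie"),
    ("Alicia", "Carol"),
    ("Carol", "Dave"),
    ("Dave", "Amerie"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes,
    or None when start and goal are not connected."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path(links, "Alicia", "Amerie"))  # ['Alicia', 'Bob', 'Amerie']
```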

As you see, Cypher offers much more than meets the eye. It is true that the syntax looks like SQL … but the story is very different: graph concepts and semantic concepts meet to fuel Neo4j.

Inference

Semantic reasoners have existed since RDF times. Inferring new relations and facts from schema knowledge and already existing data is easy for the human brain, but not so straightforward for knowledge base systems. Do we allow open-world assumptions (if you do not know something for sure, it does not mean it is wrong, you just do not know)? Does the world consist only of what we already know (what happens when an unseen entity comes into our closed world)? How much should we infer? Should we allow long paths? … HermiT was a popular choice for OWL (I used it as well) and it can be used as a plugin in Protégé.

Neo4j does not have a built-in tool for inference. Here I will introduce how Grakn tackles it.

Inference in Grakn is handled via rules. Here is how they describe what a rule is:

Grakn is capable of reasoning over data via pre-defined rules. Graql rules look for a given pattern in the dataset and when found, create the given queryable relation. Graql reasoning is performed at query time and is guaranteed to be complete.

Let’s see an example of a sibling rule: if two persons have the same mother and father, then one can deduce they are siblings. This inference is expressed in Grakn as follows:

Sibling inference rule, taken from Grakn documentation

Grakn rule creation is intuitive: when some conditions are met, then we should infer the following fact. I found the syntax refreshing and sweet. Especially in drug discovery and any other sort of discovery task, one needs reasoning 100%. If I were involved in discovery-type tasks, I would use Grakn for this reason alone.
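
The when/then idea behind the sibling rule can be sketched in a few lines of plain Python. This is not Graql and not the Grakn reasoner; the parentship data and the function name infer_siblings are made up, and the sketch only mirrors the logic: when two distinct people share both parents, then a sibling pair is derived on demand instead of being stored.

```python
from itertools import combinations

# Made-up parentship facts: child -> (mother, father). Illustration only.
parents = {
    "Ann": ("Mary", "John"),
    "Ben": ("Mary", "John"),
    "Cid": ("Mary", "Paul"),
}


def infer_siblings(parentship):
    """'When' two distinct people share both mother and father,
    'then' derive a sibling pair. Nothing is stored; pairs are computed
    on demand, loosely mirroring inference at query time."""
    return {
        frozenset((x, y))
        for x, y in combinations(parentship, 2)
        if parentship[x] == parentship[y]
    }


print(infer_siblings(parents))
```

With this toy data only Ann and Ben come out as siblings; Cid shares a mother with them but not a father, so the rule does not fire.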

Logical Integrity

Once you have created your schema and can infer new relations, you would like your schema to stay intact and not allow incorrect data to get into your model. For instance, you would not want to allow a marriage relation between a person and a carpet (though person + tree is legal in some parts of the world😄).

Though Neo4j has constraints in their documentation, those are just database conventions for data validation and null checks:

Neo4j helps enforce data integrity with the use of constraints. Constraints can be applied to either nodes or relationships. Unique node property constraints can be created, as well as node and relationship property existence constraints.

Graql ensures logical integrity via roles; each role comes from a class, as you see in the examples above. Again, this is mandatory for discovery-type tasks.
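
As a rough illustration of role-based integrity, here is a plain-Python sketch that rejects a relation when a player's type is not allowed to play its role. The mini-schema SCHEMA, the validate function and the sample players are all hypothetical; Grakn performs this kind of check itself against the schema you define, and the sketch makes no claim about how.

```python
# Hypothetical mini-schema: which types may play which role in a relation.
SCHEMA = {
    "marriage": {"spouse": {"person"}},
    "employment": {"employee": {"person"}, "employer": {"company"}},
}


def validate(relation, role_players, player_types):
    """Raise ValueError if a player's type is not allowed to play its role.

    role_players is a list of (role, player) pairs;
    player_types maps each player to its type.
    """
    roles = SCHEMA[relation]
    for role, player in role_players:
        if role not in roles:
            raise ValueError(f"{relation} has no role {role!r}")
        if player_types[player] not in roles[role]:
            raise ValueError(
                f"{player_types[player]} {player!r} cannot play {role!r}"
            )
    return True


player_types = {"Duygu": "person", "Alex": "person", "rug": "carpet"}

print(validate("marriage", [("spouse", "Duygu"), ("spouse", "Alex")], player_types))
try:
    validate("marriage", [("spouse", "Duygu"), ("spouse", "rug")], player_types)
except ValueError as err:
    print("rejected:", err)
```

The person + carpet marriage is refused before it ever reaches the data, which is the behavior one wants from a schema that guards its own logic.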

Scalability

Both are super scalable. I spoke a lot about Neo4j scalability above but kept the Grakn way of scalability a secret 😄 Grakn leverages Cassandra under the hood, hence Grakn enjoys being strongly consistent, scalable and fault tolerant.

Graph Algorithms

… is the subject of the next-next post. You will have to wait 2 more posts 😉

Grakn World vs Neo4j World

… look very different, but very similar at the same time. The Grakn way of things is more knowledge oriented, while Neo4j has a bit more of the graph taste. Both frameworks are fascinating, so only one question is left: which is better? Coming up next, the great battle of semantics between Grakn and Neo4j 😉

Dear readers, we have reached the end of this exhausting article but I won’t wave goodbye yet 😄 Please continue reading with Part II and meet me for exploring the fascinating world of semantics. Until then take care and hack happily. 👋

--

Senior NLP Engineer from Berlin. Deepgram member, spaCy contributor. Enjoys quality code, in love with grep. Youtube: https://www.youtube.com/c/NLPwithDuygu