Neo4j vs GRAKN Part II: Semantics

Battle of Semantics: Who models the data better?

Published in Towards Data Science · 9 min read · Feb 24, 2020

Dear readers, in this part I will take you to semantics land. In the first part, we compared the Grakn and Neo4j worlds in general to see how the paradigm difference leads to different ways of doing things. In this post Grakn and Neo4j will compete in modeling and representation power, a head-to-head battle to conquer the semantics world 😄

Earlier in my career, I did knowledge engineering (I worked on the Wikification component of a search engine). What surprises me is that within the last 10 years knowledge graphs have improved tremendously in terms of scalability, language support, developer/user interfaces and efficient storage; but in terms of semantic expressive power … not much has changed. There are still instances, relations, classes and us, the ones trying hard to load them with meaning. Whether you are a beginner or more advanced makes no difference, we all know the truth: capturing any kind of semantics is difficult. That’s a whole other story; I will get into the details in my posts on graph learning and knowledge-base-supported chatbots (coming soon).

I won’t keep you waiting too long, let the battle begin:

1. Documentation and Community

The most important things for any open source project are the documentation and the community support.

Neo4j offers documentation, video tutorials, a fat GitHub, a community and a Stack Overflow archive. This is what I see if I search for Neo4j tutorial and GRAKN tutorial:

Grakn offers documentation, tutorials, a GitHub repo full of examples, a community and its own Stack Overflow space as well. I see wonderful documentation and community support for both.

Neo4j: 10/10, Grakn: 10/10

2. Language Support

Grakn offers Java, Node.js and Python clients, which cover a decent part of the modern development stack. Neo4j offers much more variety in this respect; they officially support drivers for .NET, Java, JavaScript, Go and Python. Ruby, PHP, Erlang and Perl drivers are provided by community contributors. I found Grakn’s language support decent enough for modern development; however, Neo4j shines with the variety it offers. Neo4j wins this round with relative ease. I take one point from each for not supporting C++ 😄

Neo4j: 9/10, Grakn: 8/10

3. OWL Import

If you work in knowledge representation or linked data, it’s quite possible that you already own a number of ontology resources in OWL format. Grakn does not have a direct way to import OWL files; one has to parse the file and create one’s own Grakn ontology. More on this issue straight from the Grakn team: https://discuss.grakn.ai/t/migrating-owl-to-grakn/556 . They do have an XML importer, though: the available loaders are JSON, CSV and XML.

With Neo4j the situation is a bit different. The Neo4j team seems to have put lots of effort into bringing more semantics to their graph approach and built neosemantics. This contrib includes comprehensive pages on RDF importing, OWL importing and inference/reasoning on those. I found this effort impressive. Neo4j also supports XML, CSV and JSON imports, as expected. You can also read this post for more on the OWL & Neo4j duo; I really enjoyed it.

Neo4j: 8/10 for the effort, Grakn: 8/10 for providing an ontology language anyway

4. Semantic Relations

OWL supports many property characteristics including reflexive, irreflexive, transitive, symmetric, asymmetric… If you want to model a friendship network you need symmetry (friendship is mutual: if Carry is friends with Barry, of course Barry is friends with Carry); if you need a model of Greek mythology you need transitivity: Odysseus is a great-grandson of Hermes and Telegonos is the son of Odysseus and Titan Circe, so obviously Telegonos is a descendant of Hermes.

Descends relation: Transitive and asymmetric. OWL style, photo taken inside Protégé

This is the place where OWL brings its full semantic power into the game and distinguishes itself from RDFS and other triple stores or graph-like information holders. OWL has never been a bare information container; it allows the developer to model meaningful real-world relations.

Property characteristics supported by OWL, photo taken inside Protégé
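To make these two characteristics concrete, here is a minimal sketch in plain Python (not OWL, Graql or Cypher). Relations are simply sets of (subject, object) pairs, and the helper names `symmetric_closure` and `transitive_closure` are my own, made up for illustration.

```python
def symmetric_closure(pairs):
    """Add the reverse of every pair: if (a, b) holds, so does (b, a)."""
    return set(pairs) | {(b, a) for a, b in pairs}


def transitive_closure(pairs):
    """Keep adding (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    while True:
        new_pairs = {
            (a, d)
            for a, b in closure
            for c, d in closure
            if b == c
        } - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


# Friendship is symmetric: stating one direction is enough.
friends = symmetric_closure({("Carry", "Barry")})

# "Descends from" is transitive: only the direct facts are stated.
descends = transitive_closure({
    ("Odysseus", "Hermes"),
    ("Telegonos", "Odysseus"),
})

print(("Barry", "Carry") in friends)        # True
print(("Telegonos", "Hermes") in descends)  # True
```

A reasoner that understands these characteristics does this kind of closure for you; without one, the application code has to.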

Semantic relations are a must for full semantic power. I played around a bit to see how one can define a symmetric relationship with Grakn:

friendship isa relation-type
  relates friend;

person isa entity-type,
  plays-role friend;

Defining a transitive relation is a bit trickier, but doable 😉 The Grakn way of thinking is ontology and knowledge representation focused in general (even though the underlying storage is a graph); if you’re familiar with OWL, Grakn feels like home.

Coming to the Neo4j way of thinking: Neo4j is a knowledge graph and it likes to be a graph. At this point, one should look at relations only as edges. I don’t see a direct way of restricting relations at creation time. Neo4j does not allow undirected edges in general, so one creates a relation with an arbitrary direction and discards the edge direction at query time, like this:

Create an edge with arbitrary direction for efficiency and space reasons. Taken from the link above.

Query looks like:

//find all partner companies of Neo4j
MATCH (neo)-[:PARTNER]-(partner)

which is the union of

MATCH (neo)-[:PARTNER]->(partner)    (edge is directed to partner)

and

MATCH (neo)<-[:PARTNER]-(partner)    (edge is directed to neo)

(notice the edge directions)
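The union idea can be sketched in plain Python. This is only an illustration of the semantics, not how Neo4j executes the query; the partner company names and the helper functions are invented for the example.

```python
# Each PARTNER edge is stored once, with an arbitrary direction (source, target).
# Company names other than Neo4j are made up for illustration.
partner_edges = {
    ("Neo4j", "CompanyA"),
    ("CompanyB", "Neo4j"),
    ("CompanyA", "CompanyC"),
}


def outgoing(node, edges):
    """Targets of edges that start at `node`, like (node)-[:PARTNER]->(x)."""
    return {target for source, target in edges if source == node}


def incoming(node, edges):
    """Sources of edges that end at `node`, like (node)<-[:PARTNER]-(x)."""
    return {source for source, target in edges if target == node}


def undirected(node, edges):
    """Ignore direction, like (node)-[:PARTNER]-(x): union of both cases."""
    return outgoing(node, edges) | incoming(node, edges)


print(sorted(undirected("Neo4j", partner_edges)))  # ['CompanyA', 'CompanyB']
```

Note that the symmetry lives in the query, not in the data model: nothing stops someone from querying with a direction and silently missing half of the partners.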

Transitive relations and transitivity are a huge deal in Neo4j 😉 We will look into this in the Graph Algorithms post. For more on Neo4j relations, you can visit.

We already saw that Grakn enjoys being a hypergraph. Hypergraphs provide a relaxed node/edge concept and allow Grakn relations to relate to more than 2 roles, unlike Neo4j.

define

law-suit sub relation,
  relates court,
  relates judge,
  relates DA,
  relates defendant;

person sub entity,
  has name,
  plays judge,
  plays DA,
  plays defendant;

name sub attribute,
  datatype string;

You can read more on this subject on the Grakn blog. This is a very important feature for modeling real-world data; otherwise one has to go through a lot of unnecessary pain, similar to the OWL way.
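To show what that pain looks like, here is a rough Python sketch of one common workaround in a graph with only binary edges: turning the relation into a helper node with one edge per role. The data, the `reify` helper and the `law-suit-1` identifier are all invented for illustration; this is neither Grakn’s nor Neo4j’s internal representation.

```python
# A hyperedge: one relation instance that connects four role players at once.
# All names below are invented for illustration.
law_suit = {
    "type": "law-suit",
    "roles": {
        "court": "District Court",
        "judge": "Alice",
        "DA": "Bob",
        "defendant": "Charlie",
    },
}


def reify(relation, relation_id):
    """Rewrite an n-ary relation as binary edges around a helper node.

    With binary edges only, the relation itself has to become a node,
    with one (helper node, role, player) edge per role.
    """
    return [
        (relation_id, role, player)
        for role, player in relation["roles"].items()
    ]


binary_edges = reify(law_suit, "law-suit-1")
for edge in binary_edges:
    print(edge)
```

One relation instance becomes one extra node plus four edges, and every query has to know about the helper node.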

5. Reasoner

Reasoning in Grakn works via inference rules. Defining a rule is fairly easy: one creates the set of inference rules during schema creation. According to the documentation, rules aren’t stored in the graph directly; instead the inference is made at query time. An inference rule looks like

define rule-id sub rule,
  when LHS then RHS;

which reads naturally as: if the conditions on the Left Hand Side are met, then please infer the Right Hand Side. Let’s see an example; this is how one naturally defines a sibling relation:

define

people-with-same-parents-are-siblings sub rule,
when {
    (mother: $m, $x) isa parentship;
    (mother: $m, $y) isa parentship;
    (father: $f, $x) isa parentship;
    (father: $f, $y) isa parentship;
    $x != $y;
}, then {
    ($x, $y) isa siblings;
};
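For readers who think in code, the same when/then logic can be sketched in plain Python. This is only my reading of the rule above, not Grakn’s reasoner; the (role, parent, child) triple layout, the helper names and the sample family are assumptions made for the example.

```python
from itertools import combinations

# Parentship facts as (role, parent, child) triples; the family is made up.
parentships = [
    ("mother", "Ann", "Cat"),
    ("father", "Bob", "Cat"),
    ("mother", "Ann", "Dan"),
    ("father", "Bob", "Dan"),
    ("mother", "Zoe", "Eve"),
    ("father", "Bob", "Eve"),
]


def parent_of(role, child, facts):
    """Return the parent playing `role` for `child`, or None if unknown."""
    for fact_role, parent, fact_child in facts:
        if fact_role == role and fact_child == child:
            return parent
    return None


def infer_siblings(facts):
    """Apply the rule: same mother, same father and x != y => siblings."""
    children = sorted({child for _, _, child in facts})
    siblings = set()
    # combinations() only yields pairs of distinct children, so x != y holds.
    for x, y in combinations(children, 2):
        mothers = (parent_of("mother", x, facts), parent_of("mother", y, facts))
        fathers = (parent_of("father", x, facts), parent_of("father", y, facts))
        if None in mothers or None in fathers:
            continue  # a condition of the rule cannot be matched
        if mothers[0] == mothers[1] and fathers[0] == fathers[1]:
            siblings.add(frozenset((x, y)))
    return siblings


print(infer_siblings(parentships))
```

Cat and Dan share both parents and come out as siblings; Eve shares only the father, so the rule does not fire for her.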

The RHS can describe an inferred relation, or some property of an entity. For instance, in this example of how to infer the continent of a city, the RHS is about inferring an entity-attribute relation, i.e. a has:

city-in-continent sub inference-rule,
when {
    (contains-city: $country1, in-country: $city1) isa has-city;
    $country1 has continent $continent1;
}
then {
    $city1 has inf-continent $continent1;
};
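The query-time flavour of this rule can be sketched in plain Python: only the stated facts are stored, and the continent of a city is derived when somebody asks for it. The dictionaries, the sample data and the `inferred_continent` function are assumptions for illustration, not Grakn’s storage model.

```python
# Stored facts only; nothing about a city's continent is stored.
# The data is a tiny illustrative sample.
has_city = {"Germany": ["Berlin", "Munich"], "Japan": ["Tokyo"]}
continent = {"Germany": "Europe", "Japan": "Asia"}


def inferred_continent(city):
    """Derive a city's continent at query time from its country's continent."""
    for country, cities in has_city.items():
        if city in cities and country in continent:
            return continent[country]
    return None  # no rule fired: the attribute cannot be inferred


print(inferred_continent("Munich"))  # Europe
print(inferred_continent("Paris"))   # None
```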

As you see, one can infer entities, attributes and relations, aka nodes, properties and edges. I really liked the Grakn way; Graql’s semantic beauty shines at this point. Efficiency is also very good, as far as I observed. If you want to read more, you can jump to this post.

As you see, reasoning is a core part of Graql, whereas unfortunately Cypher does not come with built-in reasoning. Still, Neo4j does not drop the ball, and neosemantics comes into play again for more semantics. One can infer about nodes and relations of imported ontologies within the WHERE clause, as described in the documentation:

CALL semantics.inference.nodesLabelled('Crystallography',
     { catNameProp: "dbLabel", catLabel: "LCSHTopic", subCatRel: "NARROWER_THAN" })
YIELD node
RETURN node.identifier as id, node.title as title, labels(node) as categories

Unfortunately I don’t see an easy and direct way of reasoning over a Neo4j graph. That is no surprise, because Neo4j likes to be a graph. A comparison is actually not really applicable here.

Neo4j: 4/10 for the effort; Grakn: 10/10 for perfect reasoning. Grakn is the winner here without hesitation.

6. Semantic Power

As I wrote many times, Neo4j likes to be a graph; it officially calls itself a Graph Database. Grakn, on the other hand, likes to be a Knowledge Graph and is more knowledge-oriented; they do not sacrifice semantics, providing an ontology language to create and query the graph. I really liked Graql being semantics-oriented: it hides the underlying graph and makes it all seem like nothing but ontology writing and reasoning. This is pure beauty. If you are on the semantics side like me, you will love this sort of smoke and mirrors.

Cypher looks like a bare database query language (although it carries a lot of semantics and offers path queries). It feels like writing a SQL query for your old logging database: no excitement, though it is perfectly correct and efficient.

In the relation modeling category, Grakn wins by supporting n-ary relations. I am honestly surprised that nobody else implemented n-ary relations before and saved us from a huge amount of pain.

Another huge plus is that Grakn offers logical integrity, which NoSQL and graph databases, Neo4j included, generally lack. At the same time, it can scale horizontally like NoSQL, which Neo4j offers as well. I am very impressed by the horizontal scaling capabilities of both platforms without compromising on semantics; but Grakn wins my heart here with its logical integrity.

I will not make it long: the winner here is Grakn. I found Grakn semantically more expressive with Graql in general; the ability to express subclasses and subrelations in an organic way, the power to define abstract entities, attributes and relations, support for n-ary relations, a built-in reasoner… all of this makes Grakn the winner here.

Neo4j: 7/10; Grakn: 9/10

7. Text support

This section is added according to my taste because I like text, you know 😉 (sorry geocoding/spatial people, this is my blog 😅)

Grakn’s match supports regex and contains. A typical query might look like:

match $phone-number contains "+49"; get;
match $x like "eat(ery|ing)"; get;

I’m a bit unhappy that the very handy methods startswith and endswith are not implemented at all (don’t tell me to write a regex with ^; all the beauty lies in the names of the methods).

On the other hand, Neo4j’s WHERE supports regex, contains, startswith and endswith. One can query like this:

MATCH (n)
WHERE n.name STARTS WITH 'Mar'
RETURN n.name, n.age
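For comparison, here is what the four predicates amount to in plain Python. This is an illustration of the idea only: details such as case sensitivity or whether a regex must match the whole string may differ in Cypher and Graql, and the sample names are made up.

```python
import re

names = ["Maria", "Mark", "Omar", "Tamara"]

# contains: the substring may appear anywhere in the value (case-sensitive here).
contains_mar = [n for n in names if "mar" in n]

# startswith / endswith: dedicated, self-describing predicates.
starts_with_mar = [n for n in names if n.startswith("Mar")]
ends_with_ar = [n for n in names if n.endswith("ar")]

# The same prefix test written as an anchored regular expression.
regex_prefix = [n for n in names if re.search(r"^Mar", n)]

print(contains_mar)     # ['Omar', 'Tamara']
print(starts_with_mar)  # ['Maria', 'Mark']
print(ends_with_ar)     # ['Omar']
print(regex_prefix)     # ['Maria', 'Mark']
```

The regex version gives the same answer as startswith, but the named predicate says what it does at a glance.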

Though the methods are really similar, I really like the emphasis put on string methods in the documentation. Neo4j conquers my heart here.

Neo4j: 10/10, Grakn: 9/10 ; I may not be very objective in this evaluation due to the above reasons 😄

The Winner is …

Grakn.

Neo4j and Cypher competed hard, but did not have enough semantics to bring Grakn and Graql down.

What is next

Dear readers, we have reached the end of the competition; I hope you enjoyed the bloody combat. Next up is the comparison of graph algorithms, and guess who is a killer opponent 😄 We will explore the graph algorithms by building 2 recommendation systems, one in Grakn and one in Neo4j. Meanwhile you can visit my other articles, or you can also visit me at https://duygua.github.io. Until then, stay happy and tuned 👋