15 June 2016
15 June 2016,
 Off

Author: Luigi Dell’Aquila  – @ldellaquila

In one of my previous blog posts I wrote about how SQL can be used as a foundation for a richer query language, more powerful and easier to use to query a complex data model (the document/graph multi-model).

I described the basic mechanisms and logics behind SQL (projections, filtering and so on),  the deep traversal and the UNWINDing, but there is another way of querying structured data: pattern matching

OrientDB v. 2.2 includes the new MATCH statement, specifically designed to execute pattern matching queries on graphs and to mix them with the existing SQL syntax, to have the best of both worlds together.

Let’s start with some examples, the domain is the well-known FoaF (friend of a friend) graph:

Alice and Bob are friends
Alice -FriendOf-> Bob

 

You can consider a query that returns all the couples of friends:

(pseudo-code)
A -FriendOf-> B

 

This is a pattern, a general representation of nodes and relationships that can occur in your graph.

You can write such query in SQL, with dot notation (in OrientDB) or with JOINs (in relational databases), but the pattern is not explicit:

SELECT 
name as name, out("FriendOf").name as friendName 
from Person 
UNWIND friendName

 

the query is someway procedural, it iterates over all the people and then traverses the FriendOf relationship from left to right.

In OrientDB v 2.2 you can express the same query with a MATCH statement:

MATCH 
   {class:Person, as:A} -FriendOf-> {as:B}
RETURN A.name as name, B.name as friendName

 

In this case the pattern is explicit! You can clearly see (even graphically) two nodes and an edge in the middle. By the way, the pattern also makes a detail explicit: the node B is not declared as a Person, so maybe the initial query was wrong and your intention was the following:

MATCH 
   {class:Person, as:A} -FriendOf-> {class:Person, as:B}
RETURN A.name as name, B.name as friendName

 

Now try to think about a more complex scenario, find all couple of friends that have a common friend who has a dog and where one of the two people works at Google:

  • A and B are friends
  • A and B have a common friend C
  • C has a dog
  • B works at Google

pattern-matching-1

Now try to express it in plain SQL, you’ll see that it’s not easy ;-)

The pattern matching version of this query is as follows:

MATCH 
   {class:Person, as:A} -FriendOf- {class:Person, as:B},     /* A and B are friends */
   {as:A} -FriendOf- {as:C} -FriendOf- {as:B},               /* C is a friend of both A ad B */
   {as:C} -HasA-> {class:Dog},                               /* A has a dog */
   {as:B} -WorksAt-> {class:Company, where:(name="Google")}, /* B works at Google */
RETURN A.name as name, B.name as friendName

 

That’s easy and natural, don’t you think?

The Arrow Notation

In a pattern, you can declare edges with a very natural arrow notation:

{}  -EdgeClass-> {}    /* same as .out("EdgeClass")  */

{}

 

With arrow notation you can only express out(), in(), and both() relationships

The Good Old Edge Notation

You can still use the old out()/in() edge notation in a pattern matching query, eg. the following queries are equivalent

MATCH 
   {class:Person, as:A} -FriendOf-> {class:Person, as:B}
RETURN A.name as name, B.name as friendName
MATCH 
   {class:Person, as:A}.out("FriendOf"){class:Person, as:B}
RETURN A.name as name, B.name as friendName

 

With the old edge notation, together with out/in/both, you can also use outE(), inE(), bothE(), outV(), inV(), bothV() operators, to traverse from vertices to edges and from edges to vertices. This is the only way you have to retrieve properties from edges.

MATCH 
   {class:Person, as:A}
   .outE("FriendOf"){as:friendshipEdge}
   .inV(){class:Person, as:B}
RETURN A.name as name, B.name as friendName, friendshipEdge.since as friendsSince

 

Pattern nodes

Pattern nodes can represent vertices (see the first examples) or edges (see “friendshipEdge” in last example).

A pattern node can have:

  • an alias (as:<alias>), that can be used in the RETURN part
  • a node type (class:<className>), that declares the type of vertex/edge that can match that node
  • a where condition (where:(foo=’bar’ and baz < 2)), allows you to filter patterns based on conditions on the nodes
  • a while condition (while:($depth < 3)), allows you to traverse an edge at a depth greater than one (see the official docs for details)

If you have multiple occurrences of the same node in the pattern (eg. same alias repeated in the pattern), the WHERE conditions of these nodes will be merged. Eg.

MATCH 
   {class:Person, as:A, where:(name='John')} -FriendOf-> {as:B},
   {class:Person, as:A, where:(surname='Smith')} -FriendOf-> {as:B}
RETURN A.name as name, B.name as friendName

 

is equivalent to

MATCH 
   {class:Person, as:A, where:(name='John' AND surname='Smith')} -FriendOf-> {as:B}
RETURN A.name as name, B.name as friendName

 

Complex patterns

In some cases a pattern can be represented with a single expression, but in many cases it is convenient (sometimes necessary) to represent a pattern as a set of sub-patterns.

Single sub-patterns have to be separated by commas, nodes with the same alias will be merged when the query is compiled, so the final pattern will be a single meta-graph structure.

If you define disjoint sub-patterns in the same pattern matching query, the result will be the cartesian product of the results of the sub-patterns

Return types

In the RETURN section you can declare the information you want to be returned by the query. You can use

  • aliases and expressions, eg. RETURN A.name+” “+A.surname
  • $matches, to return each pattern that matches the query (only nodes that have an explicit alias)
  • $paths, to return each pattern that matches the query (all the nodes, also those that do not have an alias)
  • $elements, to return each a flat result-set of all the nodes are in at least one matched pattern (only nodes that have an explicit alias)
  • $elements, to return each a flat result-set of all the nodes are in at least one matched pattern (all the nodes, also those that do not have an alias)

Mixing Pattern Matching and SQL

OrientDB considers MATCH statements as normal query statements, that means that you can use them as sub-queries of normal SQL statements, eg. the following is a valid query:

SELECT a.name, max(b.age) FROM (
   MATCH 
       {class:Person, as:A, where:(name='John')} -FriendOf-> {as:B}
   RETURN A as a, B as b
) 

 

Pattern Matching in OrientDB Studio

Remember, to display a pattern in OrientDB Studio Graph panel, you always have to use RETURN $elements or RETURN $pathElements

See the Docs

In the official documentation you will find the formal grammar definition of MATCH statement, all the details about single components and a lot of useful examples.

http://orientdb.com/docs/last/SQL-Match.html

Pattern matching is a new feature in OrientDB v.2.2, if you want to see it improved with new functionalities, please drop a note on OrientDB issue tracker

Comments are closed.