Author: Luigi Dell'Aquila - @ldellaquila
In one of my previous blog posts I wrote about how SQL can be used as a foundation for a richer query language, more powerful and easier to use to query a complex data model (the document/graph multi-model).
I described the basic mechanisms and logics behind SQL (projections, filtering and so on), the deep traversal and the UNWINDing, but there is another way of querying structured data: pattern matching
OrientDB v. 2.2 includes the new MATCH statement, specifically designed to execute pattern matching queries on graphs and to mix them with the existing SQL syntax, to have the best of both worlds together.
Let's start with some examples, the domain is the well-known FoaF (friend of a friend) graph:
Alice and Bob are friends Alice -FriendOf-> Bob
You can consider a query that returns all the couples of friends:
(pseudo-code) A -FriendOf-> B
This is a pattern, a general representation of nodes and relationships that can occur in your graph.
You can write such query in SQL, with dot notation (in OrientDB) or with JOINs (in relational databases), but the pattern is not explicit:
SELECT name as name, out("FriendOf").name as friendName from Person UNWIND friendName
the query is someway procedural, it iterates over all the people and then traverses the FriendOf relationship from left to right.
In OrientDB v 2.2 you can express the same query with a MATCH statement:
MATCH {class:Person, as:A} -FriendOf-> {as:B} RETURN A.name as name, B.name as friendName
In this case the pattern is explicit! You can clearly see (even graphically) two nodes and an edge in the middle. By the way, the pattern also makes a detail explicit: the node B is not declared as a Person, so maybe the initial query was wrong and your intention was the following:
MATCH {class:Person, as:A} -FriendOf-> {class:Person, as:B} RETURN A.name as name, B.name as friendName
Now try to think about a more complex scenario, find all couple of friends that have a common friend who has a dog and where one of the two people works at Google:
Now try to express it in plain SQL, you'll see that it's not easy 😉
The pattern matching version of this query is as follows:
MATCH {class:Person, as:A} -FriendOf- {class:Person, as:B}, /* A and B are friends */ {as:A} -FriendOf- {as:C} -FriendOf- {as:B}, /* C is a friend of both A ad B */ {as:C} -HasA-> {class:Dog}, /* A has a dog */ {as:B} -WorksAt-> {class:Company, where:(name="Google")}, /* B works at Google */ RETURN A.name as name, B.name as friendName
That's easy and natural, don't you think?
In a pattern, you can declare edges with a very natural arrow notation:
{} -EdgeClass-> {} /* same as .out("EdgeClass") */ {}
With arrow notation you can only express out(), in(), and both() relationships
You can still use the old out()/in() edge notation in a pattern matching query, eg. the following queries are equivalent
MATCH {class:Person, as:A} -FriendOf-> {class:Person, as:B} RETURN A.name as name, B.name as friendName
MATCH {class:Person, as:A}.out("FriendOf"){class:Person, as:B} RETURN A.name as name, B.name as friendName
With the old edge notation, together with out/in/both, you can also use outE(), inE(), bothE(), outV(), inV(), bothV() operators, to traverse from vertices to edges and from edges to vertices. This is the only way you have to retrieve properties from edges.
MATCH {class:Person, as:A} .outE("FriendOf"){as:friendshipEdge} .inV(){class:Person, as:B} RETURN A.name as name, B.name as friendName, friendshipEdge.since as friendsSince
Pattern nodes can represent vertices (see the first examples) or edges (see "friendshipEdge" in last example).
A pattern node can have:
If you have multiple occurrences of the same node in the pattern (eg. same alias repeated in the pattern), the WHERE conditions of these nodes will be merged. Eg.
MATCH {class:Person, as:A, where:(name='John')} -FriendOf-> {as:B}, {class:Person, as:A, where:(surname='Smith')} -FriendOf-> {as:B} RETURN A.name as name, B.name as friendName
is equivalent to
MATCH {class:Person, as:A, where:(name='John' AND surname='Smith')} -FriendOf-> {as:B} RETURN A.name as name, B.name as friendName
In some cases a pattern can be represented with a single expression, but in many cases it is convenient (sometimes necessary) to represent a pattern as a set of sub-patterns.
Single sub-patterns have to be separated by commas, nodes with the same alias will be merged when the query is compiled, so the final pattern will be a single meta-graph structure.
If you define disjoint sub-patterns in the same pattern matching query, the result will be the cartesian product of the results of the sub-patterns
In the RETURN section you can declare the information you want to be returned by the query. You can use
OrientDB considers MATCH statements as normal query statements, that means that you can use them as sub-queries of normal SQL statements, eg. the following is a valid query:
SELECT a.name, max(b.age) FROM ( MATCH {class:Person, as:A, where:(name='John')} -FriendOf-> {as:B} RETURN A as a, B as b )
Remember, to display a pattern in OrientDB Studio Graph panel, you always have to use RETURN $elements or RETURN $pathElements
In the official documentation you will find the formal grammar definition of MATCH statement, all the details about single components and a lot of useful examples.
http://orientdb.com/docs/last/SQL-Match.html
Pattern matching is a new feature in OrientDB v.2.2, if you want to see it improved with new functionalities, please drop a note on OrientDB issue tracker