Why Schema Type Still Matters for Graph Databases

The power of graph database solutions means that creating data and finding relationships between data sets is not constrained by defined data classifications, formats, storage locations or original data structure.

By storing and monitoring relationship data, graph solutions enable organizations to act on changes to their data model without the limitations of a fixed database structure, known as a schema. A database that isn’t limited to one schema type also simplifies data modeling and querying of connected data.

Although graph solutions support schema-less use, you might still want to use a schema to enforce some structure within your data model and how it is used.

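The data structure you choose depends on the questions you want your data to answer, how you expect the database to scale and the performance you need.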

Here’s how schema types can enhance your graph database use and what to consider when aligning data schemas with graph database deployment.

Schema Options with Graph Databases

In traditional relational database use, schemas include tables, views, indexes, foreign keys and check constraints. A graph database such as OrientDB still includes a basic schema made up of vertex objects (nodes, or data entities) and edge objects, in which edges store information about a particular data relationship.

The degree to which you define the classes of these edges and vertices depends on your graph database needs. Graph solutions simply provide more flexibility in how you define your data model.

OrientDB’s graph solutions support three types of schema options: schema-full, schema-less and schema-hybrid.

With graph solutions, you can define each schema type as you create the structure of your graph database.
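
To make the difference between schema-less, schema-hybrid and schema-full use more concrete, here is a minimal Python sketch of how a record might be validated under each mode. It is a simplified in-memory model, not OrientDB's API: the `ClassDef` type, its `strict` flag and the `validate` function are names assumed purely for illustration. A class with no declared properties stands in for schema-less, a class with some declared properties that still accepts extra fields stands in for schema-hybrid, and a strict class that rejects undeclared fields stands in for schema-full.

```python
from dataclasses import dataclass, field


@dataclass
class ClassDef:
    """Hypothetical class definition: declared properties plus a strict flag."""
    name: str
    properties: dict = field(default_factory=dict)  # field name -> Python type
    mandatory: set = field(default_factory=set)     # fields that must be present
    strict: bool = False                            # True rejects undeclared fields


def validate(cls: ClassDef, record: dict) -> list:
    """Return a list of violations; an empty list means the record is accepted."""
    errors = []
    for name in sorted(cls.mandatory):
        if name not in record:
            errors.append(f"missing mandatory field '{name}'")
    for name, value in record.items():
        if name in cls.properties:
            if not isinstance(value, cls.properties[name]):
                errors.append(f"field '{name}' has the wrong type")
        elif cls.strict:
            errors.append(f"undeclared field '{name}'")
    return errors


# Three illustrative definitions of the same class, one per mode.
schema_less = ClassDef("Person")
schema_hybrid = ClassDef("Person", {"name": str}, {"name"})
schema_full = ClassDef("Person", {"name": str}, {"name"}, strict=True)

record = {"name": "Luca", "nickname": "l"}
```

In this sketch, `record` passes under the schema-less and schema-hybrid definitions, while the schema-full definition reports the undeclared `nickname` field.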

Selecting a Schema to Fit Your Business Insights

The type of schema you choose to power your graph solution ultimately depends on the kinds of questions you want to answer from your data relationships. The key difference between the schema types is how specifically you define the constraints on the types of nodes and the allowed relationships between the node types.

There’s no one-size-fits-all approach, but the schema flexibility of graph solutions allows you to think about why you are querying data, instead of what data you are querying, when building your graph database solution.

For example, if you’re building an application for recommended services to existing customers, such as upselling financial packages for banking members, you can likely use a schema-hybrid model to define the nodes and edges with specific data types.

If you’re trying to uncover unforeseen relationships between data sets, such as in fraud-detection applications, a schema-less model enables you to adjust relationship guidelines as the database generates real-time visualizations.

Selecting a Schema that Will Scale

The schema model you choose also depends on how you’d like to scale the database for use with new or changing data inputs, systems and use cases.

Graph databases shine here because relationships and vertex types are created as new data comes into the system, allowing your database to “expand” as business use changes.

As you scale your graph solution to different areas of your business, the schema you have in place will impact how you build traversals between data. Perhaps you started with a customer targeting application using a schema-hybrid. As that grows, you might want to move to a schema-less model to extract even more data around the results of that targeting and use it to infer relationships between customer use and product innovation. Discovering or creating new relationships between data types and applications works best with flexible system design, which a schema-less model can provide. In this instance, using a dynamic language can help modify or eliminate data classes for a less rigid design.

Likewise, if you started with a schema-less strategy when building your graph database, you might find you want to enforce certain data quality standards or governance rules as more applications or inputs connect to your database. Or, perhaps, you want to bring in legacy schema indexes and represent those structures within your graph solution. In that case, it might make sense to switch to a schema-hybrid model or schema-full strategy with more defined global relationship types and rules within your database. Graphical query tools can enable developers to start building more structure into their existing database.

Selecting a Schema for Optimal Performance

Graph solutions already have a major performance advantage over schema-enforced databases. With OrientDB, users can store up to 120,000 records per second in nodes and process transactions about ten times faster than other databases with defined schemas.

Still, if you’re enforcing more defined types of rules, such as mandatory, unique or null constraints, within your schema model, it’s important to test how that structure and the applications you’re using will impact transactional output.

The schema model you choose is highly dependent on the applications you want to build and how your organization wants to leverage graph databases. No matter which model you choose, graph databases will enforce data nodes and classes to maintain data integrity. However, a schema is hardly set in stone. One of the major advantages of the graph model is that it supports multiple types of schemas side-by-side and enables schema constraints to be reconfigured as needs change.

Luigi Dell’Aquila
Director of Consulting, OrientDB, an SAP Company


The release of OrientDB 1.7-rc2 is very close. We’ve fixed many bugs, supported binary compatibility against older versions (up to v1.5) and we introduced a new interesting feature: walking into the graph. When you execute a query you get back a resultset:

orientdb {GratefulDeadConcerts}> select from v where type = 'artist'

#   |@RID  |in_written_by|in_sung_by|name                |type  
0   |#9:7  |[size=9]     |[size=7]  |Bo_Diddley          |artist
1   |#9:8  |[size=4]     |[size=146]|Garcia              |artist
2   |#9:9  |#9:2         |[size=2]  |Spencer_Davis       |artist
3   |#9:27 |#9:3         |null      |Hardin_Petty        |artist
4   |#9:50 |[size=4]     |[size=99] |Weir                |artist
5   |#9:93 |[size=96]    |[size=3]  |Hunter              |artist
6   |#9:131|[size=39]    |null      |Traditional         |artist
7   |#9:136|#9:135       |null      |West_Tilghman_Holly |artist
8   |#9:169|[size=2]     |null      |Jesse_Fuller        |artist
9   |#9:174|[size=28]    |null      |Barlow              |artist
10  |#9:179|#9:88        |null      |John_Phillips       |artist
11  |#9:183|null         |[size=3]  |Weir_Hart           |artist
12  |#9:208|#9:99        |null      |Johnny_Cash         |artist
13  |#9:218|#9:102       |null      |Marty_Robbins       |artist
14  |#9:222|#9:139       |null      |Don_Rollins         |artist
15  |#9:223|null         |#9:14     |Garcia_Lesh_Weir    |artist
16  |#9:233|#9:96        |null      |Kristofferson_Foster|artist
17  |#9:245|[size=5]     |null      |Chuck_Berry         |artist
18  |#9:258|null         |[size=8]  |Pigpen_Weir         |artist
19  |#9:261|#9:69        |null      |Robert_Johnson      |artist
LIMIT EXCEEDED: resultset contains more items not displayed (limit=20)

20 item(s) found. Query executed in 0.031 sec(s).

After a query, the current record is always the first one in the resultset. To display it, use the new “current” command:

orientdb {GratefulDeadConcerts}> current

ODocument _ Class: V   id: #9:7   v.19
       in_written_by : [size=9]            
          in_sung_by : [size=7]            
                name : Bo_Diddley          
                type : artist

To move to the next record, use the new “next” command:

orientdb {GratefulDeadConcerts}> next

ODocument _ Class: V   id: #9:8   v.153
          in_sung_by : [size=146]          
       in_written_by : [size=4]            
                name : Garcia              
                type : artist              

To go back in the result, use “prev”. Now, to move to the incoming ‘written_by’ vertices, before 1.7-rc2 we had to issue another query by using the current RID:

select in('written_by') from #9:8

But now we can simply “move” to them by using the new “move” console command:

orientdb {GratefulDeadConcerts}> move in('written_by')

@RID  |#   |type|in_followed_by|name                 |song_type|performances|out_followed_by|out_sung_by|out_written_by
#9:461|1   |song|null          |CANT COME DOWN       |original |1           |null           |#9:335     |#9:8          
#9:465|2   |song|null          |CREAM PUFF WAR       |original |7           |null           |#9:8       |#9:8          
#9:494|3   |song|null          |THE ONLY TIME IS NOW |original |1           |null           |#9:335     |#9:8          

After this command, the resultset has been updated with those records and the current record is the first one. Now we can continue walking into the graph, through incoming or outgoing edges, vertices or even links.
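
One way to picture these navigation commands is as a cursor over the current resultset. The Python sketch below is a rough mental model only, not the console's implementation: the `Cursor` class, its method names and the behaviour at the ends of the resultset are assumptions, and the tiny graph holds only two artists and the three songs shown in the output above.

```python
class Cursor:
    """Hypothetical model of the console's resultset navigation."""

    def __init__(self, graph, records):
        self.graph = graph      # RID -> record dict
        self.records = records  # RIDs in the current resultset
        self.pos = 0            # after a query, the current record is the first one

    def current(self):
        return self.records[self.pos]

    def next(self):
        # Stay on the last record instead of running off the end (an assumption).
        self.pos = min(self.pos + 1, len(self.records) - 1)
        return self.current()

    def prev(self):
        # Stay on the first record when already at the start (an assumption).
        self.pos = max(self.pos - 1, 0)
        return self.current()

    def move(self, field_name):
        """Replace the resultset with the records linked from the current one."""
        targets = list(self.graph[self.current()].get(field_name, []))
        if targets:  # keep the old resultset when there is nothing to move to
            self.records, self.pos = targets, 0
        return targets


graph = {
    "#9:7": {"name": "Bo_Diddley"},
    "#9:8": {"name": "Garcia", "in_written_by": ["#9:461", "#9:465", "#9:494"]},
    "#9:461": {"name": "CANT COME DOWN"},
    "#9:465": {"name": "CREAM PUFF WAR"},
    "#9:494": {"name": "THE ONLY TIME IS NOW"},
}
cursor = Cursor(graph, ["#9:7", "#9:8"])
```

Calling `cursor.next()` and then `cursor.move("in_written_by")` mirrors the “next” and “move” steps above: the resultset becomes the three songs and the current record is the first of them.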

Cool, right?

Have fun with Graphs,
Luca Garulli
CEO at Orient Technologies
the Company behind OrientDB

OrientDB becomes Distributed using Hazelcast, Leading Open Source In-Memory Data Grid
Elastic Distributed scalability added to OrientDB, a Graph Database that supports hybrid Document Database features
London, UK – Orient Technologies (http://www.orientechnologies.com/) and Hazelcast (http://www.hazelcast.com) today announced that OrientDB has gained a multi-master replication feature powered by Hazelcast.
Clustering multiple server nodes is the most significant feature of OrientDB 1.6. Databases can be replicated across heterogeneous server nodes in multi-master mode achieving the best of scalability and performance.
“I think one of the added values of OrientDB over the other NoSQL products is the use of Hazelcast, while most of the others use Yahoo ZooKeeper to manage the cluster (discovery, split-brain network, etc.) and something else for the transport layer,” said Luca Garulli, CEO of Orient Technologies. “With ZooKeeper, configuration is a nightmare, while Hazelcast lets you add OrientDB servers with ZERO configuration. This has been a big advantage for our clients, and everything is much more ‘elastic’, especially when deployed on the Cloud. We’ve used Hazelcast not only for auto-discovery, but also for the transport layer. Thanks to this new architecture, all our clients can scale up horizontally by adding new servers without stopping or reconfiguring the cluster.”
“We are amazed by the speed with which OrientDB has adopted Hazelcast and we are delighted to see such excellent technologists teaming up with Hazelcast.” said Talip Ozturk, CEO of Hazelcast. “We work hard to make the best open source in-memory data grid on the market and are happy to see it being used in this way.”
Both Hazelcast and Orient Technologies are providing professional open source support to their respective projects under the Apache software license.
About Orient Technologies
Orient Technologies is the company behind the NoSQL project OrientDB, the Graph Database with a hybrid model taken from both the Document Database and Object Orientation worlds. OrientDB is FREE for any purpose, even commercial, because it is released under the Apache 2 License. Orient Technologies offers commercial services for OrientDB to companies that want support, training and consulting.
About Hazelcast
Hazelcast (www.hazelcast.com) develops, distributes and supports the leading open source in-memory data grid. The product, also called Hazelcast, is a free open source download under the Apache license that any developer can include in minutes to enable them to build elegantly simple mission-critical, transactional, and terascale in-memory applications. The company provides commercially licensed Enterprise editions, Hazelcast Management Console and professional open source training, development support and deployment support. The company is privately held and headquartered in Palo Alto, California.
Keywords: Hazelcast, In-memory data grid, Document Database, Object Database, In-memory Database, computing, big data, NoSQL, grid computing, Apache License, Open Source

OrientDB is a Graph Database “on steroids” because it supports concepts taken from both the Document Database and Object-Oriented worlds.

Take a look at this use case: creating a graph to map the relationships between people and cars. We’re going to use the just-released OrientDB version 1.5.

Let’s open a shell (or command prompt in Windows) and launch the OrientDB Console (use console.bat on Windows):

> ./console.sh

Now we’re going to use the console to create a brand new local database:

orientdb> create database plocal:../databases/cars admin admin plocal graph

Ok, now let’s create the first graph schema with “Person” and “Car” as 2 new Vertex types and “Owns” as an Edge type:

orientdb> create class Person extends V
orientdb> create class Car extends V
orientdb> create class Owns extends E

Now let’s populate the database with the first graph elements:

orientdb> create vertex Person set name = 'Luca'

Created vertex ‘Person#11:0{name:Luca} v1’ in 0,012000 sec(s).

orientdb> create vertex Car set name = 'Ferrari Modena'

Created vertex ‘Car#12:0{name:Ferrari Modena} v1’ in 0,001000 sec(s).

orientdb> create edge Owns from (select from Person) to (select from Car)

Created edge ‘[e[#11:0->#12:0][#11:0-Owns->#12:0]]’ in 0,005000 sec(s).

Ok, now we can traverse vertices. For example, what is Luca’s car? Traverse from the Luca vertex to the outgoing vertices by following the “Owns” relationships:

orientdb> select name from ( select expand( out('Owns') ) from Person where name = 'Luca' )

#   |@RID |name
0   |#-2:1|Ferrari Modena

Now suppose we also want to record where a Person lives. We need another Vertex type called “Country”, connected to the person with a new “Lives” Edge type:

orientdb> create class Country extends V
orientdb> create class Lives extends E

orientdb> create vertex Country set name = ‘UK’

Created vertex ‘Country#14:0{name:UK} v1’ in 0,004000 sec(s).

Next, let’s associate Luca to the UK Country:

orientdb> create edge Lives from (select from Person) to (select from Country)

Created edge ‘[e[#11:0->#14:0][#11:0-Lives->#14:0]]’ in 0,006000 sec(s).

So far so good.  Our graph has been extended. 
Now, search for the countries where there are “Ferrari” cars in our database.

orientdb> select name from ( select expand( in('Owns').out('Lives') ) from Car where name like '%Ferrari%' )

#   |@RID |name
0   |#-2:1|UK
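
To see what the chained in('Owns').out('Lives') traversal is doing, here is a small Python sketch that runs the same two hops over an in-memory copy of the graph built so far. It is an illustration only and does not use any OrientDB client: the `vertices` and `edges` structures and the `out` and `in_` helpers are assumed names, and the RIDs are the ones printed in the console output above.

```python
# Vertices keyed by RID, edges as (from_rid, label, to_rid) triples.
vertices = {
    "#11:0": {"class": "Person", "name": "Luca"},
    "#12:0": {"class": "Car", "name": "Ferrari Modena"},
    "#14:0": {"class": "Country", "name": "UK"},
}
edges = [
    ("#11:0", "Owns", "#12:0"),
    ("#11:0", "Lives", "#14:0"),
]


def out(rids, label):
    """Follow outgoing edges with the given label from each vertex in rids."""
    return [dst for rid in rids for (src, lbl, dst) in edges
            if src == rid and lbl == label]


def in_(rids, label):
    """Follow incoming edges with the given label back to their source vertices."""
    return [src for rid in rids for (src, lbl, dst) in edges
            if dst == rid and lbl == label]


# Start from the cars whose name contains "Ferrari", hop back to their
# owners, then hop forward to the countries those owners live in.
ferraris = [rid for rid, v in vertices.items()
            if v["class"] == "Car" and "Ferrari" in v["name"]]
countries = [vertices[rid]["name"] for rid in out(in_(ferraris, "Owns"), "Lives")]

# The earlier query (Luca's cars) is a single outgoing hop.
lucas_cars = [vertices[rid]["name"] for rid in out(["#11:0"], "Owns")]
```

Each hop takes a list of vertices and returns the vertices at the other end of the matching edges, which is why the hops can be chained in either direction.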

Setting constraints on Edges

Now we’ve modeled our graph using a schema without any constraints. But it would be useful to require an Owns relationship to exist only between the Person and Car vertices. So, let’s create these constraints:

orientdb> create property Owns.out LINK Person
orientdb> create property Owns.in LINK Car

Setting MANDATORY on a property prevents OrientDB from using a lightweight edge, that is, an edge for which no physical document is created. Be careful not to put spaces around the = in MANDATORY=true.

orientdb> alter property Owns.out MANDATORY=true;
orientdb> alter property Owns.in MANDATORY=true;

If we want to prohibit a Person vertex from having 2 edges against the same Car vertex, we have to define a UNIQUE index against out and in properties.

orientdb> create index UniqueOwns on Owns(out,in) unique

Created index successfully with 0 entries in 0,023000 sec(s).

Unfortunately, the index reports that 0 entries were indexed. Why? We had already created the Owns relationship between “Luca” and “Ferrari Modena.” At that point, OrientDB created a lightweight edge, because we had not yet set the rule that forces the creation of documents for Owns instances. So, you need to drop and recreate the edge.

orientdb> delete edge from #11:0 to #12:0
orientdb> create edge Owns from (select from Person) to (select from Car)

Now check that the record has been created.

orientdb> select from Owns

#   |@RID |out  |in
0   |#13:0|#11:0|#12:0


So far so good. The constraints work. Now try to create an “Owns” edge between Luca and UK (Country vertex):

orientdb> create edge Owns from (select from Person) to (select from Country)

Error: com.orientechnologies.orient.core.exception.OCommandExecutionException: Error on execution of command: sql.create edge Owns from (select from Person) to (sel…
Error: com.orientechnologies.orient.core.exception.OValidationException: The field ‘Owns.in’ has been declared as LINK of type ‘Car’ but the value is the document #14:0 of class ‘Country’

Now we have a typed graph with constraints. 
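
The two rules we just put in place (typed endpoints on Owns and a unique out/in pair) can be summarised in a few lines of Python. This is only a sketch of the checks under simplifying assumptions, not how OrientDB implements them: `ValidationError`, `edge_rules`, `unique_index`, `create_edge` and `is_rejected` are hypothetical names, and a plain set stands in for the UNIQUE index.

```python
class ValidationError(Exception):
    """Raised when an edge breaks a schema rule (hypothetical name)."""


# Class of each vertex, keyed by RID (values taken from the walkthrough).
vertex_class = {"#11:0": "Person", "#12:0": "Car", "#14:0": "Country"}

# Allowed endpoint classes per edge class: (class of out, class of in).
edge_rules = {"Owns": ("Person", "Car")}

# Stand-in for the UNIQUE index on Owns(out, in).
unique_index = set()


def create_edge(label, out_rid, in_rid):
    """Create an edge only if both endpoints have the declared classes
    and the (out, in) pair is not already indexed."""
    out_cls, in_cls = edge_rules[label]
    if vertex_class[out_rid] != out_cls:
        raise ValidationError(f"{label}.out must link to {out_cls}")
    if vertex_class[in_rid] != in_cls:
        raise ValidationError(f"{label}.in must link to {in_cls}")
    key = (label, out_rid, in_rid)
    if key in unique_index:
        raise ValidationError(f"duplicate {label} edge {out_rid} -> {in_rid}")
    unique_index.add(key)
    return key


def is_rejected(label, out_rid, in_rid):
    """Convenience wrapper: True when create_edge raises ValidationError."""
    try:
        create_edge(label, out_rid, in_rid)
    except ValidationError:
        return True
    return False
```

In this model, the first Owns edge from Luca to the Ferrari is accepted, a second identical edge is rejected by the uniqueness check, and an Owns edge from Luca to the UK is rejected because the target is a Country rather than a Car.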

The next part will cover how to use the polymorphism feature in the graph.

Luca Garulli, CEO

Orient Technologies, the Company behind OrientDB
