Why Schema Type Still Matters for Graph Databases

The power of graph database solutions means that creating data and finding relationships between data sets is not constrained by defined data classifications, formats, storage locations or original data structure.

By storing and monitoring relationship data, graph solutions enable organizations to act on changes to their data model without the limitations of a fixed database structure, known as a schema. A database that isn’t limited to one schema type also simplifies data modeling and querying of connected data.

Although graph solutions support schema-less use, you might still want a schema to enforce some structure on your data model and on how it is used.

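The data structure you choose depends on the insights you want to draw from your data, how you need the database to scale and the performance you require.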

Here’s how schema types can enhance your graph database use and what to consider when aligning data schemas with graph database deployment.

Schema Options with Graph Databases

In traditional relational database use, schemas include tables, views, indexes, foreign keys and check constraints. A graph database such as OrientDB still includes a basic schema made up of vertices (the nodes that represent data entities) and edges, in which each edge stores information about a particular relationship between two vertices.

The degree to which you define the classes of these edges and vertices depends on your graph database needs. Graph solutions simply provide more flexibility in how you define your data model.

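OrientDB’s graph solutions support three types of schema options: schema-full, in which every property of a class is declared and enforced; schema-less, in which records can carry arbitrary properties; and schema-hybrid, in which some properties are declared while others remain free-form.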

With graph solutions, you can define each schema type as you create the structure of your graph database.
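To make the difference between the schema-full, schema-hybrid and schema-less modes concrete, here is a minimal in-memory sketch. It is not OrientDB's API: the `VertexClass` type, its `mode` strings and the `validate` method are hypothetical names chosen for illustration, and the rule that declared properties are type-checked only when present is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class VertexClass:
    """Hypothetical vertex class definition (illustration only, not an OrientDB API)."""
    name: str
    mode: str  # "schema-full", "schema-hybrid" or "schema-less"
    properties: Dict[str, type] = field(default_factory=dict)

    def validate(self, record: Dict[str, Any]) -> None:
        """Raise if the record breaks the rules implied by this class's mode."""
        if self.mode not in ("schema-full", "schema-hybrid", "schema-less"):
            raise ValueError(f"unknown schema mode: {self.mode}")
        # Assumption: declared properties are type-checked whenever they are present.
        for prop, expected in self.properties.items():
            if prop in record and not isinstance(record[prop], expected):
                raise TypeError(
                    f"{self.name}.{prop} must be {expected.__name__}, "
                    f"got {type(record[prop]).__name__}"
                )
        # Only schema-full rejects properties that were never declared.
        if self.mode == "schema-full":
            undeclared = sorted(set(record) - set(self.properties))
            if undeclared:
                raise ValueError(f"{self.name}: undeclared properties {undeclared}")


if __name__ == "__main__":
    record = {"name": "Ada", "loyalty_tier": "gold"}
    VertexClass("Customer", "schema-less").validate(record)                  # accepted
    VertexClass("Customer", "schema-hybrid", {"name": str}).validate(record)  # accepted
    try:
        VertexClass("Customer", "schema-full", {"name": str}).validate(record)
    except ValueError as err:
        print("rejected:", err)
```

The same record passes under two modes and fails under the third, which is the practical trade-off: the stricter the mode, the more the database guarantees and the less a new input can surprise you.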

Selecting a Schema to Fit Your Business Insights

The type of schema you choose to power your graph solution ultimately depends on the kinds of questions you want it to answer from your data relationships. The key difference among the schema types is how strictly you constrain the types of nodes and the relationships allowed between those node types.
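As a rough sketch of what constraining the allowed relationships between node types can look like, the function below checks an edge against a rule table. The `EdgeRules` structure, the `edge_allowed` function, the `strict` flag and the class names (`Customer`, `Account`, `Owns`, `SharesDevice`) are all hypothetical and exist only for this example.

```python
from typing import Dict, Set, Tuple

# Hypothetical rule table: edge class -> allowed (source class, target class) pairs.
EdgeRules = Dict[str, Set[Tuple[str, str]]]


def edge_allowed(
    rules: EdgeRules,
    edge_class: str,
    source_class: str,
    target_class: str,
    strict: bool = False,
) -> bool:
    """Return True if an edge of edge_class may connect source_class to target_class."""
    allowed = rules.get(edge_class)
    if allowed is None:
        # Undeclared edge class: rejected when strict, accepted otherwise.
        return not strict
    return (source_class, target_class) in allowed


if __name__ == "__main__":
    rules: EdgeRules = {"Owns": {("Customer", "Account")}}
    print(edge_allowed(rules, "Owns", "Customer", "Account"))
    print(edge_allowed(rules, "Owns", "Account", "Customer"))
    print(edge_allowed(rules, "SharesDevice", "Customer", "Customer"))
    print(edge_allowed(rules, "SharesDevice", "Customer", "Customer", strict=True))
```

With `strict=False` the table behaves like a hybrid schema: declared relationships are enforced, and relationship types nobody anticipated are still accepted. Setting `strict=True` approximates a fully defined schema.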

There’s no one-size-fits-all approach, but the schema flexibility of graph solutions allows you to think about why you are querying data, rather than only what data you are querying, when building your graph database solution.

For example, if you’re building an application for recommended services to existing customers, such as upselling financial packages for banking members, you can likely use a schema-hybrid model to define the nodes and edges with specific data types.

If you’re trying to uncover unforeseen relationships between data sets, such as in fraud-detection applications, a schema-less model enables you to adjust relationship guidelines as the database generates real-time visualizations.

Selecting a Schema that Will Scale

The schema model you choose also depends on how you’d like to scale the database for use with new or changing data inputs, systems and use cases.

Graph databases shine here because relationships and vertex types are created as new data comes into the system, allowing your database to “expand” as business use changes.

As you scale your graph solution to different areas of your business, the schema you have in place will affect how you build traversals between data. Perhaps you started with a customer-targeting application using a schema-hybrid model. As that application grows, you might want to move to a schema-less model to extract even more data about the results of that targeting and use it to infer relationships between customer use and product innovation. Discovering or creating new relationships between data types and applications works best with flexible system design, which a schema-less model can provide. In this instance, using a dynamic language can help modify or eliminate data classes for a less rigid design.
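One way to picture how the schema shapes a traversal is a breadth-first walk that can optionally be restricted to known edge classes. This is a plain-Python sketch over an adjacency list, not a graph database query; the `Graph` alias, the `reachable` function and the sample vertex and edge names are assumptions made for the example.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical adjacency list: vertex id -> list of (edge class, neighbor id).
Graph = Dict[str, List[Tuple[str, str]]]


def reachable(
    graph: Graph,
    start: str,
    edge_classes: Optional[Set[str]] = None,
    max_depth: int = 2,
) -> Set[str]:
    """Breadth-first traversal from start, following at most max_depth edges.

    If edge_classes is given, only edges of those classes are followed;
    if it is None, every edge is followed, whatever its class.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth == max_depth:
            continue
        for edge_class, neighbor in graph.get(vertex, []):
            if edge_classes is not None and edge_class not in edge_classes:
                continue
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


if __name__ == "__main__":
    graph: Graph = {
        "alice": [("Targeted", "campaign1"), ("Uses", "productA")],
        "campaign1": [("Promotes", "productB")],
        "productA": [("Inspired", "featureX")],
    }
    print(sorted(reachable(graph, "alice", {"Targeted", "Promotes"})))
    print(sorted(reachable(graph, "alice")))
```

A traversal written against a fixed set of edge classes only sees the relationships the schema anticipated. Passing `edge_classes=None` follows whatever edges exist, which is closer to how a schema-less model lets newly created relationship types surface in results.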

Likewise, if you started with a schema-less strategy when building your graph database, you might find you want to enforce certain data quality standards or governance rules as more applications or inputs connect to your database. Or, perhaps, you want to bring in legacy schema indexes and represent those structures within your graph solution. In that case, it might make sense to switch to a schema-hybrid model or schema-full strategy with more defined global relationship types and rules within your database. Graphical query tools can enable developers to start building more structure into their existing database.
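Before tightening a schema-less dataset, it can help to audit existing records against the constraints you plan to enforce. The sketch below does that in plain Python; `find_violations`, its parameters and the sample records are hypothetical, and it assumes each record is a dictionary whose unique-key values are hashable.

```python
from typing import Any, Dict, List, Optional, Tuple


def find_violations(
    records: List[Dict[str, Any]],
    required: Dict[str, type],
    unique_key: Optional[str] = None,
) -> List[Tuple[int, str]]:
    """Return (record index, reason) pairs for records that would break the proposed rules.

    required maps property names to the type each must have (mandatory, not null).
    unique_key, if given, names a property whose non-null values must not repeat.
    """
    violations: List[Tuple[int, str]] = []
    first_seen: Dict[Any, int] = {}
    for index, record in enumerate(records):
        for prop, expected in required.items():
            if record.get(prop) is None:
                violations.append((index, f"missing '{prop}'"))
            elif not isinstance(record[prop], expected):
                violations.append((index, f"'{prop}' is not {expected.__name__}"))
        if unique_key is not None and record.get(unique_key) is not None:
            value = record[unique_key]  # assumed hashable
            if value in first_seen:
                violations.append(
                    (index, f"duplicate '{unique_key}' (first seen at record {first_seen[value]})")
                )
            else:
                first_seen[value] = index
    return violations


if __name__ == "__main__":
    customers = [
        {"id": 1, "email": "ada@example.com"},
        {"id": 2},
        {"id": 1, "email": 5},
    ]
    for index, reason in find_violations(customers, {"email": str}, unique_key="id"):
        print(index, reason)
```

Running an audit like this first tells you how much cleanup a move to a schema-hybrid or schema-full model would require, before any constraint starts rejecting writes.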

Selecting a Schema for Optimal Performance

Graph solutions already have a major performance advantage over schema-enforced databases. With OrientDB, users can store up to 120,000 records per second in nodes and process transactions about ten times faster than other databases with defined schemas.

Still, if you’re enforcing more defined types of rules, such as mandatory, unique or null constraints, within your schema model, it’s important to test how that structure and the applications you’re using will impact transactional output, such as:

The schema model you choose is highly dependent on the applications you want to build and how your organization wants to leverage graph databases. No matter which model you choose, graph databases will enforce data nodes and classes to maintain data integrity. However, a schema is hardly set in stone. One of the major advantages of the graph model is that it supports multiple types of schemas side-by-side and enables schema constraints to be reconfigured as needs change.

Luigi Dell’Aquila
Director of Consulting, OrientDB, an SAP Company