OrientDB v2.2 Community Edition was released in May 2016. As most of you know, in this release OrientDB focused on security, performance, stability, APIs and operations. It was a major overhaul of our previous 2.1 release and was well received by the OrientDB community as a whole. Compared to v2.1, v2.2 saw huge improvements in performance. With 2 nodes we measured a 5 times improvement, and with 3 nodes OrientDB v2.2 is 10 times faster than v2.1! Now, for those in need of enterprise level features, we’ve just released OrientDB v2.2.2 Enterprise Edition. For this release, we’ve bundled the features meant to be released in versions 2.2.0 & 2.2.1, skipped those releases and gone straight to OrientDB v2.2.2 Enterprise Edition! OrientDB v2.1 EE users can now safely upgrade to our GA release of Enterprise v2.2.2 which includes the following features:
Our Enterprise edition is free for development only, and comes bundled in our OrientDB Bronze, Silver or Gold subscription packages along with a production license.
Previous enterprise only features such as Auditing have now been made generally available and are included in OrientDB 2.2 Community Edition. Security is paramount and with the inclusion of new general features such as LDAP importing, password validation support, SALT encryption, System User and System Database, Auditing events were made widely available to all. OrientDB 2.2.2 Enterprise edition now includes new Enterprise only features such as:
Importing a database from Oracle or any relational DBMS has never been this easy. Teleporter is now included in OrientDB enterprise and not only allows you to migrate your data, it enables database syncing as well. Though importing databases is available in our free Enterprise Edition, syncing your database requires a production license included in our Subscription Packages.
One of our most requested features is our Incremental backup function. With OrientDB v2.2.2 Enterprise Edition, there’s no need to stop your database while performing backups! An incremental backup generates smaller backup files by storing only the delta between two versions of the database. This is useful when you execute a backup on a regular basis and you want to avoid having to back up the entire database each time.
The new SQL MATCH expression enables optimization of pattern matching queries based on available indexes. With their simplified syntax, OrientDB’s pattern matching functions introduced in version 2.2 make finding connections, relations, patterns and data aggregation much easier, allowing you to unleash the full potential of your data to best fit your use case.
OrientDB uses Lucene under the hood for Full-Text and Spatial indexes. For other indexes, OrientDB has its own compression algorithms. OrientDB kept the syntax from PostGIS, so if you’re already familiar with it, using these functions with OrientDB should feel very similar. To speed up spatial search and match conditions, spatial operators and functions can use a spatial index, if one is defined, to avoid a sequential full scan of every record. Geometry objects, supported as embedded documents, include: Point (OPoint), Line (OLine), Polygon (OPolygon), MultiPoint (OMultiPoint), MultiLine (OMultiline), MultiPolygon (OMultiPolygon), Geometry Collections.
We recommend that everyone using a previous version of OrientDB Enterprise Edition upgrade to version 2.2.2. If you’re worried about compatibility issues, databases created with release 2.1.x are compatible with 2.2.x, so there’s no need to export/import your database. To learn more about what’s changed between v2.1 and v2.2, look at What’s new with V2.2.
For those interested in learning more about the new features included in OrientDB v2.2.2 Enterprise Edition, don’t forget to watch our recorded Webinar with OrientDB CEO, Luca Garulli. We’ve included the entire transcript from our Q&A section, along with references to our documentation section.
February 8, 2016
In OrientDB v2.2 we’ve added tools which enable storage performance metrics to be gathered both for the system as a whole and for a specific command while it is being executed. This feature will not only be interesting for database support teams, but it will probably also be of interest to users who want to understand why a database is fast or slow for their use case and what lies behind the results attained in a benchmark.
But before we consider characteristics gathered during storage profiling, let’s take a look at OrientDB’s architecture.
All high level OrientDB components, exposed to the user as clusters or indexes, are implemented inside the storage engine as “durable components” and extend the ODurableComponent class. This class is part of the framework created to make operations on components and data structures atomic and isolated in terms of ACID properties. Each durable component has to hold its data in direct memory, not in the Java heap. Where ordinary Java code operates on variables to store and read application data, durable components operate on pages.
A page is a contiguous region of memory which always has the same fixed size and is mapped to a file on disk. Data written to the page is eventually written to the file automatically, but not instantly: some time passes between the moment data is written to the page and the moment it is written to the file.
We separate write operations on pages and the file system because file system operations are slow and we try to decouple data operations and file system operations. When we change the page it is not written to the disk instantly, as I have already mentioned above, but is placed in the write cache. The write cache aggregates all changed pages and stores them to the disk in a background thread in order of their file position. So, if we have changed pages with positions 3, 2, 8, 4, they will be stored in the order 2, 3, 4, 8.
Pages are sorted by their file positions because it does not matter whether you use DDR, SSD or HDD to store your data; sequential IO operations are always faster than random IO operations. Because pages are stored in a separate background thread, disk write operation speed will be decoupled from data modification operation speed.
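The ordering described above can be illustrated with a minimal sketch. This is not OrientDB's write cache: the class `WriteCacheSketch` and its methods are hypothetical names made up for this example, and the flush is called directly here, whereas the real engine runs it in a background thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: buffers changed pages and flushes them in file-position order. */
final class WriteCacheSketch {
    // Changed pages keyed by their position in the file; TreeMap keeps the keys sorted.
    private final TreeMap<Long, byte[]> dirtyPages = new TreeMap<>();

    /** Called when a page is modified; the page is only buffered, nothing is written yet. */
    synchronized void markDirty(long pagePosition, byte[] content) {
        dirtyPages.put(pagePosition, content);
    }

    /** Drains the buffer and returns the page positions in the order they would be written. */
    synchronized List<Long> flush() {
        List<Long> writeOrder = new ArrayList<>(dirtyPages.keySet());
        dirtyPages.clear();
        return writeOrder;
    }

    public static void main(String[] args) {
        WriteCacheSketch cache = new WriteCacheSketch();
        // Pages are changed in the order 3, 2, 8, 4, as in the example in the text.
        for (long position : new long[] {3, 2, 8, 4}) {
            cache.markDirty(position, new byte[0]);
        }
        System.out.println(cache.flush()); // prints [2, 3, 4, 8]
    }
}
```

Because the buffer is sorted by position, the disk sees one mostly sequential pass over the file instead of four random writes.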
In the case of write operations, we may delay a data write and try to convert it to a sequential IO operation, but if we need to read data we need it instantly and cannot delay the read from the file. So in this case we use the well known technique of caching frequently used pages in a read cache.
So taking all of the above into account, you can see that OrientDB uses 2 caches: a read cache, which keeps frequently used pages in memory, and a write cache, which aggregates changed pages and stores them to the disk in the background.
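To make the idea of a read cache concrete, here is a minimal sketch that uses plain least-recently-used eviction built on `java.util.LinkedHashMap`. The eviction policy, the class name `ReadCacheSketch`, the page size and the `readFromFile` stub are all assumptions chosen for illustration; they are not taken from OrientDB's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a fixed-capacity page cache with least-recently-used eviction. */
final class ReadCacheSketch {
    private static final int PAGE_SIZE = 64; // illustrative size, not OrientDB's page size

    private final Map<Long, byte[]> pages;
    private int hits;
    private int misses;

    ReadCacheSketch(final int capacity) {
        // accessOrder = true moves an entry to the end of the map every time it is read,
        // so the eldest entry is always the least recently used page.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the page from the cache, or reads it from the file on a miss. */
    byte[] load(long pageIndex) {
        byte[] page = pages.get(pageIndex);
        if (page != null) {
            hits++;
            return page;
        }
        misses++;
        page = readFromFile(pageIndex);
        pages.put(pageIndex, page);
        return page;
    }

    int hits() { return hits; }

    int misses() { return misses; }

    // Stand-in for the slow file read; a real cache would read the page from disk here.
    private static byte[] readFromFile(long pageIndex) {
        return new byte[PAGE_SIZE];
    }

    public static void main(String[] args) {
        ReadCacheSketch cache = new ReadCacheSketch(2);
        for (long pageIndex : new long[] {1, 2, 1, 3, 2}) {
            cache.load(pageIndex);
        }
        System.out.println("hits=" + cache.hits() + " misses=" + cache.misses()); // prints hits=1 misses=4
    }
}
```

With room for only two pages, loading page 3 evicts page 2, so the final read of page 2 goes back to the file. Counters like these are the raw material for a cache hit rate.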
When we read a page from a file, the following steps are performed:
When we modify the page content, the page is automatically placed in the write cache.
There is one big problem with all those caches: such a system is not durable. If the application crashes, any data which has not yet been written to the disk will be lost.
To avoid this kind of problem we use a database journal, aka WAL (write ahead log). This makes the whole process of writing data a bit more complex. When we modify a page we do not put it in the write cache right away. Instead, we record the change in a map whose keys consist of the file and the index of the changed page, and whose values contain the diff between the original and the changed page.
When an operation on a cluster or index is completed without exceptions, we extract all changes from the map and log them inside the database journal, and only after that do we apply those changes to the file pages and put them in the write cache. The database journal may be treated as an “append only” log, so all write operations to the database journal are sequential and as a result are fast. The process of writing changes to the database journal and applying them to the “real” file page is called “atomic operation commit”.
What value does the database journal give us?
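The commit sequence can be sketched in a few lines. Everything in this example is a simplification under stated assumptions: the names `AtomicOperationSketch` and `PageKey`, the journal kept as an in-memory list of strings, the diff modelled as a map from byte offset to new value, and the explicit `rollback` method are all hypothetical and are not OrientDB's classes or formats.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical sketch of an atomic operation: collect diffs, log them, then apply them. */
final class AtomicOperationSketch {
    /** Key of the change map: the file and the index of the changed page. */
    static final class PageKey {
        final String file;
        final long pageIndex;

        PageKey(String file, long pageIndex) {
            this.file = file;
            this.pageIndex = pageIndex;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof PageKey
                && ((PageKey) other).file.equals(file)
                && ((PageKey) other).pageIndex == pageIndex;
        }

        @Override
        public int hashCode() {
            return Objects.hash(file, pageIndex);
        }
    }

    private static final int PAGE_SIZE = 64; // illustrative size

    final List<String> journal = new ArrayList<>();     // stand-in for the append-only journal
    final Map<PageKey, byte[]> pages = new HashMap<>(); // stand-in for the "real" file pages
    private final Map<PageKey, Map<Integer, Byte>> pending = new LinkedHashMap<>();

    /** Records a change as a diff (offset -> new value); the page itself is not touched yet. */
    void change(String file, long pageIndex, int offset, byte value) {
        pending.computeIfAbsent(new PageKey(file, pageIndex), key -> new TreeMap<>())
               .put(offset, value);
    }

    /** Atomic operation commit: log every diff first, and only then apply it to the pages. */
    void commit() {
        for (Map.Entry<PageKey, Map<Integer, Byte>> entry : pending.entrySet()) {
            PageKey key = entry.getKey();
            journal.add(key.file + "#" + key.pageIndex + " " + entry.getValue());
        }
        for (Map.Entry<PageKey, Map<Integer, Byte>> entry : pending.entrySet()) {
            byte[] page = pages.computeIfAbsent(entry.getKey(), key -> new byte[PAGE_SIZE]);
            entry.getValue().forEach((offset, value) -> page[offset] = value);
        }
        pending.clear();
    }

    /** If the operation fails, the collected diffs are dropped and no page is changed. */
    void rollback() {
        pending.clear();
    }

    public static void main(String[] args) {
        AtomicOperationSketch operation = new AtomicOperationSketch();
        operation.change("data.pcl", 2, 0, (byte) 7); // file name is made up for the example
        System.out.println("journal entries before commit: " + operation.journal.size());
        operation.commit();
        System.out.println("journal entries after commit: " + operation.journal.size());
    }
}
```

In this sketch a failed operation never reaches the pages, because nothing is applied until every diff has been appended to the journal.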
In both cases data consistency will not be compromised.
Taking all of the above into account, you have probably already concluded that the main performance characteristics of OrientDB’s storage engine (and not only OrientDB’s) are:
All those numbers show us the direction in which our project must evolve. For example, if we have good numbers for the disk cache hit rate and very few pages are read for a single component operation, we have to improve disk cache speed as a whole. However, if we have a lot of page reads for a single component operation and very low page read speeds, we need to minimize the number of pages accessed for a single operation and convert data structures to ones which use sequential rather than random IO operations.
Readers may ask: “Well, all of this is very good, but how is it related to us?”
The answer is: when you report a performance issue, please provide not only a benchmark (we all have different hardware and sometimes simply cannot reproduce your issue) but also the performance numbers gathered by storage profiling.
Readers might also ask: “How is that done?”
The JMX console provides numbers gathered from the execution of all operations in storage, whereas SQL commands provide data gathered only for a selected set of commands.
To gather performance data for a selected set of commands you can execute a script such as the one shown below:
At the end of the script you will see the following result:
As you can see, the result includes numbers for storage performance as a whole and numbers for the performance of each component.
Data from the atomic operation commit phase is presented as data from the component named “atomic operation”.
If you work with an embedded database you can start and stop storage profiling by calling the following methods:
OAbstractPaginatedStorage#startGatheringPerformanceStatisticForCurrentThread() to start storage profiling and OAbstractPaginatedStorage#completeGatheringPerformanceStatisticForCurrentThread() to stop profiling of storage.
You may also connect to the JMX server and read current performance numbers from MBean with name:
We hope you found this overview of OrientDB’s architecture, and of the performance characteristics which are important to us, interesting. Please do not forget to send profiling results together with your performance reports.
If you have any questions about this blog entry or about any of OrientDB’s features, please post your question on Stack Overflow and we will answer it.