8 February 2016
8 February 2016,
 Off

February 8, 2016

By Andrey Lomakin, Lead Research & Development Engineer at OrientDB

In OrientDB v2.2 we’ve added tools which enable storage performance metrics to be gathered for a whole system and for a concrete command executed at that current moment. This feature will not only be interesting for database support teams, but it will probably also be of interest to users who want to understand why a database is fast or slow for their use case and what the reasoning is for results attained in a benchmark.

But before we consider characteristics gathered during storage profiling, let’s take a look at OrientDB’s architecture.

All high level OrientDB components, exposed to the user as clusters or indexes, are implemented inside the storage engine as “durable components” and extend the ODurableComponent class. This class is part of the framework created to make components/data structure operations atomic and isolated in terms of  ACID properties. Each durable component has to hold its data in direct memory, not in Java heap. But if in Java we operate variables to store/read application data, durable components operate pages.

A Page is a continuous snippet of memory which always has the same fixed size and is mapped to a file placed on a disk. When data is written to the page it is  automatically written to the file, but data is not written instantly, it must sometimes pass between the moment when data is written to the page and the moment data is written to the file.

We separate write operations on pages and the file system because file system operations are slow and we try to decouple data operations and file system operations. When we change the page it is not written to the disk instantly, as I have already mentioned above, but is placed in the write cache. The write cache aggregates all changed pages and stores them to the disk in a background thread in order of their file position. So, if we have changed pages with positions 3, 2, 8, 4, they will be stored in the order 2, 3, 4, 8.

Pages are sorted by their file positions because it does not matter whether you use DDR, SSD or HDD to store your data; sequential IO operations are always faster than random IO operations. Because pages are stored in a separate background thread, disk write operation speed will be decoupled from data modification operation speed.

In case of write operations we, may delay a data write and try to convert it to a sequential IO operation, but if we need to read data we need it instantly and can not delay the data read from file. So in this case we use the well known technique of caching frequently used pages in read cache.

So taking all of the above into account, you can see that OrientDB uses 2 caches:

  • Read cache – to cache frequently used pages.
  • Write cache - to decouple data modification operations from file system write operations and to make those operations sequential.

When we read a page from a file, the following steps are performed:

  • Read page from read cache , if page is found we quit there.
  • Read page from write cache, if page is found in write queue we quit there.
  • Read page from file.
  • Add page to the read cache and return found page.

When we modified the page content, it is automatically placed in the write cache.

There is one big problem with all those caches. Such system is not durable. If the application crashes, then some data which have not yet been written to the disk will be lost.

To avoid this kind of problems we use a database journal, aka WAL (write ahead log). This makes the whole process of writing of data a bit complex. When we modify the page we do not put the page in the write cache.  Instead, we write the difference of the page content into map keys which consist of a file and index of the changed page and values which contain diff of changes between original and changed page.

When an operation on a cluster or index is completed without exceptions we extract all changes from the map and log them inside the database journal and only after that do we apply those changes to the file pages and put them in the write cache. The database journal may be treated as an “append only” log so all write operations to the database journal are sequential and as result are fast. The process of writing changes to the database journal and applying them to the “real” file page is called “atomic operation commit”.

What value does the database journal give to us ?

  • If the process crashes in the middle of the atomic operation (before atomic operation commit) nothing bad will happen because “real” data is not changed at this moment.
  • If the process is crashes during the atomic operation commit, we will restore this operation from the database journal on next the database start. If the database journal contains only half of the operation, this operation will simply not be restored.

In both cases data consistency will not be compromised.

Taking all of above into account you probably have already concluded that the main performance characteristics of OrientDB’s storage engine (and not only OrientDB)  are:

  • Speed of page reads from the cache.
  • Speed of page writes to the cache.
  • Speed of page reads from the file.
  • Speed of page writes to the file.
  • Amount of page reads for a single component operation.
  • How much time is spent on an atomic operation commit.
  • Percent of cache hits – how many times we do not need to read data from the file system but load it from the cache memory.

All those numbers will show us the direction our project must evolve towards. For example, if we have good numbers for a disk cache hit rate and very few pages are read for a single component operation, we have to improve disk cache speed as a whole.  However if we have a lot of page reads for a single component operation and very low numbers for page read speeds, we need to minimize the amount of pages accessed for the single operation and convert data structures to ones which uses more sequential rather than random IO operations.

Readers may ask: “well, all of this is very good but how it is related to us?”

The answer is: when you report performance issues please provide a benchmark (though we all have different hardware and sometimes cannot simply reproduce your issue) but also provide performance numbers gathered as the results of storage profiling.

Readers might also ask: “How is that done?”

It may be done in 2 ways, by using JMX and by using SQL commands.

The JMX console provides numbers gathered from the execution of all operations in storage but SQL commands provide data which are gathered for a selected set of commands.

To gather performance for a selected set of commands you can execute a script such as the one shown below:

At the end of the script you will see the following result:

As you can see you may see numbers for storage performance as a whole and numbers for the performance of each component.

Data from the atomic operation commit phase is presented as data from the component named “atomic operation”.

If you work with an embedded database you can start and stop storage profiling by calling the following methods:

OAbstractPaginatedStorage#startGatheringPerformanceStatisticForCurrentThread() to start storage profiling and OAbstractPaginatedStorage#completeGatheringPerformanceStatisticForCurrentThread() to stop profiling of storage.

You may also connect to the JMX server and read current performance numbers from MBean with name:

com.orientechnologies.orient.core.storage.impl.local.statistic:type=OStoragePerformanceStatisticMXBean,name=,id=

Hope it will be interesting for you to read overview of OrientDB architecture and performance characteristics which are important for us. Please do not forget to send results of profiling together with performance reports.

If you have any questions about this blog entry or about any of OrientDB’s features please post your question on stackoverflow and we will answer it.

Comments are closed.