Preparing for the next generation of Multi-Model storage engine with OrientDB v3.0

Dublin (CA, USA), December 14, 2017.

After more than a year of work on a stable v3.0, we’re finally close to the first release candidate; it’s just a matter of days. OrientDB v3.0 has many cool new features that will be described over the next few days. Today I’d like to focus on the performance of the new storage engine. We compared the results against the latest v2.2 GA using the Yahoo! Cloud Serving Benchmark (YCSB) on an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz with 64 GB RAM and a 512 GB SSD.

Version 2.2.20 throughput op/sec:

DB size       Initial load   50% reads/50% updates  95% reads/5% updates  100% reads
 25M records         7,480          24,788                83,051            177,822
 50M records         6,840           3,808                15,601             30,556
100M records         6,150           2,504                11,908             21,668

 

Version 3.0 throughput op/sec:

DB size       Initial load   50% reads/50% updates  95% reads/5% updates  100% reads
 25M records        26,624          55,194               118,175            113,606
 50M records        25,086           9,454                20,998             29,610
100M records        17,913           5,864                14,900             22,038

 

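Those numbers show that OrientDB v3.0 is about 2.9 to 3.7 times faster than v2.2 on initial load (inserts) and about 2.2 to 2.5 times faster on the mixed 50% reads/50% updates workload, while read-only throughput at 50M and 100M records remains pretty much the same (the 25M read-only result is lower in this run: 113,606 vs. 177,822 op/sec). Write performance was one of the biggest complaints from our users, and we finally figured out how to improve it without sacrificing read speed on the larger datasets.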
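The ratios behind this comparison can be recomputed directly from the two tables. The snippet below is only a small sketch: the numbers are copied from the tables above, while the names `V22`, `V30`, `WORKLOADS` and `speedups` are made up for this illustration.

```python
# Throughput (op/sec) copied from the two tables above.
# Column order: initial load, 50/50 read-update, 95/5 read-update, 100% reads.
WORKLOADS = ["initial load", "50% reads/50% updates", "95% reads/5% updates", "100% reads"]

V22 = {
    "25M": [7480, 24788, 83051, 177822],
    "50M": [6840, 3808, 15601, 30556],
    "100M": [6150, 2504, 11908, 21668],
}
V30 = {
    "25M": [26624, 55194, 118175, 113606],
    "50M": [25086, 9454, 20998, 29610],
    "100M": [17913, 5864, 14900, 22038],
}


def speedups(old, new):
    """Return {db_size: [new/old ratio per workload]}, rounded to 2 decimals."""
    return {
        size: [round(n / o, 2) for o, n in zip(old[size], new[size])]
        for size in old
    }


RATIOS = speedups(V22, V30)

if __name__ == "__main__":
    for size, row in RATIOS.items():
        for workload, ratio in zip(WORKLOADS, row):
            print(f"{size:>5} records  {workload:<22} {ratio:.2f}x")
```

A ratio above 1 means v3.0 delivered more operations per second than v2.2.20 for that database size and workload.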

If you want to test your application with OrientDB v3.0, even before the GA is out, please check out the “develop” branch from GitHub:

https://github.com/orientechnologies/orientdb/tree/develop

And send us your results and feedback by writing to the Community Group.

 

Thanks.

Best Regards,
Luca Garulli
Founder OrientDB

A recent influx of vendor-sponsored NoSQL database benchmarks has been circulating and creating some buzz. Many have asked us to enter the discussion, so here’s our point of view.

On recent benchmarks first.

A few weeks ago, a vendor created a brand new benchmark, comparing their product against OrientDB and two other competitors. We weren’t involved in the initial tests and were really surprised by the results, so we decided to take on the challenge.

We worked with the vendor by providing a pull request and instructions for improving performance, but they only applied a few of our suggestions and published partial results. Of course, it’s understandable that a vendor would never want to publish an update to a benchmark where a competing product’s performance improved by orders of magnitude, therefore providing free marketing for their competitor. We won’t mention products by name here, as the goal is not to show that we are faster, but to highlight how things can dramatically change when the vendors are consulted.

Using the same benchmarks, the same dataset and the same hardware, we re-ran the tests and optimized OrientDB. First, we fixed how the database was created (lightweight edges should be used when there are no edge attributes) and used a new traversal algorithm implemented in the latest version of OrientDB. We also used the new official Node.js driver. Finally, we cleaned up the dirty database the vendor used for the tests, which included duplicate edges (some duplicated ten times).

Picture 1 — Duplicate Records
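For readers who want to check their own dataset for this kind of problem, here is a minimal sketch of detecting and removing duplicate edges in a plain edge list. It is not the cleanup we ran on the benchmark database: the function names and the sample IDs are hypothetical, and edges are assumed to be directed (source, target) pairs without attributes.

```python
from collections import Counter


def find_duplicate_edges(edges):
    """Return {(source, target): count} for every edge that appears more than once."""
    counts = Counter(edges)
    return {edge: n for edge, n in counts.items() if n > 1}


def deduplicate(edges):
    """Keep the first occurrence of each (source, target) pair, preserving order."""
    return list(dict.fromkeys(edges))


# Hypothetical sample data: the IDs are made up for illustration.
SAMPLE_EDGES = [("P1", "P2"), ("P1", "P3"), ("P1", "P2"), ("P2", "P3"), ("P1", "P2")]

if __name__ == "__main__":
    print(find_duplicate_edges(SAMPLE_EDGES))
    print(deduplicate(SAMPLE_EDGES))
```

Duplicate edges inflate every traversal that follows them, so a check like this is worth running before trusting any graph benchmark numbers.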

Always open to challenges, we’ve also made more optimizations here and there (all available in the latest versions) and are pretty happy with the results. Here’s a table and a chart summarizing how performance numbers evolved over time.

Table 1 — Performance Numbers (lower is better)
Table 2 — Relative Performance (lower is better)

*Note: We could not reproduce the performance numbers shown by the competitor in the initial tests. Unfortunately, only percentages were provided, so we trusted their numbers and used them to derive the other values according to the performance difference.

Picture 2 — Relative Performance (lower is better)

You may wonder why we’re still slower on the “single reads” and “neighbors2” tests. For “single reads”, we measured the OrientDB Server and it was extremely fast; however, 70% of the processing time was spent in the Node.js driver unmarshalling the server’s response. We found the bottleneck and the fix is already on our roadmap. As for “neighbors2”, this use case has little value in the real world: when you traverse neighbors, you’re interested in retrieving actual information about them, not just their ID or key.

Furthermore, the original test ran on a server with 60GB of RAM. When we reproduced the same test on a common server with 16GB of RAM, the results were completely different. DBMSs that require the database (or just its indexes) to fit in RAM show their limitations when the working set exceeds the available RAM. OrientDB, instead, is optimized to use the disk subsystem effectively, which reduces hardware costs.

If you’d like to run the benchmark on your configuration, here is the link to the GitHub repository. This is the OrientDB database that you can download for the benchmark (1.2 GB). In the project, you can also find the script to create the database from scratch by using the OrientDB ETL. Use either OrientDB 2.1 RC5 or the latest Alpha version here.

On benchmarks in general.

The NoSQL market is fragmented and there is no standardized, meaningful benchmark to test products against each other. Some database technologies are simple, like key-value or in-memory stores, so they win many speed benchmarks. Others are more feature complete, so while they may have slower results on very simple tests, they give developers the freedom to focus on what matters most (i.e. building their apps). Furthermore, benchmarks are usually run or sponsored by a vendor who knows their own product inside and out but has little knowledge of the competitors’ products. Finally, benchmarks often pit production-quality code against early-stage, super-optimized versions that are compiled to be extremely fast (but are not yet ready for production use).

The end result is confusion, and quite often, numbers that are not applicable to the real world. Vendor X creates a benchmark comparing their product and it always wins on all fronts. Vendor Y, who feels the need to respond, optimizes the product for the same benchmark and wins on all fronts (as we’ve done in most of the tests above). Vendor Z likewise. Customers try the technology for their application and get completely different numbers. You get the picture.

Conclusions.

What we found is that most benchmarks, especially those done by individuals affiliated with a specific vendor, tend to be highly biased and show a distorted view of reality. On the other hand, we firmly agree that benchmarks help vendors improve their products and we had a lot of fun learning how we could make OrientDB even faster.

We encourage customers to build their own benchmarks based on their needs and to reach out to us anytime. OrientDB was built on the idea that you can have all you need in one native multi-model DBMS: availability, scalability, relationships, data model complexity, agility and ease of use without sacrificing performance or reliability. We expect OrientDB to be faster than the competition right out of the box. However, there is always room for improvement due to the limitless use cases that OrientDB can handle.

Today we are excited to launch a Performance Jumpstart Package. This is a discounted remote consulting engagement where we work with customers and partners to prove OrientDB’s performance, scalability, availability and rich feature set.

Bring us your benchmarks. In the use case above, improving performance by orders of magnitude took us only a couple of days. We are so convinced of OrientDB’s capabilities and the great results we can achieve together that we will offer a money-back guarantee if we can’t improve performance in your use case.

Customer base and company credentials are other important factors to weigh before adopting a new product. With 60,000 downloads monthly, more than 100 contributors, hundreds of paying customers and thousands of production users, OrientDB is a mature product, backed by a profitable company with a bright future and great momentum.

Contact us to learn more about the OrientDB Performance Challenge.

Want to try all the latest and greatest enhancements in OrientDB? Download the product or contribute on GitHub.

-UPDATE-

Since OrientDB is not an in-memory DBMS, we tried the new Command Cache in OrientDB 2.2-Alpha to make use of the large amount of RAM available on the test machine (60GB). To enable it, start the OrientDB Server with the following parameters: “-Dcommand.cache.enabled=true -Dcommand.cache.minExecutionTime=3”.
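To give an intuition for what these two settings control, here is a conceptual sketch of a command cache with a minimum-execution-time threshold. It is not OrientDB’s implementation: the class and method names are hypothetical, and it assumes the threshold is expressed in milliseconds and that only commands taking at least that long are worth caching.

```python
import time


class CommandCacheSketch:
    """Caches a command's result only if executing it took at least min_execution_ms."""

    def __init__(self, min_execution_ms=3, clock=time.perf_counter):
        self.min_execution_ms = min_execution_ms  # assumed unit: milliseconds
        self._clock = clock                       # injectable clock, returns seconds
        self._results = {}                        # command text -> cached result

    def execute(self, command, run):
        """Return the cached result for command, or call run(command) and maybe cache it."""
        if command in self._results:
            return self._results[command]
        started = self._clock()
        result = run(command)
        elapsed_ms = (self._clock() - started) * 1000.0
        if elapsed_ms >= self.min_execution_ms:
            self._results[command] = result
        return result

    def invalidate(self):
        """Drop all cached results, e.g. after a write that may change them."""
        self._results.clear()
```

The threshold keeps cheap commands out of the cache, so memory is spent only on results that are expensive to recompute.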

 

London, February 6th 2014

 
We are glad to announce the new engine for managing relationships in the graph database. According to a data loading benchmark, it can be up to 15 times faster than the current implementation!

The new architecture is based on a new data structure, the SB-Tree, and is optimized not only for embedded storage but for remote storage too. To achieve this, we introduced a new data type, LINKBAG. It represents a set of RIDs but allows duplicate values, and it does not implement the Collection interface. LINKBAG has two binary representations: a modified B-Tree, and a collection embedded in the document. The embedded collection is deserialized only on demand, for example during iteration.
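The idea of a bag of RIDs that is decoded only on demand can be sketched in a few lines. This is an illustration of the concept, not the LINKBAG code: the class name, the `(cluster, position)` tuple for a RID and the 10-byte binary layout are all assumptions made for the example, and the B-Tree representation is not modeled at all.

```python
import struct

RID_FORMAT = ">hq"                      # hypothetical layout: 2-byte cluster id + 8-byte position
RID_SIZE = struct.calcsize(RID_FORMAT)  # 10 bytes per RID with this layout


class LazyRidBag:
    """Illustrative bag of RIDs: duplicates allowed, decoded only on demand."""

    def __init__(self, payload=b""):
        self._payload = payload  # serialized form, as it would be stored in the document
        self._items = None       # decoded list, built lazily on first access

    @classmethod
    def from_rids(cls, rids):
        """Serialize an iterable of (cluster, position) pairs into a new bag."""
        return cls(b"".join(struct.pack(RID_FORMAT, c, p) for c, p in rids))

    @property
    def is_deserialized(self):
        return self._items is not None

    def _load(self):
        if self._items is None:
            self._items = list(struct.iter_unpack(RID_FORMAT, self._payload))
        return self._items

    def add(self, rid):
        """Append a RID; duplicates are kept (this simple sketch decodes first)."""
        self._load().append(rid)

    def __iter__(self):
        return iter(self._load())

    def __len__(self):
        # The size is known from the payload length without decoding any RID.
        if self._items is None:
            return len(self._payload) // RID_SIZE
        return len(self._items)
```

Loading a document that holds such a bag costs nothing until someone iterates over it, which is the point of deserializing on demand.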

Below is a comparison of load speed when importing the Wikipedia page structure (without page content), which consists of 130 million vertices and more than 1 billion edges.

To prevent duplicate vertices, we used a unique index on the page key. Data were taken from http://downloads.dbpedia.org/3.6/en/page_links_en.nt.bz2. The load test was run on a PC with 24 GB RAM, a 7500 RPM HDD and an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz.
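The role of the unique index during such an import can be sketched as follows. This is a simplified, in-memory illustration rather than the loader we used: a Python dict stands in for the unique index, the class and function names are hypothetical, and the sample lines use placeholder `example.org` URIs instead of the real DBpedia ones.

```python
import re

# Matches one N-Triples line of the form: <subject> <predicate> <object> .
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


class GraphLoader:
    def __init__(self):
        self.index = {}  # page key -> vertex id, plays the role of the unique index
        self.edges = []  # (source vertex id, target vertex id)

    def vertex_for(self, key):
        """Return the existing vertex id for key, creating it only if absent."""
        if key not in self.index:
            self.index[key] = len(self.index)
        return self.index[key]

    def load(self, lines):
        for line in lines:
            match = TRIPLE.match(line.strip())
            if match is None:
                continue  # skip blank lines, comments and malformed input
            source, _predicate, target = match.groups()
            self.edges.append((self.vertex_for(source), self.vertex_for(target)))
        return self


# Placeholder data: these URIs are invented for the example.
SAMPLE = [
    "# comment line",
    "<http://example.org/page/A> <http://example.org/linksTo> <http://example.org/page/B> .",
    "<http://example.org/page/A> <http://example.org/linksTo> <http://example.org/page/C> .",
    "<http://example.org/page/B> <http://example.org/linksTo> <http://example.org/page/C> .",
]

if __name__ == "__main__":
    loader = GraphLoader().load(SAMPLE)
    print(len(loader.index), "vertices,", len(loader.edges), "edges")
```

Every link triggers two index lookups, one per endpoint, which is why index and relationship performance dominate the load time on a dataset of this size.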

Test 1 consists of loading data in transactional mode. The blue line is OrientDB 1.6.x; the red line is OrientDB 1.7-rc1 with the new LINKBAG structure.

The X axis shows the number of imported pages; the Y axis shows the time spent importing them.

Test 1 – OrientDB 1.6.x vs OrientDB 1.7-rc1 TX Mode

As you can see, after the 6,300,000th imported record the current implementation suffers a dramatic slowdown, so we interrupted the test after a while.
Test 2 is like the previous test, but in non-transactional mode.

Test 2 – OrientDB 1.6.x vs OrientDB 1.7-rc1 No-TX Mode

Test 3 compares full imports of the whole Wikipedia dataset using the new LINKBAG implementation. Here the blue line is transactional mode and the red line is non-transactional mode.

Test 3 – OrientDB 1.7-rc1 TX vs OrientDB 1.7-rc1 No-TX

In non-transactional mode the import completed in only 6.5 hours; in transactional mode it took 14 hours.

 
Andrey Lomakin
Orient Technologies LTD




London, April 30th 2013

Toyotaro Suzumura and Miyuru Dayarathna from the Department of Computer Science of the Tokyo Institute of Technology and IBM Research published an interesting paper on benchmarking graph databases in the cloud, titled:

“XGDBench: A Benchmarking Platform for Graph Stores in Exascale Clouds”

This research evaluates the performance of four well-known graph data stores (AllegroGraph, Fuseki, Neo4j and OrientDB) using XGDBench on the Tsubame 2.0 HPC cloud environment. XGDBench is an extension of the well-known Yahoo! Cloud Serving Benchmark (YCSB).
OrientDB is the fastest graph database among the 4 products tested. In particular, OrientDB is about 10x faster (!) than Neo4j in all the tests.

Look at the Presentation (25 slides) and Research PDF.



