Dublin (CA, USA), December 14, 2017.
After more than a year of work toward a stable v3.0, we’re finally close to the first release candidate — it’s just a matter of days. OrientDB v3.0 has many cool new features that will be described in the coming days. Today I’d like to focus on the performance of the new storage engine. We compared the results against the latest v2.2 GA using the Yahoo! YCSB benchmark on an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, 64 GB RAM, 512 GB SSD.
Version 2.2.20 throughput (ops/sec):

DB size      | Initial load | 50% reads / 50% updates | 95% reads / 5% updates | 100% reads
25M records  | 7,480        | 24,788                  | 83,051                 | 177,822
50M records  | 6,840        | 3,808                   | 15,601                 | 30,556
100M records | 6,150        | 2,504                   | 11,908                 | 21,668
Version 3.0 throughput (ops/sec):

DB size      | Initial load | 50% reads / 50% updates | 95% reads / 5% updates | 100% reads
25M records  | 26,624       | 55,194                  | 118,175                | 113,606
50M records  | 25,086       | 9,454                   | 20,998                 | 29,610
100M records | 17,913       | 5,864                   | 14,900                 | 22,038
Those numbers show that OrientDB v3.0 is more than three times faster than v2.2 on writes (inserts and updates), while read performance remains pretty much the same. Write performance was one of the biggest complaints from our users, and we finally figured out how to improve it without sacrificing read speed.
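For reference, the mixed workloads above map onto YCSB’s core workload parameters. A minimal properties file for the 50% reads / 50% updates case might look like the following sketch — the record and operation counts are illustrative, not the exact values we used:

```properties
# YCSB core workload: 50% reads / 50% updates (illustrative counts)
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=25000000
operationcount=10000000
readproportion=0.5
updateproportion=0.5
requestdistribution=zipfian
```

The 95% reads / 5% updates and 100% reads runs only change `readproportion` and `updateproportion` accordingly.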
If you want to test your application with OrientDB v3.0, even before the GA is out, please check out the “develop” branch from GitHub:
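A typical checkout and build might look like this (the repository URL is the official one; the Maven build step is an assumption based on the standard OrientDB build, which requires Maven and a JDK):

```shell
git clone https://github.com/orientechnologies/orientdb.git
cd orientdb
git checkout develop
mvn clean install -DskipTests
```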
And send us your results and feedback by writing to the Community Group.
A recent influx of vendor-sponsored NoSQL database benchmarks has been circulating and creating some buzz. Many have asked us to enter the discussion, so here’s our point of view.
A few weeks ago, a vendor created a brand new benchmark, comparing their product against OrientDB and two other competitors. We weren’t involved in the initial tests and were really surprised by the results, so we decided to take on the challenge.
We worked with the vendor by providing a pull request and instructions for improving performance, but they only applied a few of our suggestions and published partial results. Of course, it’s understandable that a vendor would never want to publish an update to a benchmark where a competing product’s performance improved by orders of magnitude, thereby providing free marketing for their competitor. We won’t mention products by name here, as the goal is not to show that we are faster, but to highlight how dramatically things can change when the vendors are consulted.
Using the same benchmarks, the same dataset and the same hardware, we re-ran the tests and optimized OrientDB. First, we fixed how the database was created (lightweight edges should be used when edges have no attributes) and used a new traversal algorithm implemented in the latest version of OrientDB. We also used the new official Node.js driver. Finally, we cleaned up the dirty database the vendor used for the tests, which included duplicate edges (some duplicated ten times).
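Lightweight edges can be toggled at the database level. A hedged example using OrientDB SQL — the custom attribute name follows the documented setting for OrientDB 2.x, so verify it against your version:

```sql
-- Enable lightweight edges for graphs whose edges carry no attributes.
-- Existing regular edges are not converted; this affects new edges.
ALTER DATABASE CUSTOM useLightweightEdges=true
```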
Always open to challenges, we’ve also made more optimizations here and there (all available in the latest versions) and are pretty happy with the results. Here’s a table and a chart summarizing how performance numbers evolved over time.
*Note: We could not reproduce the performance numbers shown by the competitor in the initial tests. Unfortunately, only percentages were provided, so we trusted their numbers and used them to derive the other values according to the performance difference.
You may wonder why we’re still slower on the “single reads” and “neighbors2” tests. Regarding “single reads”, we measured the OrientDB Server itself and it was extremely fast; however, 70% of the processing time was spent by the Node.js driver unmarshalling the server’s response. We found the bottleneck and the fix is already on our roadmap. Regarding “neighbors2”, this use case has little value in the real world: when you traverse neighbors, you’re usually interested in retrieving actual information about them, not just their IDs or keys.
Furthermore, the original test was run on a server with 60 GB of RAM. Reproducing the same test on a common server with 16 GB of RAM, the results are completely different. DBMSs that require the database (or at least its indexes) to fit in RAM show their limitations when the working set exceeds the available memory. OrientDB, instead, is highly optimized to use the disk subsystem effectively, reducing costs.
If you’d like to run the benchmark on your configuration, here is the link to the GitHub repository. This is the OrientDB database you can download for the benchmark (1.2 GB). In the project, you can also find the script to create the database from scratch using the OrientDB ETL. Use either OrientDB 2.1 RC5 or the latest Alpha version here.
The NoSQL market is fragmented and there is no standardized, meaningful benchmark to test products against each other. Some database technologies are simple, like key-value or in-memory stores, so they win many speed benchmarks. Others are more feature-complete, so while they may post slower results on very simple tests, they give developers the freedom to focus on what matters most (i.e. building their apps). Furthermore, benchmarks are usually done or sponsored by a vendor, so while they know their own product inside and out, they have little knowledge of their competitors’ products. Finally, benchmarks often pit production-quality code against early-stage, super-optimized builds compiled to be extremely fast (but not yet ready for a production scenario).
The end result is confusion, and quite often, numbers that are not applicable to the real world. Vendor X creates a benchmark comparing their product and it always wins on all fronts. Vendor Y, who feels the need to respond, optimizes the product for the same benchmark and wins on all fronts (as we’ve done in most of the tests above). Vendor Z likewise. Customers try the technology for their application and get completely different numbers. You get the picture.
What we found is that most benchmarks, especially those done by individuals affiliated with a specific vendor, tend to be highly biased and show a distorted view of reality. On the other hand, we firmly agree that benchmarks help vendors improve their products and we had a lot of fun learning how we could make OrientDB even faster.
We encourage customers to build their own benchmarks based on their needs and to reach out to us anytime. OrientDB was built on the idea that you can have all you need in one native multi-model DBMS: availability, scalability, relationships, data model complexity, agility and ease of use without sacrificing performance or reliability. We expect OrientDB to be faster than the competition right out of the box. However, there is always room for improvement due to the limitless use cases that OrientDB can handle.
Today we are excited to launch a Performance Jumpstart Package. This is a discounted remote consulting engagement where we work with customers and partners to prove OrientDB’s performance, scalability, availability and rich feature set.
Bring us your benchmarks. In the use case above, it took us only a couple of days to improve performance by orders of magnitude. We are so convinced of OrientDB’s capabilities and the great results we can achieve together that we offer a money-back guarantee if we can’t improve performance in your use case.
Customer base and company credentials are another important factor before adopting a new product. With 60,000 downloads monthly, more than 100 contributors, hundreds of paying customers and thousands of production users, OrientDB is a mature product, backed by a profitable company with a bright future and great momentum.
Contact us to learn more about the OrientDB Performance Challenge.
Since OrientDB is not an in-memory DBMS, in order to use the large amount of RAM available on the test machine (60 GB), we tried the new Command Cache with OrientDB 2.2-Alpha. To enable it, start the OrientDB Server with the following parameters: “-Dcommand.cache.enabled=true -Dcommand.cache.minExecutionTime=3”.
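Concretely, the startup would look like this — assuming the standard distribution layout, where `server.sh` lives in the `bin/` directory of the unpacked release:

```shell
# Launch the OrientDB Server with the Command Cache enabled.
# $ORIENTDB_HOME is assumed to point to the root of the distribution.
$ORIENTDB_HOME/bin/server.sh \
    -Dcommand.cache.enabled=true \
    -Dcommand.cache.minExecutionTime=3
```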
London, February 6th 2014
We are glad to announce a new engine for managing relationships in the graph database. According to our data-loading benchmark, it can be up to 15 times faster than the current implementation!
The new architecture is based on a new data structure, the SB-Tree, and is optimized not only for embedded but also for remote storage. To achieve this, we introduced a new data type, LINKBAG: it represents a set of RIDs, but allows duplicate values and does not implement the Collection interface. A LINKBAG has two binary representations — a modified B-Tree, and a collection embedded in the document — where the embedded collection is deserialized only on demand, for example during iteration.
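The idea of a bag of RIDs with two representations can be illustrated with a toy sketch. This is not OrientDB’s implementation — the threshold, the dict-backed “tree” form, and the RID strings are all assumptions made for illustration — but it mirrors the behavior described above: duplicates are allowed, small bags stay embedded, and large bags switch to a tree-backed form:

```python
class LinkBag:
    """Toy sketch of a RID bag: duplicates allowed, two representations.

    Small bags live in an embedded list (as if serialized inline in the
    document); past a threshold the bag converts to a tree-backed form
    (a stand-in for the real B-Tree). Iteration materializes entries on
    demand. Purely illustrative, not OrientDB's actual code.
    """

    EMBEDDED_THRESHOLD = 40  # hypothetical switch-over point

    def __init__(self):
        self._embedded = []  # small bags: plain inline list of RIDs
        self._tree = None    # large bags: rid -> occurrence count

    def add(self, rid):
        if self._tree is not None:
            self._tree[rid] = self._tree.get(rid, 0) + 1
        else:
            self._embedded.append(rid)
            if len(self._embedded) > self.EMBEDDED_THRESHOLD:
                self._convert_to_tree()

    def _convert_to_tree(self):
        # Switch representation: keep duplicate counts per RID.
        self._tree = {}
        for rid in self._embedded:
            self._tree[rid] = self._tree.get(rid, 0) + 1
        self._embedded = []

    def __iter__(self):
        # Entries are yielded ("deserialized") only when iterated.
        if self._tree is not None:
            for rid, count in self._tree.items():
                for _ in range(count):
                    yield rid
        else:
            yield from self._embedded

    def size(self):
        if self._tree is not None:
            return sum(self._tree.values())
        return len(self._embedded)
```

In the real engine the tree form is persisted in the storage rather than held in memory; the sketch only mirrors the switch-over and duplicate-preserving behavior.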
Below is a comparison of load speed when importing the Wikipedia page structure (without page content), which consists of 130 million vertices and more than 1 billion edges.
To prevent duplication of vertices, we used a unique index on the page key. Data were taken from http://downloads.dbpedia.org/3.6/en/page_links_en.nt.bz2. The load test was run on a PC with 24 GB RAM, a 7,500 RPM HDD, and an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz.
Test 1 consists of loading data in transactional mode; the blue line is OrientDB 1.6.x, the red line OrientDB 1.7-rc1 with the new LINKBAG structure.
The X axis shows the number of imported pages; the Y axis shows the time spent importing them.
Orient Technologies LTD