9 July 2015

A wave of vendor-sponsored NoSQL database benchmarks has recently been circulating and creating some buzz. Many have asked us to enter the discussion, so here’s our point of view.

On recent benchmarks first.

A few weeks ago, a vendor created a brand new benchmark, comparing their product against OrientDB and two other competitors. We weren’t involved in the initial tests and were really surprised by the results, so we decided to take on the challenge.

We worked with the vendor by providing a pull request and instructions for improving performance, but they applied only a few of our suggestions and published partial results. Of course, it’s understandable that a vendor would not want to publish an update to a benchmark in which a competing product’s performance improved by orders of magnitude, thereby providing free marketing for that competitor. We won’t mention products by name here, as the goal is not to show that we are faster, but to highlight how dramatically things can change when vendors are consulted.

Using the same benchmark, the same dataset and the same hardware, we re-ran the tests with OrientDB optimized. First, we fixed how the database was created (lightweight edges should be used when edges have no attributes) and used a new traversal algorithm implemented in the latest version of OrientDB. We also used the new official Node.js driver. Finally, we cleaned up the dirty database the vendor used for the tests, which included duplicate edges (some duplicated ten times).

Picture 1 — Duplicate Records
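
The duplicate-edge cleanup mentioned above can be illustrated with a minimal sketch. It assumes the edges have been exported as simple (source, target) pairs; the function names and sample IDs are hypothetical and are not part of the benchmark project or of OrientDB.

```python
from collections import Counter

def find_duplicate_edges(edges):
    """Return {(source, target): count} for every edge that appears more than once."""
    counts = Counter(edges)
    return {edge: n for edge, n in counts.items() if n > 1}

def deduplicate_edges(edges):
    """Keep the first occurrence of each (source, target) pair, preserving order."""
    return list(dict.fromkeys(edges))

# Hypothetical sample data: each tuple is a directed edge (source_id, target_id).
edges = [("P1", "P2"), ("P1", "P2"), ("P2", "P3"), ("P1", "P2"), ("P3", "P1")]

duplicates = find_duplicate_edges(edges)
clean = deduplicate_edges(edges)

print(duplicates)
print(clean)
```

Running a check like this before loading a dataset makes it less likely that a traversal test ends up measuring redundant work.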

Always open to challenges, we’ve also made more optimizations here and there (all available in the latest versions) and are pretty happy with the results. Here’s a table and a chart summarizing how performance numbers evolved over time.

Table 1 — Performance Numbers (lower is better)
Table 2 — Relative Performance (lower is better)

*Note: We could not reproduce the performance numbers shown by the competitor in the initial tests. Unfortunately, only percentages were provided, so we trusted their numbers and used them to derive the other values according to the performance difference.
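
To make the derivation in the note concrete, here is a rough sketch of how absolute timings can be reconstructed from published percentages once a single absolute value is known. The product names and figures are hypothetical placeholders, not the results of this benchmark, and the function is only one plausible reading of the approach.

```python
def derive_absolute_times(relative_pct, known_product, known_seconds):
    """Scale published percentages into seconds using one known absolute value."""
    seconds_per_point = known_seconds / relative_pct[known_product]
    return {name: pct * seconds_per_point for name, pct in relative_pct.items()}

# Hypothetical figures: the publishing vendor is the 100% baseline.
published = {"vendor": 100.0, "product_a": 250.0, "product_b": 400.0}

# Assume product_a is known to take 5.0 seconds in absolute terms.
derived = derive_absolute_times(published, "product_a", 5.0)

for name, seconds in derived.items():
    print(f"{name}: {seconds:.2f} s")
```

Any error in the published percentages carries straight through to the derived values, which is why the note says the numbers were taken on trust.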

Picture 2 — Relative Performance (lower is better)

You may wonder why we’re still slower on the “single reads” and “neighbors2” tests. For “single reads”, we measured the OrientDB Server itself and it was extremely fast. However, 70% of the processing time was spent by the Node.js driver unmarshalling the server’s response. We have found the bottleneck, and a fix is on our roadmap. As for “neighbors2”, this use case has little value in the real world: when you traverse neighbors, you’re interested in retrieving actual information about them, not just their ID or key.

Furthermore, the original test ran on a server with 60GB of RAM. When the same test is reproduced on a common server with 16GB of RAM, the results are completely different. DBMSs that require the database (or just its indexes) to fit in RAM show their limitations when the working set exceeds the available RAM. OrientDB, by contrast, is optimized to use the disk subsystem effectively, which keeps hardware costs down.

If you’d like to run the benchmark on your configuration, here is the link to the GitHub repository. This is the OrientDB database that you can download for the benchmark (1.2GB). In the project, you can also find the script to create the database from scratch using the OrientDB ETL. Use either OrientDB 2.1 RC5 or the latest Alpha version here.

On benchmarks in general.

The NoSQL market is fragmented and there is no standardized, meaningful benchmark to test products against each other. Some database technologies, such as key-value or in-memory stores, are simple, so they win many speed benchmarks. Others are more feature complete, so while they may be slower on very simple tests, they give developers the freedom to focus on what matters most (i.e. building their apps). Furthermore, benchmarks are usually run or sponsored by a vendor who knows their own product inside and out but has little knowledge of the competitors’ products. Finally, benchmarks often pit production-quality code against early-stage, super-optimized versions that are compiled to be extremely fast (but are not yet ready for production use).

The end result is confusion, and quite often, numbers that are not applicable to the real world. Vendor X creates a benchmark comparing their product and it always wins on all fronts. Vendor Y, who feels the need to respond, optimizes the product for the same benchmark and wins on all fronts (as we’ve done in most of the tests above). Vendor Z likewise. Customers try the technology for their application and get completely different numbers. You get the picture.

Conclusions.

What we found is that most benchmarks, especially those done by individuals affiliated with a specific vendor, tend to be highly biased and show a distorted view of reality. On the other hand, we firmly agree that benchmarks help vendors improve their products and we had a lot of fun learning how we could make OrientDB even faster.

We encourage customers to build their own benchmarks based on their needs and to reach out to us anytime. OrientDB was built on the idea that you can have all you need in one native multi-model DBMS: availability, scalability, relationships, data model complexity, agility and ease of use without sacrificing performance or reliability. We expect OrientDB to be faster than the competition right out of the box. However, there is always room for improvement due to the limitless use cases that OrientDB can handle.
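
As a starting point for a benchmark of your own, the sketch below shows one common shape for a timing harness: warm up first, repeat the measurement, and report the median instead of a single run. The workload is a hypothetical stand-in; in practice it would be a query or traversal that matters to your application, issued through whichever driver you use.

```python
import statistics
import time

def benchmark(operation, warmup=3, runs=10):
    """Time `operation` and return summary statistics in seconds."""
    for _ in range(warmup):
        operation()  # warm caches and connections before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }

# Hypothetical stand-in workload; replace with a real call against your database.
def sample_workload():
    return sum(i * i for i in range(10_000))

result = benchmark(sample_workload)
print(result)
```

Reporting the minimum, median and maximum together makes it easier to spot noisy runs, and running the same harness on hardware that resembles production avoids the 60GB versus 16GB surprise described earlier.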

Today we are excited to launch a Performance Jumpstart Package. This is a discounted remote consulting engagement where we work with customers and partners to prove OrientDB’s performance, scalability, availability and rich feature set.

Bring us your benchmarks. In the use case above, improving performance by orders of magnitude took us only a couple of days. We are so convinced by OrientDB’s capabilities and the great results that we can achieve together that we will offer a money-back guarantee if we can’t improve performance in your use case.

Customer base and company credentials are another important factor to weigh before adopting a new product. With 60,000 downloads per month, more than 100 contributors, hundreds of paying customers and thousands of production users, OrientDB is a mature product, backed by a profitable company with a bright future and great momentum.

Contact us to learn more about the OrientDB Performance Challenge.

Want to try all the latest and greatest enhancements in OrientDB? Download the product or contribute on GitHub.

-UPDATE-

Since OrientDB is not an in-memory DBMS, we tried the new Command Cache in OrientDB 2.2-Alpha to take advantage of the large amount of RAM available on the test machine (60GB). To enable it, start the OrientDB Server with the following parameters: “-Dcommand.cache.enabled=true -Dcommand.cache.minExecutionTime=3”
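
The idea behind a minimum execution time can be shown with a small conceptual sketch. This is not OrientDB’s implementation: the class, its methods and the injected clock are hypothetical, and it assumes the threshold is in milliseconds and that a command is cached when it takes at least that long, which the parameters above do not spell out.

```python
import time

class CommandCache:
    """Cache results only for commands slower than a minimum execution time."""

    def __init__(self, min_execution_ms, clock=time.perf_counter):
        self.min_execution_ms = min_execution_ms
        self.clock = clock  # injectable so the demo below is deterministic
        self.entries = {}

    def execute(self, command, run):
        if command in self.entries:
            return self.entries[command]  # cache hit: skip execution
        start = self.clock()
        result = run(command)
        elapsed_ms = (self.clock() - start) * 1000
        if elapsed_ms >= self.min_execution_ms:
            self.entries[command] = result  # slow enough to be worth caching
        return result

# Demonstration with a fake clock (values in seconds) instead of real timing.
ticks = iter([0.000, 0.010, 1.000, 1.001])
cache = CommandCache(min_execution_ms=3, clock=lambda: next(ticks))
executed = []

def run(command):
    executed.append(command)
    return command.upper()

cache.execute("slow query", run)  # takes 10 ms: cached
cache.execute("fast query", run)  # takes 1 ms: not cached
cache.execute("slow query", run)  # served from the cache, run() is not called
print(executed)
```

A real cache also has to invalidate entries when the underlying data changes, which this sketch leaves out.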

 
