OrientDB becomes Distributed using Hazelcast, Leading Open Source In-Memory Data Grid
Elastic distributed scalability added to OrientDB, a Graph Database that supports hybrid Document Database features
Clustering multiple server nodes is the most significant feature of OrientDB 1.6. Databases can be replicated across heterogeneous server nodes in multi-master mode, achieving the best of both scalability and performance.
“I think one of the added values of OrientDB against all the NoSQL products is the usage of Hazelcast, while most of the others use Apache ZooKeeper to manage the cluster (discovery, split-brain networks, etc.) and something else for the transport layer,” said Luca Garulli, CEO of Orient Technologies. “With ZooKeeper, configuration is a nightmare, while Hazelcast lets you add OrientDB servers with ZERO configuration. This has been a big advantage for our clients and everything is much more ‘elastic’, especially when deployed on the Cloud. We’ve used Hazelcast not only for the auto-discovery, but also for the transport layer. Thanks to this new architecture all our clients can scale horizontally by adding new servers without stopping or reconfiguring the cluster.”
“We are amazed by the speed with which OrientDB has adopted Hazelcast and we are delighted to see such excellent technologists teaming up with Hazelcast.” said Talip Ozturk, CEO of Hazelcast. “We work hard to make the best open source in-memory data grid on the market and are happy to see it being used in this way.”
Both Hazelcast and Orient Technologies provide professional support for their respective open source projects, which are released under the Apache license.
About Orient Technologies
Orient Technologies is the company behind the NoSQL project OrientDB, the Graph Database with a hybrid model taken from both the Document Database and Object Orientation worlds. OrientDB is FREE for any purpose, even commercial, because it is released under the Apache 2 License. Orient Technologies offers commercial services around OrientDB for companies that want support, training and consulting.
Hazelcast (www.hazelcast.com) develops, distributes and supports the leading open source in-memory data grid. The product, also called Hazelcast, is a free open source download under the Apache license that any developer can include in minutes to build elegantly simple, mission-critical, transactional, and terascale in-memory applications. The company provides commercially licensed Enterprise editions, the Hazelcast Management Console and professional open source training, development support and deployment support. The company is privately held and headquartered in Palo Alto, California.
Keywords: Hazelcast, In-memory data grid, Document Database, Object Database, In-memory Database, computing, big data, NoSQL, grid computing, Apache License, Open Source
We’re glad to announce, in the “develop” branch (1.6.0-SNAPSHOT), the new distributed configuration engine that doesn’t use Hazelcast’s Executor Service but rather Queues. This made the whole job easier (we dropped 50% of the code of the previous implementation) while achieving better performance.
This is the new configuration in orientdb-dserver-config.xml:
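For orientation, the distributed handler section of that file looks roughly like this in 1.6 (the node name and paths are illustrative; check the file shipped with your build):

```xml
<handler class="com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin">
  <parameters>
    <parameter name="enabled" value="true"/>
    <!-- illustrative node name; each server should use its own -->
    <parameter name="nodeName" value="node0"/>
    <parameter name="configuration.db.default"
               value="${ORIENTDB_HOME}/config/default-distributed-db-config.json"/>
    <parameter name="configuration.hazelcast"
               value="${ORIENTDB_HOME}/config/hazelcast.xml"/>
  </parameters>
</handler>
```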
As you can see, we don’t use realignment tasks anymore: when a node comes up, it gets the pending messages from the Hazelcast queue. This works well when a server goes up & down within a reasonable time, such as during a temporary network failure or a server upgrade. If you plan to stop a server and restart it after days, you’d need to re-deploy the entire database on that server.
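The behavior above can be sketched with a toy example, using a local BlockingQueue where OrientDB actually uses a Hazelcast distributed IQueue (the queue contents and class name are made up for illustration):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class NodeQueueSketch {
    public static void main(String[] args) throws InterruptedException {
        // One queue per node; here a local queue stands in for the distributed one.
        BlockingQueue<String> node1Queue = new LinkedBlockingQueue<>();

        // Coordinator side: replication tasks keep being enqueued
        // even while node1 is "down".
        node1Queue.offer("INSERT #9:1");
        node1Queue.offer("UPDATE #9:1");

        // node1 comes back up and simply drains its pending messages in order,
        // so no explicit realignment task is needed.
        while (!node1Queue.isEmpty())
            System.out.println("replaying: " + node1Queue.take());
    }
}
```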
This is the default configuration in the new default-distributed-db-config.json file: put it under $ORIENTDB_HOME/config and remove the copy contained under the database directory, if any. This configuration comes with only one partition, where new joining nodes auto-register themselves, so all the nodes hold the entire database (no partitioning). To achieve better performance, avoid using a writeQuorum bigger than 2: having 2 servers that synchronously hold the data is already very safe. All the servers in the partition are updated anyway; the client simply receives the response as soon as writeQuorum is reached. This is the new default configuration file:
[ "<NEW_NODE>" ]
Partitions contains an array of partitions, each expressed as an array of node names. The keyword “<NEW_NODE>” means that new nodes joining the cluster are automatically added to that partition.
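Putting the pieces together, this is a sketch of what the full default-distributed-db-config.json looks like with these defaults (field names follow the file shipped with 1.6; treat the values as illustrative):

```json
{
  "autoDeploy": true,
  "hotAlignment": true,
  "readQuorum": 1,
  "writeQuorum": 2,
  "readYourWrites": true,
  "clusters": {
    "internal": { "replication": false },
    "index": { "replication": false },
    "*": {
      "replication": true,
      "partitioning": {
        "strategy": "round-robin",
        "default-partition": "*",
        "partitions": [ [ "<NEW_NODE>" ] ]
      }
    }
  }
}
```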
So if you start X nodes, replication works out of the box. Thanks to partitioning you can also shard your database across multiple nodes, avoiding a fully symmetric replica. So you can use your 6 servers as before:
[ "node0", "node1", "node2", "node3", "node4", "node5" ]
or in this way:
[ "node0", "node1", "node2" ],
[ "node3", "node4", "node5" ]
It’s like RAID. With this configuration you have 2 partitions (0 and 1) with 3 replicas each, so the database is automatically spanned across the 2 partitions, each owning about half of the database. By default we provide the “round-robin” strategy, but you can already plug in your own to better split the graph according to your domain.
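As a sketch of what the default strategy does (illustrative code, not OrientDB’s actual plug-in interface): a round-robin partitioner hands each new record to the next replica set in rotation, so with 2 partitions each one ends up owning about half of the records.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class RoundRobinPartitioner {
    private final List<List<String>> partitions; // each inner list is one replica set
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinPartitioner(List<List<String>> partitions) {
        this.partitions = partitions;
    }

    // Returns the replica set (server names) that receives the next record.
    public List<String> nextPartition() {
        int idx = (int) (counter.getAndIncrement() % partitions.size());
        return partitions.get(idx);
    }

    public static void main(String[] args) {
        // The two-partition layout from the example above.
        RoundRobinPartitioner p = new RoundRobinPartitioner(List.of(
            List.of("node0", "node1", "node2"),
            List.of("node3", "node4", "node5")));
        for (int i = 0; i < 4; i++)
            System.out.println("record " + i + " -> " + p.nextPartition());
    }
}
```

A domain-aware strategy would replace `nextPartition()` with a lookup based on the record itself (e.g. hashing a vertex key), which is what plugging in your own strategy allows.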
This is part of the Sharding feature, so consider it as a preview. It will be available in the next release (1.7 or 2.0).
Furthermore, we’ve introduced variable timeouts that change based on the runtime configuration (number of nodes, type of replicated command, etc.).
We’ve also introduced new settings to tune the replication engine (in the OGlobalConfiguration class):
– DISTRIBUTED_THREAD_QUEUE_SIZE (distributed.threadQueueSize): size of the queue for internal thread dispatching
– DISTRIBUTED_CRUD_TASK_TIMEOUT (distributed.crudTaskTimeout): maximum timeout in milliseconds to wait for CRUD remote tasks
– DISTRIBUTED_COMMAND_TASK_TIMEOUT (distributed.commandTaskTimeout): maximum timeout in milliseconds to wait for Command remote tasks (default: 5000)
– DISTRIBUTED_QUEUE_TIMEOUT (distributed.queueTimeout): maximum timeout in milliseconds to wait for the response in replication
– maximum timeout in milliseconds to collect all the synchronous responses from replication (default: 5000)
– maximum timeout in milliseconds to collect all the asynchronous responses from replication (default: 15000)
I suggest that everybody using the distributed architecture upgrade to 1.6.0-SNAPSHOT and:
– update the file orientdb-dserver-config.xml
– update the file default-distributed-db-config.json
– remove the previous file default-distributed-db-config.json under the databases/ directory
We successfully tested the new engine with 10 servers and 200 connected clients with writeQuorum=2 (2 synchronous copies before giving the OK to the client).
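The writeQuorum behavior can be sketched like this (illustrative code, not OrientDB’s implementation): the write is sent to every server in the partition, but the client is released as soon as writeQuorum synchronous acknowledgements have been collected, while the remaining replicas complete in the background.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class QuorumWrite {
    // Sends the write to all servers; blocks only until writeQuorum acks arrive.
    static void replicate(List<Callable<Boolean>> servers, int writeQuorum)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(servers.size());
        CountDownLatch quorum = new CountDownLatch(writeQuorum);
        for (Callable<Boolean> server : servers)
            pool.submit(() -> {
                server.call();      // apply the write on this replica
                quorum.countDown(); // one more synchronous ack collected
                return null;
            });
        quorum.await();             // the client unblocks at writeQuorum acks
        pool.shutdown();            // remaining replicas finish asynchronously
    }

    public static void main(String[] args) throws InterruptedException {
        // Three replicas, quorum of 2: the call returns after 2 of 3 acks.
        replicate(List.of(() -> true, () -> true, () -> true), 2);
        System.out.println("client acknowledged with writeQuorum=2");
    }
}
```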
Next week we’ll release OrientDB 1.6, so this is your last chance to vote for the bugs you want in the 1.6 roadmap!
CEO at Orient Technologies
the Company behind OrientDB
London, September 21st, 2013
NuvolaBase, the Orient Technologies division that offered OrientDB as a “Database as a Service” (DaaS), announces that it is stopping its activity. This way, Orient Technologies will be more focused on OrientDB development, support, consultancy and training.
All paying clients will be hosted for FREE for a further 6 months after the expiration of their contracts, to give them time to switch to another solution.
What about the alternatives?
We suggest installing the OrientDB Open Source Graph Database on a Virtual Machine hosted at any Cloud Provider, such as AWS.