Stay Tuned For Our OrientDB v2.2 Beta Announcement!

OrientDB v2.2 Beta Going Live Next Week!


We’ve worked hard on it,  it’s about to go live, we can’t contain our excitement, and of course, we couldn’t have done it without you!


For the past few years all of us here at OrientDB have strived to deliver a fresh, innovative platform on which enterprises can build to enrich their business.  We’ve always been proud of our work, and it’s been a real pleasure to collaborate with and learn from an ever-growing community of dedicated developers.

With the launch of OrientDB back in 2011 we breathed some fresh air into the DBMS landscape, making waves with our multi-model approach (to the delight of some and shock of others). From then on it’s been a rollercoaster of a ride, and in 2015 we continued to disrupt the market and climb the DBMS charts.  We’re hoping 2016 will be no different, and in that spirit we are launching our 2.2 Beta version of OrientDB in the next few days **cue the drum roll**

We couldn’t have achieved this without the ever-growing community of developers who worked tirelessly alongside our team.  Of course, this is a Beta release, so it’s not suitable for production environments, but we’re hoping our community is just as excited as we are about the announcement, and we’re looking forward to working together to test what we think will be our best release so far.

Our engineers have been working hard to make OrientDB even better, and we’ve loaded version 2.2 full of new features which we’re sure you’ll love too.  The main focus of this version is on Security, APIs and performance, building on our distributed capabilities as well as graph enhancements.  We’ve added features such as Teleporter, which introduces a simple, straightforward way to convert your relational databases to OrientDB, along with many other new capabilities.

During the coming days, we’ll be discussing several of the new features here on our company blog, so stay tuned for more on each of OrientDB’s new capabilities and enhancements.  Once we’re live, go ahead and take it for a test drive and let us know what you think.  For a complete list of features, please refer to our OrientDB v2.2 documentation following the Beta launch.

Thank you to our community; we’re looking forward to your feedback!


Luca Garulli

Chief Executive Officer, Founder

London, April 30th 2014

The new Distributed Architecture, with sharding features and simplified configuration, is available in the “develop” branch (1.7-SNAPSHOT).

Take a look at the new default-distributed-db-config.json:

{
  "autoDeploy": true,
  "hotAlignment": false,
  "offlineMsgQueueSize" : 0,
  "readQuorum": 1,
  "writeQuorum": 2,
  "failureAvailableNodesLessQuorum": false,
  "readYourWrites": true,
  "clusters": {
    "internal": {
    },
    "index": {
    },
    "*": {
      "servers" : [ "<NEW_NODE>" ]
    }
  }
}

We removed some flags (like replication:boolean, which is now deduced from the presence of the “servers” field), and settings are now global (autoDeploy, hotAlignment, offlineMsgQueueSize, readQuorum, writeQuorum, failureAvailableNodesLessQuorum, readYourWrites), but you can override them per cluster.
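For example, the global writeQuorum could be relaxed for a single cluster by repeating the key inside that cluster’s entry. This is a sketch of the idea only: the exact per-cluster override syntax and the cluster name are my assumptions.

{
  "writeQuorum": 2,
  "clusters": {
    "client_0": {
      "servers" : [ "europe", "usa" ],
      "writeQuorum": 1
    }
  }
}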

Furthermore, sharding is no longer automatic: it is defined per cluster. Let me explain. Many NoSQL products shard against a hashed key (see MongoDB, or any DHT system like DynamoDB). This wouldn’t work well in a Graph Database, where traversing relationships means jumping from one server node to another with very high probability.

So the sharding strategy is up to the developer/DBA. How? By defining multiple clusters per class. Example: splitting the class “Client” across 3 clusters:


Class Client -> Clusters [ client_0, client_1, client_2 ]

This means that OrientDB will treat any record/document/graph element stored in any of those clusters as a “Client” (the Client class relies on those clusters).
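Binding the additional clusters to a class is a one-time schema operation. Here is a minimal sketch using the Java API; the database URL, credentials and cluster names are illustrative, and it assumes the Client class already exists:

// minimal sketch: attach additional clusters to the Client class
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.metadata.schema.OClass;

public class ShardClientSetup {
  public static void main(String[] args) {
    ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:localhost/demo").open("admin", "admin");
    try {
      OClass client = db.getMetadata().getSchema().getClass("Client");
      // each call binds a cluster to the class (creating it if needed),
      // so new Client records can be routed to any of them
      client.addCluster("client_1");
      client.addCluster("client_2");
    } finally {
      db.close();
    }
  }
}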

Now you can assign each cluster to one or more servers. If multiple servers are listed, the records will be copied to all of those servers. Example:

{
  "autoDeploy": true,
  "hotAlignment": false,
  "readQuorum": 1,
  "writeQuorum": 2,
  "failureAvailableNodesLessQuorum": false,
  "readYourWrites": true,
  "clusters": {
    "internal": {
    },
    "index": {
    },
    "client_0": {
      "servers" : [ "europe", "usa" ]
    },
    "client_1": {
      "servers" : [ "usa" ]
    },
    "client_2": {
      "servers" : [ "asia", "europe", "usa" ]
    },
    "*": {
      "servers" : [ "<NEW_NODE>" ]
    }
  }
}

In this case I’ve split the Client class into the clusters client_0, client_1 and client_2, each one with a different configuration:

- client_0 is replicated on the “europe” and “usa” nodes
- client_1 resides only on the “usa” node
- client_2 is replicated on the “asia”, “europe” and “usa” nodes

In the application, when I want to write a new Client into the first cluster, I can use this syntax with the Graph API:


graph.addVertex("class:Client,cluster:client_0");

And with the Document API:


ODocument doc = new ODocument("Client");
// FILL THE RECORD
doc.save( "client_0" );


OrientDB will send the record to both “europe” and “usa” nodes.
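The same write can also be expressed in SQL by targeting the cluster directly, mirroring the cluster: notation used in the queries below (the field is illustrative):

INSERT INTO cluster:client_0 SET name = 'Jay'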

All queries work by aggregating the result sets from all the involved nodes. Example:


select from cluster:client_0

This will be executed against the “europe” or “usa” node. Instead, this query:


select from Client

This will be executed against all 3 clusters that make up the Client class, so against all the nodes.

Cool!

Projections also work in a Map/Reduce fashion. Example:


select max(amount), count(*), sum(amount) from Client

In this case the query is executed across the nodes and the partial results are then aggregated again on the starting node. Right now not 100% of the cases are covered; see below.
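To illustrate the Map/Reduce step, here is a rough sketch of how the per-node partial results of the query above could be merged on the starting node. This is illustrative only, not OrientDB’s internal code; PartialResult is a hypothetical holder for one node’s results.

// illustrative merge of per-node partial aggregates (not OrientDB internals)
import java.util.List;

public class MergeExample {
  static class PartialResult {
    double max;  // per-node max(amount)
    long count;  // per-node count(*)
    double sum;  // per-node sum(amount)
  }

  static PartialResult merge(List<PartialResult> partials) {
    PartialResult total = new PartialResult();
    total.max = Double.NEGATIVE_INFINITY;
    for (PartialResult p : partials) {
      total.max = Math.max(total.max, p.max); // the global max is the max of the maxes
      total.count += p.count;                 // counts simply add up
      total.sum += p.sum;                     // sums simply add up
    }
    return total;
  }
}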

What’s still missing?

(1) Class/Cluster selection strategy

Until now each class has had a “defaultClusterId”, which is the first cluster by default. We need a pluggable strategy to select the cluster when a record is created. We could provide at least these implementations in the bundle (see the sketch below):
- fixed, like now: takes the defaultClusterId
- round-robin: as the name suggests, cycles through the class’s clusters
- balanced: checks the number of records in all the clusters and writes to the smallest one. This is super cool when you add new clusters because you’ve bought new servers: the new, empty servers will be filled before the others because their clusters are empty at the beginning, so after a while all the clusters will be balanced
Why not have just balanced? Because it could be more expensive to know the cluster sizes at run-time… We could cache the sizes and update them every XX seconds. We’ll fix this in 1.7 in a few days.
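A hypothetical shape for such a pluggable strategy, with a round-robin implementation (the interface and names are mine, not OrientDB’s actual API):

// hypothetical sketch of a pluggable cluster-selection strategy
// (interface and class names are illustrative, not the real OrientDB API)
import java.util.concurrent.atomic.AtomicLong;

public interface ClusterSelectionStrategy {
  // picks the cluster id to use for a new record of a class
  int selectCluster(int[] clusterIds);
}

class RoundRobinClusterSelection implements ClusterSelectionStrategy {
  private final AtomicLong counter = new AtomicLong();

  public int selectCluster(int[] clusterIds) {
    // cycle through the class's clusters on every insert
    return clusterIds[(int) (counter.getAndIncrement() % clusterIds.length)];
  }
}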

This point is not strictly related to the distributed configuration; it will also be applied to the default configuration.

(2) Hot change of distributed configuration

This will be introduced after 1.7, via the command line and visually in the Workbench of the Enterprise Edition (commercially licensed).

(3) More Test Cases

I’ve created the TestSharding test case, but we need more complicated configurations and hours to try them. In fact, it’s pretty time- and CPU-consuming to start up 2-3 servers in a cluster every time!

(4) Merging results for all the functions

Some functions, like AVG(), don’t work this way yet: averaging the per-node averages gives a wrong result, so the per-node sums and counts have to be merged instead. I didn’t write 100% test coverage on all the functions yet. We’ll fix this in the next release.
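For instance, continuing the merge sketch above (names hypothetical), the correct way to merge AVG is:

// AVG cannot be merged by averaging per-node averages;
// divide the merged sum by the merged count instead
public class AvgMerge {
  static double mergeAvg(double[] nodeSums, long[] nodeCounts) {
    double sum = 0;
    long count = 0;
    for (int i = 0; i < nodeSums.length; i++) {
      sum += nodeSums[i];
      count += nodeCounts[i];
    }
    return sum / count; // NOT the average of the per-node averages
  }
}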

(5) Update the Documentation

We’ll start updating the documentation with the new stuff and more examples about configuration.

(6) Sharding + Replication

A mixed configuration with sharding + replication allows you to split the database into partitions that are replicated across more than one node. This already works. Unfortunately, distributed queries with aggregation in the projections (sum, max, avg, etc.) could return wrong results, because the same record could be browsed on different nodes and counted more than once. We’ll fix this in the next release.


Luca Garulli
CEO at Orient Technologies LTD
the Company behind OrientDB
http://about.me/luca.garulli




OrientDB becomes Distributed using Hazelcast, Leading Open Source In-Memory Data Grid
Elastic distributed scalability added to OrientDB, a Graph Database that supports hybrid Document Database features
London, UK – Orient Technologies (http://www.orientechnologies.com/) and Hazelcast (http://www.hazelcast.com) today announced that OrientDB has gained a multi-master replication feature powered by Hazelcast.
Clustering multiple server nodes is the most significant feature of OrientDB 1.6. Databases can be replicated across heterogeneous server nodes in multi-master mode, achieving the best of both scalability and performance.
“I think one of the added values of OrientDB compared to other NoSQL products is the use of Hazelcast, while most of the others use Apache ZooKeeper to manage the cluster (discovery, split-brain networks, etc.) and something else for the transport layer,” said Luca Garulli, CEO of Orient Technologies. “With ZooKeeper, configuration is a nightmare, while Hazelcast lets you add OrientDB servers with ZERO configuration. This has been a big advantage for our clients and everything is much more ‘elastic’, especially when deployed on the Cloud. We’ve used Hazelcast not only for auto-discovery, but also for the transport layer. Thanks to this new architecture, all our clients can scale up horizontally by adding new servers without stopping or reconfiguring the cluster.”
“We are amazed by the speed with which OrientDB has adopted Hazelcast and we are delighted to see such excellent technologists teaming up with Hazelcast.” said Talip Ozturk, CEO of Hazelcast. “We work hard to make the best open source in-memory data grid on the market and are happy to see it being used in this way.”
Both Hazelcast and Orient Technologies are providing professional open source support to their respective projects under the Apache software license.
About Orient Technologies
Orient Technologies is the company behind the NoSQL project OrientDB, the Graph Database with a hybrid model drawn from both the Document Database and Object Orientation worlds. OrientDB is FREE for any purpose, even commercial, because it is released under the Apache 2 License. Orient Technologies offers commercial services around OrientDB for companies who want support, training and consulting.
About Hazelcast
Hazelcast (www.hazelcast.com) develops, distributes and supports the leading open source in-memory data grid. The product, also called Hazelcast, is a free open source download under the Apache license that any developer can include in minutes to build elegantly simple mission-critical, transactional, and terascale in-memory applications. The company provides commercially licensed Enterprise editions, the Hazelcast Management Console, and professional open source training, development support and deployment support. The company is privately held and headquartered in Palo Alto, California.
Keywords: Hazelcast, In-memory data grid, Document Database, Object Database, In-memory Database, computing, big data, NoSQL, grid computing, Apache License, Open Source


