First Exposure to OrientDB

London, October 2nd 2014

Today a new post about OrientDB was published. Below are some quotes from it:

“My problem with Neo4j is that out of the box (the free version), it doesn’t scale horizontally and none of the enterprise features are available. Worse, the enterprise version costs a lot of money… like Microsoft enterprise money…

I could use their web GUI that looks as crisp and easy to use as Neo4j’s web GUI to graphically click around creating vertices and edges but I decided to do it in SQL to see what it felt like…

Overall I’m impressed with OrientDB and its performance metrics look pretty amazing, either on par with or better than Neo4j. I am using the SQL syntax but I’ve only had a few hours of exposure so I’m betting the above query could be refactored into something more elegant by people who know more idiomatic ways of doing things.

I’m definitely going to continue looking into OrientDB as an alternative to Neo4j, especially given the control over schema enforcement, the object inheritance model that translates directly to binding to Java POJOs using the OrientDB API (OrientDB is written in Java), and obviously the price.”

Read the Full Article, by Kevin Hoffman

 

London, September 29, 2014

After only 11 days, the second milestone is available. We fixed many issues reported in M1 and added some new features we are sure you’ll appreciate: 40 issues in total. Please help us test OrientDB 2.0-M2 so that we can release a stable final 2.0 in 10 days.

Summary of changes

Full list of all 40 closed issues.

 

Can I use OrientDB v2.0-M2 in production?

No. This is not the final 2.0 version. This release is the second milestone (M2 stands for Second Milestone) of 2.0, on the path to the final version expected in the next 10 days. We suggest using OrientDB v2.0-M2 in development and test only. If you plan to go into production before October 10th, we suggest staying with OrientDB 1.7.x. Otherwise, go ahead and use OrientDB 2.0-M2.

 

Is 2.0-M2 compatible with previous versions of OrientDB?

You can open any database created with past versions of OrientDB. In order to use the new binary serialization, you are required to export and re-import the database. For more information, take a look at Migrate from 1.7.x.

Download OrientDB v2.0-M2: http://www.orientechnologies.com/download

Have fun with graphs & documents,

Luca Garulli
CEO of Orient Technologies
the Company behind OrientDB
www.orientechnologies.com

 

London, July 4th 2014

Orient Technologies released the new OrientDB-ETL component as a beta, to easily pump data into an OrientDB database without writing a single line of code. So far the available extractors are:

 

How to import from a relational DBMS?

 

(1) Get the right JDBC driver

Most DBMSs provide a JDBC driver. All you need is to get the right JDBC driver for your DBMS and put it in the classpath, or simply in the $ORIENTDB_HOME/lib directory.
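A quick way to check this step from Java is to try loading the driver class by name, for example com.mysql.jdbc.Driver, the one used in the MySQL configuration of the next step. The DriverCheck class below is a hypothetical helper, not part of the ETL component, and it only reflects the classpath of the JVM it runs in, which may differ from the one oetl uses.

```java
final class DriverCheck {
    // Returns true if the named class (e.g. a JDBC driver) is visible on this JVM's classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only check that the class exists, don't run its static initializer
            Class.forName(className, false, DriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

Usage: DriverCheck.isOnClasspath("com.mysql.jdbc.Driver") returns false until the MySQL driver jar is on the classpath.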

(2) Write the ETL configuration

All you need is to write a JSON file describing the ETL process. Create a new file dbms2orient.json somewhere. With the configuration below, all the records of the table “Client” in a MySQL database are imported into OrientDB. Start from this one.

{
  "config": {
    "verbose": true
  },
  "extractor" : {
    "jdbc": { "driver": "com.mysql.jdbc.Driver",
              "url": "jdbc:mysql://localhost/mysqlcrm",
              "userName": "root",
              "userPassword": "",
              "query": "select * from Client" }
  },
  "transformers" : [
    { "vertex": { "class": "Client"} }
  ],
  "loader" : {
    "orientdb": {
      "dbURL": "plocal:/temp/databases/orientdbcrm",
      "dbAutoCreate": true
    }
  }
}

(3) Start the ETL process

After installing the component, execute the command:

$ oetl dbms2orient.json

 

That’s all. To refine the ETL process with transformers, look at the documentation.




London, April 30th 2014

The new distributed architecture, with sharding features and a simplified configuration, is available in the “develop” branch (1.7-SNAPSHOT).

Look at the new default default-distributed-db-config.json:

{
  "autoDeploy": true,
  "hotAlignment": false,
  "offlineMsgQueueSize" : 0,
  "readQuorum": 1,
  "writeQuorum": 2,
  "failureAvailableNodesLessQuorum": false,
  "readYourWrites": true,
  "clusters": {
    "internal": {
    },
    "index": {
    },
    "*": {
      "servers" : [ "<NEW_NODE>" ]
    }
  }
}

We removed some flags (like replication:boolean, which is now deduced from the presence of the “servers” field), and the settings are now global (autoDeploy, hotAlignment, offlineMsgQueueSize, readQuorum, writeQuorum, failureAvailableNodesLessQuorum, readYourWrites), but you can override them per cluster.

Furthermore, sharding is now configured per cluster. Let me explain. Many NoSQL products shard records by a hashed key (see MongoDB, or any DHT system like DynamoDB). This wouldn’t work well in a graph database, where traversing relationships would then mean jumping from one node to another with very high probability.

So the sharding strategy is up to the developer/DBA. How? By defining multiple clusters per class. Example of splitting the class “Client” into 3 clusters:


Class Client -> Clusters [ client_0, client_1, client_2 ]

This means that OrientDB will consider any record/document/graph element in any of these clusters as a “Client” (the Client class relies on these clusters).

Now you can assign each cluster to one or more servers. If more than one server is listed, the records are copied to all of them. Example:

{
  "autoDeploy": true,
  "hotAlignment": false,
  "readQuorum": 1,
  "writeQuorum": 2,
  "failureAvailableNodesLessQuorum": false,
  "readYourWrites": true,
  "clusters": {
    "internal": {
    },
    "index": {
    },
    "client_0": {
      "servers" : [ "europe", "usa" ]
    },
    "client_1": {
      "servers" : [ "usa" ]
    },
    "client_2": {
      "servers" : [ "asia", "europe", "usa" ]
    },
    "*": {
      "servers" : [ "<NEW_NODE>" ]
    }
  }
}

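In this case I’ve split the Client class into the clusters client_0, client_1 and client_2, each one with a different configuration:
– client_0 is replicated on the “europe” and “usa” servers
– client_1 is stored only on the “usa” server
– client_2 is replicated on the “asia”, “europe” and “usa” servers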

In the application, to write a new Client to the first cluster, I can use this syntax (Graph API):


graph.addVertex("class:Client,cluster:client_0");

Or with the Document API:


ODocument doc = new ODocument("Client");
// FILL THE RECORD
doc.save( "client_0" );

 

OrientDB will send the record to both “europe” and “usa” nodes.

All queries work by aggregating the result sets of all the involved nodes. Example:


select from cluster:client_0

It will be executed against node “europe” or “usa”. Instead, this query:


select from Client

 

It will be executed against all 3 clusters that make up the Client class, and therefore against all the nodes.

Cool!
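To make these routing rules concrete, here is a small Java model of them, under the assumption that a write goes to every server listed for the target cluster, while a read needs just one copy of each cluster. ShardRouter and its methods are hypothetical names for illustration, not OrientDB classes, and picking the first listed server is simply the easiest choice for a sketch.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ShardRouter {
    private final Map<String, List<String>> clusterServers; // cluster -> servers holding a copy
    private final Map<String, List<String>> classClusters;  // class -> clusters it relies on

    ShardRouter(Map<String, List<String>> clusterServers, Map<String, List<String>> classClusters) {
        this.clusterServers = clusterServers;
        this.classClusters = classClusters;
    }

    private List<String> serversOf(String cluster) {
        List<String> servers = clusterServers.get(cluster);
        if (servers == null || servers.isEmpty())
            throw new IllegalArgumentException("no servers configured for cluster " + cluster);
        return servers;
    }

    // A new record in a cluster is copied to every server listed for that cluster.
    List<String> writeTargets(String cluster) {
        return serversOf(cluster);
    }

    // A query on a class must cover every cluster of the class, but one copy is enough:
    // this sketch just picks the first listed server for each cluster.
    Map<String, String> readPlan(String className) {
        Map<String, String> plan = new LinkedHashMap<>();
        for (String cluster : classClusters.get(className)) {
            plan.put(cluster, serversOf(cluster).get(0));
        }
        return plan;
    }

    // The distinct servers contacted by a query on the class.
    Set<String> serversFor(String className) {
        return new LinkedHashSet<>(readPlan(className).values());
    }
}
```

With the configuration above, writeTargets("client_0") yields europe and usa, and a query on Client ends up touching all three nodes.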

Projections also work in a Map/Reduce fashion. Example:


select max(amount), count(*), sum(amount) from Client

In this case the query is executed across the nodes, and the partial results are then aggregated again on the starting node. Right now not 100% of the cases are covered, see below.
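Here is a minimal sketch of that reduce step, assuming each node returns a partial (max, count, sum) computed over its own records. It also shows why AVG() needs special treatment: averaging the per-node averages gives the wrong answer, while deriving it from the merged sum and count is exact. PartialAgg is a hypothetical class, not OrientDB’s implementation.

```java
final class PartialAgg {
    final double max;
    final long count;
    final double sum;

    PartialAgg(double max, long count, double sum) {
        this.max = max;
        this.count = count;
        this.sum = sum;
    }

    // Map step: computed on one node over its local records.
    static PartialAgg of(double... amounts) {
        double max = Double.NEGATIVE_INFINITY, sum = 0;
        for (double a : amounts) {
            max = Math.max(max, a);
            sum += a;
        }
        return new PartialAgg(max, amounts.length, sum);
    }

    // Reduce step on the starting node: max of maxes, sum of counts, sum of sums.
    PartialAgg merge(PartialAgg other) {
        return new PartialAgg(Math.max(max, other.max), count + other.count, sum + other.sum);
    }

    // AVG can't be merged from per-node averages; derive it from the merged sum and count.
    double avg() {
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

For example, with amounts 10 and 5 on one node and 7, 3 and 2 on another, the merged result is max 10, count 5, sum 27, so the correct average is 5.4, while the average of the two local averages would be 5.75.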

What’s still missing?

(1) Class/Cluster selection strategy

Until now each class has had a “defaultClusterId”, which is the first cluster by default. We need a pluggable strategy to select the cluster when a record is created. We could provide at least these implementations in the bundle:
– fixed, like now: takes the defaultClusterId
– round-robin, as the name suggests
– balanced: checks the records in all the clusters and tries to write to the smallest one. This is super cool when you add new clusters because you’ve bought new servers: the new servers start with empty clusters, so they are filled before the others, and after a while all the clusters are balanced
Why not have just balanced? Because knowing the cluster sizes at run-time could be expensive… We could cache the sizes and update them every XX seconds. We’ll fix this in 1.7 in a few days.

This point is not strictly related to the distributed configuration, but it will also apply to the default one.

(2) Hot change of distributed configuration

This will be introduced after 1.7, via the command line and visually in the Workbench of the Enterprise Edition (commercially licensed).

(3) More Test Cases

I’ve created the TestSharding test case, but we need more complicated configurations and hours to try them. In fact, starting up 2-3 clustered servers every time is pretty time- and CPU-consuming!

(4) Merging results for all the functions

Some functions, like AVG(), don’t work this way. I haven’t written 100% test coverage for all the functions yet. We’ll fix this in the next release.

(5) Update the Documentation

We’ll start updating the documentation with the new stuff and more examples about configuration.

(6) Sharding + Replication

A mixed sharding + replication configuration lets you split the database into partitions, each one replicated across more than one node. This already works. Unfortunately, distributed queries with aggregation in projections (sum, max, avg, etc.) could return wrong results, because the same record could be browsed on different nodes. We’ll fix this in the next release.
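To see why replication skews aggregates, consider two nodes that both hold a copy of record #12:0. The Java sketch below compares a naive merge with one that keeps a single copy per RID before summing. ReplicaMerge is hypothetical and is not how OrientDB fixes the issue; it only illustrates the problem and one possible way around it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReplicaMerge {
    // One row as returned by a node: record id plus the projected value.
    static final class Row {
        final String rid;
        final long amount;

        Row(String rid, long amount) {
            this.rid = rid;
            this.amount = amount;
        }
    }

    // Naive merge: all rows from all nodes are summed, so replicated records count twice.
    static long naiveSum(List<List<Row>> perNode) {
        long total = 0;
        for (List<Row> rows : perNode)
            for (Row r : rows) total += r.amount;
        return total;
    }

    // Deduplicating merge: keep only the first copy of each RID, then aggregate.
    static long dedupSum(List<List<Row>> perNode) {
        Map<String, Long> byRid = new LinkedHashMap<>();
        for (List<Row> rows : perNode)
            for (Row r : rows) byRid.putIfAbsent(r.rid, r.amount);
        long total = 0;
        for (long v : byRid.values()) total += v;
        return total;
    }
}
```

If “europe” returns #12:0 (10) and #12:1 (5) while “usa” returns #12:0 (10) and #13:0 (7), the naive sum is 32 but the correct one is 22.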

 

Luca Garulli
CEO at Orient Technologies LTD
the Company behind OrientDB
http://about.me/luca.garulli




London, March 31st 2014
 

I’ve silently introduced a new feature in OrientDB (well, there was an issue for it, but I know not everybody follows GitHub notifications). It’s about transactions and locking. Now, every time you explicitly lock records, the locks are held by the current transaction until it is closed by commit() or rollback(). This means it’s finally possible to avoid concurrent updates.

The real question is: “Is it better to be Optimistic or Pessimistic with transactions?”

 

The answer is up to you, or rather, to your use case. In general the optimistic approach is preferable on modern multi-core architectures, but with massive updates against few records, locking could give the best performance.

Look at this micro-benchmark.

This feature, along with the new server-side SQL batch, gives us a lot of power. Let’s see how to create an edge with the pessimistic approach (locking) and with the optimistic approach (CAS).

Pessimistic approach

begin
let $client = create vertex Client set name = 'Luca'
let $city = select from City where name = 'London' lock record
let $e = create edge Lives from $client to $city
commit

 

Optimistic approach

begin
let $client = create vertex Client set name = 'Luca'
let $city = select from City where name = 'London'
let $e = create edge Lives from $client to $city retry 100
commit

 
Now it’s up to you to choose the best approach for your use case.
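For readers who think in Java rather than SQL, here is a minimal sketch of the two approaches on a single in-memory record, assuming a version number that changes on every update. The pessimistic path holds a ReentrantLock for the whole read-modify-write; the optimistic path reads a snapshot, computes the change and publishes it with compareAndSet, retrying on conflict in the spirit of "retry 100". VersionedRecord is a hypothetical class, not OrientDB’s MVCC implementation.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

final class VersionedRecord<T> {
    // Immutable value + version pair, replaced atomically on every update.
    static final class Snapshot<V> {
        final V value;
        final int version;

        Snapshot(V value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot<T>> ref;
    private final ReentrantLock lock = new ReentrantLock();

    VersionedRecord(T initial) {
        ref = new AtomicReference<>(new Snapshot<>(initial, 0));
    }

    Snapshot<T> read() {
        return ref.get();
    }

    // Pessimistic: other pessimistic writers wait on the lock. This sketch assumes
    // all writers of a record use the same approach; mixing them is not protected.
    void updatePessimistic(UnaryOperator<T> change) {
        lock.lock();
        try {
            Snapshot<T> cur = ref.get();
            ref.set(new Snapshot<>(change.apply(cur.value), cur.version + 1));
        } finally {
            lock.unlock();
        }
    }

    // Optimistic: try once, then retry up to maxRetries times if someone else
    // published a new snapshot in the meantime (compareAndSet fails).
    boolean updateOptimistic(UnaryOperator<T> change, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Snapshot<T> cur = ref.get();
            Snapshot<T> next = new Snapshot<>(change.apply(cur.value), cur.version + 1);
            if (ref.compareAndSet(cur, next)) return true;
        }
        return false;
    }
}
```

The optimistic path never blocks, which is why it tends to win on multi-core machines, but under heavy contention on a few records it can burn many retries, which is where locking pays off.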
 
Luca Garulli
CEO at Orient Technologies
the Company behind OrientDB
http://about.me/luca.garulli

 

 




Hi all,

OrientDB already allowed executing arbitrary scripts written in JavaScript or any other scripting language installed in the JVM. Now we have also created a minimal SQL engine to allow batches of commands. Command batches are very useful when you have to execute multiple things on the server side, avoiding a network round trip for each command. Example: create a new vertex in a transaction and attach it to an existing vertex by creating a new edge between them:

 

begin
let account = create vertex Account set name = 'Luke'
let city = select from City where name = 'London'
let edge = create edge Lives from $account to $city retry 100
commit
return $edge

 

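It’s plain OrientDB SQL, with a few additions: begin and commit delimit the transaction, let assigns the result of a command to a variable, and return sends a value back to the caller.
Note also the usage of $account and $city in the following SQL commands.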

 

Java API

 

This can be used from the Java API with:
// 'database' is an ODatabaseDocumentTx instance pointing to your database
database.open("admin", "admin");

String cmd = "begin\n";
cmd += "let a = create vertex set script = true\n";
cmd += "let b = select from v limit 1\n";
cmd += "let e = create edge from $a to $b retry 100\n";
cmd += "commit\n";
cmd += "return $e";

OIdentifiable edge = database.command(new OCommandScript("sql", cmd)).execute();

 

Remember to put one command per line (terminate each one with \n).
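Building the script by hand makes it easy to forget a \n. A tiny helper that joins the commands one per line avoids that. SqlBatch is a hypothetical helper; its result would simply be passed to new OCommandScript("sql", ...) as in the snippet above.

```java
final class SqlBatch {
    // Joins the commands with "\n" so each one ends up on its own line.
    static String of(String... commands) {
        for (String c : commands) {
            // A command spanning several lines would break the one-command-per-line rule.
            if (c.contains("\n")) throw new IllegalArgumentException("one command per line: " + c);
        }
        return String.join("\n", commands);
    }
}
```

Usage: SqlBatch.of("begin", "let a = create vertex set script = true", "commit", "return $a").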

 

HTTP REST API

 

It’s also available via the HTTP REST interface (https://github.com/orientechnologies/orientdb/issues/2056). Execute a POST against the /batch URL, sending this payload:

 

{ "transaction" : true,
  "operations" : [
    {
      "type" : "script",
      "language" : "sql",
      "script" : [ "let account = create vertex Account set name = 'Luke'",
                   "let city = select from City where name = 'London'",
                   "create edge Lives from $account to $city retry 100" ]
    }
  ]
}
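From Java 11 onwards, the request can be prepared with the JDK’s own HTTP client. This sketch assumes the endpoint has the form http://&lt;server&gt;:&lt;port&gt;/batch/&lt;database&gt;, the default HTTP port 2480, HTTP Basic authentication and a database named demo; check all of these against your server configuration. BatchRequest is a hypothetical helper and only builds the request, so it runs without a server.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BatchRequest {
    // Builds (but does not send) a POST carrying the batch payload as JSON.
    static HttpRequest build(String baseUrl, String database, String user, String password, String payload) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/batch/" + database)) // assumed URL shape
                .header("Authorization", "Basic " + credentials)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
    }
}
```

To actually send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).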

 

We hope this new feature simplifies your development while improving performance.

 

What about more complex constructs like IF, FOR, etc.? If you need more complexity, I suggest using JavaScript, which already supports all these constructs.

 

Luca Garulli
CEO at Orient Technologies
the Company behind OrientDB
http://about.me/luca.garulli

 

 

PS: For more information look at issues -> https://github.com/orientechnologies/orientdb/issues/2176 and https://github.com/orientechnologies/orientdb/issues/2056



The release of OrientDB 1.7-rc2 is very close. We’ve fixed many bugs, added binary compatibility with older versions (back to v1.5) and introduced an interesting new feature: walking the graph. When you execute a query you get back a resultset:

orientdb {GratefulDeadConcerts}> select from v where type = 'artist'

________________________________________________________________
#   |@RID  |in_written_by|in_sung_by|name                |type  
________________________________________________________________
0   |#9:7  |[size=9]     |[size=7]  |Bo_Diddley          |artist
1   |#9:8  |[size=4]     |[size=146]|Garcia              |artist
2   |#9:9  |#9:2         |[size=2]  |Spencer_Davis       |artist
3   |#9:27 |#9:3         |null      |Hardin_Petty        |artist
4   |#9:50 |[size=4]     |[size=99] |Weir                |artist
5   |#9:93 |[size=96]    |[size=3]  |Hunter              |artist
6   |#9:131|[size=39]    |null      |Traditional         |artist
7   |#9:136|#9:135       |null      |West_Tilghman_Holly |artist
8   |#9:169|[size=2]     |null      |Jesse_Fuller        |artist
9   |#9:174|[size=28]    |null      |Barlow              |artist
10  |#9:179|#9:88        |null      |John_Phillips       |artist
11  |#9:183|null         |[size=3]  |Weir_Hart           |artist
12  |#9:208|#9:99        |null      |Johnny_Cash         |artist
13  |#9:218|#9:102       |null      |Marty_Robbins       |artist
14  |#9:222|#9:139       |null      |Don_Rollins         |artist
15  |#9:223|null         |#9:14     |Garcia_Lesh_Weir    |artist
16  |#9:233|#9:96        |null      |Kristofferson_Foster|artist
17  |#9:245|[size=5]     |null      |Chuck_Berry         |artist
18  |#9:258|null         |[size=8]  |Pigpen_Weir         |artist
19  |#9:261|#9:69        |null      |Robert_Johnson      |artist
________________________________________________________________
LIMIT EXCEEDED: resultset contains more items not displayed (limit=20)

20 item(s) found. Query executed in 0.031 sec(s).

 
After a query, the current record is always the first one in the resultset. To display it, use the new “current” command:
 

orientdb {GratefulDeadConcerts}> current

__________________________________________________
ODocument _ Class: V   id: #9:7   v.19
__________________________________________________
       in_written_by : [size=9]            
          in_sung_by : [size=7]            
                name : Bo_Diddley          
                type : artist

 
To move to the next record, use the new “next” command:

orientdb {GratefulDeadConcerts}> next

__________________________________________________
ODocument _ Class: V   id: #9:8   v.153
__________________________________________________
          in_sung_by : [size=146]          
       in_written_by : [size=4]            
                name : Garcia              
                type : artist              

 
To go back in the result, use “prev”. Now, to move to the incoming ‘written_by’ vertices, before 1.7-rc2 we had to issue another query by using the current RID:

select in('written_by') from #9:8

But now we can simply “move” to them by using the new “move” console command:
 

orientdb {GratefulDeadConcerts}> move in('written_by')

_______________________________________________________________________________________________________________________
@RID  |#   |type|in_followed_by|name                 |song_type|performances|out_followed_by|out_sung_by|out_written_by
_______________________________________________________________________________________________________________________
#9:461|1   |song|null          |CANT COME DOWN       |original |1           |null           |#9:335     |#9:8          
#9:465|2   |song|null          |CREAM PUFF WAR       |original |7           |null           |#9:8       |#9:8          
#9:494|3   |song|null          |THE ONLY TIME IS NOW |original |1           |null           |#9:335     |#9:8          
_______________________________________________________________________________________________________________________

After this command, the resultset is updated with those records and the current record is the first one. Now we can continue walking the graph, through incoming or outgoing edges, vertices or even links.

Cool, right?

Have fun with Graphs,
Luca Garulli
CEO at Orient Technologies
the Company behind OrientDB
http://about.me/luca.garulli




London, February 6th 2014

 
We are glad to announce the new engine to manage relationships in graph databases. According to a data-loading benchmark, it can be up to 15 times faster than the current implementation!

The new architecture is based on a new data structure, the SB-Tree, and is optimized not only for embedded but also for remote storage. To achieve this optimization we introduced a new data type, LINKBAG. It represents a set of RIDs, but it allows duplicate values and does not implement the Collection interface. LINKBAG has two binary representations: a modified B-Tree, and a collection embedded in the document. The embedded collection is deserialized only on demand, for example when it is iterated.
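To picture the “set of RIDs that allows duplicates” semantics, here is a minimal Java sketch of a bag of record ids. It is not the SB-Tree: a TreeMap (a red-black tree) stands in for ordered storage, and the RidBag class is hypothetical. RIDs are compared numerically, so #9:8 sorts before #9:10, and the class deliberately does not implement java.util.Collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RidBag {
    // Record id "#cluster:position", ordered by cluster, then position.
    static final class Rid implements Comparable<Rid> {
        final int cluster;
        final long position;

        Rid(int cluster, long position) { this.cluster = cluster; this.position = position; }

        public int compareTo(Rid o) {
            int c = Integer.compare(cluster, o.cluster);
            return c != 0 ? c : Long.compare(position, o.position);
        }
        @Override public boolean equals(Object o) { return o instanceof Rid && compareTo((Rid) o) == 0; }
        @Override public int hashCode() { return 31 * cluster + Long.hashCode(position); }
        @Override public String toString() { return "#" + cluster + ":" + position; }
    }

    // RID -> number of occurrences: duplicates are counted, not collapsed.
    private final TreeMap<Rid, Integer> counts = new TreeMap<>();
    private int size;

    void add(Rid rid) {
        counts.merge(rid, 1, Integer::sum);
        size++;
    }

    // Removes a single occurrence; returns false if the RID is not in the bag.
    boolean remove(Rid rid) {
        Integer n = counts.get(rid);
        if (n == null) return false;
        if (n == 1) counts.remove(rid); else counts.put(rid, n - 1);
        size--;
        return true;
    }

    int count(Rid rid) { return counts.getOrDefault(rid, 0); }

    int size() { return size; }

    // Lists each RID as many times as it was added, in RID order.
    List<Rid> toList() {
        List<Rid> out = new ArrayList<>();
        for (Map.Entry<Rid, Integer> e : counts.entrySet())
            for (int i = 0; i < e.getValue(); i++) out.add(e.getKey());
        return out;
    }
}
```

Allowing duplicates matters in a graph: two vertices can be connected by more than one edge, so the same RID may legitimately appear several times.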

Below is a comparison of load speed when importing the Wikipedia page structure (without page content), which consists of 130 million vertices and more than 1 billion edges.

To prevent duplicate vertices we used a unique index on the page key. Data were taken from http://downloads.dbpedia.org/3.6/en/page_links_en.nt.bz2. The load test was run on a PC with 24 GB RAM, a 7500 RPM HDD and an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz.

Test 1 consists of loading data in transactional mode: the blue line is OrientDB 1.6.x, the red line is OrientDB 1.7-rc1 with the new LINKBAG structure.

The X axis shows the number of imported pages, the Y axis the time spent importing them.


Test 1 – OrientDB 1.6.x vs OrientDB 1.7-rc1 TX Mode

As you can see, after the 6,300,000th imported record the current implementation suffers a dramatic slowdown, so we interrupted the test after a while.
Test 2 is like the previous test, but in non-transactional mode.

Test 2 – OrientDB 1.6.x vs OrientDB 1.7-rc1 No-TX Mode

Test 3 compares full imports of the whole Wikipedia dataset using the new LINKBAG implementation. Here the blue line is transactional mode and the red line is non-transactional mode.

Test 3 – OrientDB 1.7-rc1 TX vs OrientDB 1.7-rc1 No-TX

In non-transactional mode the test completed in only 6.5 hours; in transactional mode it took 14 hours.

 
Andrey Lomakin
Orient Technologies LTD




The TinkerPop Blueprints standard doesn’t define a proper “Factory” to get graph instances. For this reason, OrientDB users who wanted to use a pool of instances had to mix two different APIs, the Graph API and the Document API. Example:
 

ODatabaseDocumentPool pool = new ODatabaseDocumentPool("plocal:/temp/mydb");
OrientGraph g = new OrientGraph(pool.acquire());

 
NOTE: You could also use an OGraphDatabase instance in place of ODatabaseDocumentPool, but this API has been deprecated for a long time and will be removed in v1.7.

 
Now everything is simpler, thanks to the new OrientGraphFactory class to manage graphs easily (Issue #1971). These are the main features:
– by default it acts as a factory, creating new database instances every time
– it can be configured to work as a pool, recycling database instances
– if the database doesn’t exist, it’s created automatically (except in “remote” mode)
– it returns both transactional and non-transactional instances
 
This is the basic way to create the factory, by using the default “admin” user (with “admin” password by default):
 

OrientGraphFactory factory = new OrientGraphFactory("plocal:/temp/mydb");

 
But you can also pass user and password:
 

OrientGraphFactory factory = new OrientGraphFactory("plocal:/temp/mydb", "jayminer", "amigarocks");

 
To work with a recyclable pool of instances with minimum 1, maximum 10 instances:
 

OrientGraphFactory factory = new OrientGraphFactory("plocal:/temp/mydb").setupPool(1, 10);

 
Once the factory is configured you can get a Graph instance to start working. OrientGraphFactory has 2 methods to retrieve a transactional and a non-transactional instance:
 

OrientGraph txGraph = factory.getTx();
OrientGraphNoTx noTxGraph = factory.getNoTx();

 
Alternatively, you can configure in the factory the kind of instance you want and call the get() method every time:
 

factory.setTransactional(false);
OrientGraphNoTx noTxGraph = (OrientGraphNoTx) factory.get();

 
When you’re finished, call close() to free all the resources (in case of pool usage):
 

factory.close();
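The pooling behavior configured by setupPool(1, 10) can be pictured with a small generic pool. This is a hypothetical sketch, not OrientGraphFactory’s code: it pre-creates the minimum number of instances, recycles released ones, and creates new ones up to the maximum. Where a production pool would typically wait for a free instance, this sketch simply throws when the pool is exhausted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

final class SimplePool<T> {
    private final Supplier<T> factory;
    private final int max;
    private final Deque<T> idle = new ArrayDeque<>();
    private int created;
    private boolean closed;

    SimplePool(Supplier<T> factory, int min, int max) {
        if (min < 0 || max < 1 || min > max)
            throw new IllegalArgumentException("need 0 <= min <= max and max >= 1");
        this.factory = factory;
        this.max = max;
        for (int i = 0; i < min; i++) { // warm up the minimum number of instances
            idle.push(factory.get());
            created++;
        }
    }

    synchronized T acquire() {
        if (closed) throw new IllegalStateException("pool closed");
        if (!idle.isEmpty()) return idle.pop();          // recycle an idle instance
        if (created < max) { created++; return factory.get(); }
        throw new IllegalStateException("pool exhausted (max=" + max + ")");
    }

    synchronized void release(T instance) {
        if (!closed) idle.push(instance);                // make it available again
    }

    synchronized void close() {
        closed = true;
        idle.clear();                                     // drop all idle instances
    }

    synchronized int createdCount() {
        return created;
    }
}
```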

 




When you work with web applications, it’s very common to query elements and render them to the user to let them apply some changes. Once the user updates some fields and presses the “save” button, what happens?

Until now, the developer had to track the changes in a separate structure, load the vertex/edge from the database and apply the changes to the element.

Starting from OrientDB v1.7 we added 2 new methods to the OrientElement and OrientBaseGraph classes of the Graph API:

Detach

The detach methods fetch all the record content into RAM and reset the connection to the Graph instance. This allows you to modify the element offline and re-attach it once finished.

Attach

Once the detached element has been modified, to save it back to the database you need to call the attach() method. It restores the connection between the Graph Element and the Graph Instance.

Example

The first step is to load some vertices and detach them.

 

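OrientGraph g = new OrientGraph("plocal:/temp/db");
List<OrientVertex> detached = new ArrayList<OrientVertex>();
try {
  // Blueprints query: all vertices whose "name" property equals "fast"
  for (Vertex v : g.query().has("name", "fast").vertices()) {
    OrientVertex ov = (OrientVertex) v;
    ov.detach();        // load the content in RAM and drop the graph connection
    detached.add(ov);
  }
} finally {
  g.shutdown();
}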

After a while the element is updated (from the GUI or by the application):

 

v.setProperty("name", "super fast!");

When the “save” button is pressed, re-attach the element and save it to the database:

 

OrientGraph g = new OrientGraph("plocal:/temp/db");
try {
  v.attach(g);   // restore the connection to the graph instance
  v.save();
} finally {
  g.shutdown();
}

FAQ

Does detach go recursively to detach all connected elements?

No, it works only at the current element level.

Can I add edge against detached elements?

No, you can only get/set/remove properties while the element is detached. Any other operation that requires the database will throw an IllegalStateException.
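The detach/attach contract described in this FAQ can be modeled in a few lines. In this hypothetical sketch (DetachableElement and GraphHandle are not OrientDB classes, and a map stands in for the database), property changes work offline, while operations that need the database throw IllegalStateException until the element is attached again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a graph connection: record id -> stored properties.
final class GraphHandle {
    final Map<String, Map<String, Object>> store = new HashMap<>();
}

final class DetachableElement {
    private final String id;
    private final Map<String, Object> props = new HashMap<>();
    private GraphHandle graph; // null while detached

    DetachableElement(String id, GraphHandle graph) {
        this.id = id;
        this.graph = graph;
        props.putAll(graph.store.getOrDefault(id, Map.of())); // content loaded in RAM
    }

    void detach() { graph = null; }              // keep the content, drop the connection
    void attach(GraphHandle g) { graph = g; }    // restore the connection

    // Property operations work offline.
    Object getProperty(String key) { return props.get(key); }
    void setProperty(String key, Object value) { props.put(key, value); }
    void removeProperty(String key) { props.remove(key); }

    // Operations that need the database require an attached element.
    void save() {
        requireAttached("save");
        graph.store.put(id, new HashMap<>(props));
    }

    void addEdge(String label, DetachableElement to) {
        requireAttached("addEdge");
        // edge creation itself is omitted in this sketch
    }

    private void requireAttached(String operation) {
        if (graph == null) throw new IllegalStateException(operation + " requires an attached element");
    }
}
```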

 



