OrientDB Database Security

London – January 16th, 2017

By OrientDB CEO, Luca Garulli

After ransomware groups recently wiped about 34,000 MongoDB databases and started hitting Elasticsearch clusters, about 35,000 of which are exposed on the Internet* (read the full article), we advise OrientDB users to double-check their OrientDB servers.

OrientDB’s average level of security is much stronger than that of both MongoDB and Elasticsearch. However, nothing can keep you totally safe, especially if you are exposing an OrientDB server directly to the Internet and/or you haven’t changed the default passwords in your database.

Follow this 5-minute action plan to keep your OrientDB database safe:

1. If you aren’t using the default users (admin, reader and writer), then delete them.

2. If you’re using them, be sure you have changed the passwords for all 3 default users: admin, reader and writer.

3. When you installed OrientDB for the first time, the script asked for the root password. Make sure you didn’t set something obvious such as “root”, “orientdb”, “password”, or any other simple/obvious password.
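
As a quick illustration of steps 2 and 3, the sketch below flags passwords that match the obvious values named above. The class and method names are hypothetical and not part of OrientDB, and the assumption that the default user names double as default passwords should be checked against your own installation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; it is not part of OrientDB.
final class ObviousPasswordCheck {
    // Values named in the post, plus the default user names
    // (assumed here to double as default passwords).
    private static final Set<String> OBVIOUS = new HashSet<>(
        Arrays.asList("root", "orientdb", "password", "admin", "reader", "writer"));

    // Returns true when the password is empty or matches an obvious value, ignoring case.
    static boolean isObvious(String password) {
        if (password == null || password.trim().isEmpty()) {
            return true;
        }
        return OBVIOUS.contains(password.trim().toLowerCase(Locale.ROOT));
    }
}
```

A check like this only catches the most blatant cases; it is no substitute for a long, randomly generated password.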

Now a little advice to keep OrientDB even more secure:

1. If you can, don’t expose the OrientDB server to the Internet.

2. Remember that starting from v2.2 you can configure stronger salt cycles for hashed passwords. Take a look at the following page for more details: https://orientdb.com/docs/2.2/Database-Security.html#password-management.

3. If you’re working with very sensitive data, please consider using encryption at rest with the AES algorithm. For more details, take a look at the following page: http://orientdb.com/docs/2.2/Database-Encryption.html.

4. Don’t use a password at all. Since v2.2.14, OrientDB Enterprise Edition supports authentication via symmetric keys for the Java client. See https://orientdb.com/docs/2.2/Security-Symmetric-Key-Authentication.html.

5. Lastly, don’t forget OrientDB’s other advanced security features, such as Kerberos authentication, LDAP users, password validation, and auditing.
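
To make the point about salt cycles concrete, here is a minimal sketch of how an iteration count feeds a salted password hash. It uses the JDK's PBKDF2WithHmacSHA256 as an example algorithm; whether that matches the algorithm and defaults configured in your OrientDB server should be checked against the password-management page linked above. The class and method names are hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Illustration only: shows how the iteration ("cycle") count enters a salted hash.
final class SaltedHashSketch {
    // Generates a random 16-byte salt.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives a 256-bit hash; more iterations make every guess more expensive.
    static byte[] hash(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 256);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec)
                .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not derive the password hash", e);
        } finally {
            spec.clearPassword();
        }
    }

    // Recomputes the hash with the stored salt and iteration count and compares in constant time.
    static boolean matches(char[] candidate, byte[] salt, int iterations, byte[] expected) {
        return MessageDigest.isEqual(hash(candidate, salt, iterations), expected);
    }
}
```

Because the iteration count is part of the computation, a hash produced with one count will not verify with another, so the count has to be stored alongside the salt.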

More Resources:

  1. Database Security
  2. Server Security

If you have any questions, don’t hesitate to ask the Community Group.

Thanks and keep your data safe!

Luca Garulli
Founder & CEO
OrientDB LTD

*http://www.pcworld.com/article/3157417/security/after-mongodb-ransomware-groups-hit-exposed-elasticsearch-clusters.html

This month OrientDB launched its new Client Referral Program aimed at rewarding users who recommend clients to use OrientDB’s enterprise, consulting or support options.

Here at OrientDB, we’ve always known and been proud that many of our sales come from client or community referrals. Whenever a new client says they learned about us from user reviews or from a direct contact using OrientDB Community or Enterprise editions, we take it as reassurance that our focus on software development will help drive our sales. This is not a novel strategy but one that our CEO Luca Garulli has always believed in, as our main goal is to make innovative technologies more accessible to a wider market.

“Instead of investing huge resources in marketing and sales like our competitors are doing, we prefer to focus on the product and the community first. It’s not new: other companies like Atlassian® or Slack® have proven that you can build a successful product without big investments in marketing campaigns.”

Luca Garulli, CEO @ OrientDB

That’s the reason our Community edition uses an Apache 2 license. We could restrict our open source version by not making it available for commercial use, or as some of our competitors do, we could further restrict users by forcing them to make their applications open source in order to use OrientDB Community. From any user’s point of view, these are all deterrents when selecting what database to use, not only from a budgetary point of view, but simply because people generally don’t like to be forced down a path they didn’t initially contemplate.

Our strategy is working. Throughout 2016 we saw sustained growth in both OrientDB Community adoption as well as increased sales for our Enterprise, Support and Consulting services. OrientDB is not only one of the most popular graph databases in the world, this year we’ve proven that the Multi-Model database is here to stay as an innovative, affordable and powerful solution for small, medium and large enterprises.

All this brings me to our new referral program. As our community of developers grows and our client portfolio expands, the number of referrals this year increased significantly. Obviously we also hope our initiative will help drive sales even further, but we thought it would only be fair to extend our gratitude to all those who recommend OrientDB by giving them a portion of our sales revenue (up to 10% of a contract’s value for those who recommend 3 or more clients).

We also wanted to make this simple. To enter the program you simply fill out a form with your referral’s details and your personal information. If the client you referred signs a contract with OrientDB within 90 days, one of our sales representatives will contact you so that we may process your bonus.

So if you’re using OrientDB and know of anyone looking for a powerful yet affordable NoSQL solution, take a look at our referral program.

Happy Holidays!

Arun Dubey

Global Corporate Sales Director
OrientDB Ltd

Crowdsourced Competitive Intelligence Platform Honors 4,500 Winning Companies Out of 15 Million Profiles

London, UK – 17-11-2016 – OrientDB announced today it was named an Owler HOT in 2016 winner in London. With hundreds of contributors joining the ranks, OrientDB is experiencing tremendous growth in both Community and Enterprise adoption. The native multi-model database combines the connectedness of graphs, the agility of documents and the familiar SQL dialect.

Owler recognizes the top trending companies in cities around the world. They filtered through more than 15 million companies and picked 4,500 award winners across 600 cities worldwide. Recipients were chosen based on several different metrics, including number of followers on Owler, insights collected from our community, social media followers, and blog posts over the past year.

“We’ve sorted through database of millions of contributions from our community and landed on the top trending companies from around the world,” said Jim Fowler, CEO at Owler. “Being Hot In 2016 is an accomplishment to be proud of.”

To see OrientDB’s company profile on Owler, go to: https://www.owler.com/iaApp/4498930/orientdb-company-profile .

With downloads exceeding 70,000 per month, OrientDB is an award-winning, 2nd Generation Distributed Graph Database with the flexibility of Documents in one product. It is a unique, true multi-model NoSQL DBMS equipped to tackle today’s big data challenges and offers multi-master replication, sharding as well as more flexibility for modern, complex use cases. Visit https://orientdb.com/success/ to read more about companies using OrientDB.

About Owler

Owler is the crowdsourced competitive intelligence platform that business professionals use to outsmart their competition, gain competitive insights, and uncover the latest industry news and alerts. Owler is powered by an active community of 800K business professionals that contribute unique business insights such as competitors, private company revenue, and CEO ratings. From startups all the way to large enterprises (including 96% of the Fortune 500), CEOs, salespeople, marketers, product managers, and all types of business professionals use Owler daily. Launched in 2014, and funded by Norwest Venture Partners and Trinity Ventures, Owler is headquartered in San Mateo, CA with offices in Coimbatore, India.

Paolo Puccini
Marketing Manager
OrientDB Ltd

This post is outdated; please refer to the Spark page.

London, July 8, 2016
By Andrea Iacono

The Spark connector for OrientDB has been provided by Metreta and is hosted on GitHub at https://github.com/metreta/spark-orientdb-connector, letting Spark and OrientDB interoperate in two ways: accessing OrientDB data from Spark and writing Spark data to OrientDB. The connector is also aware of the difference between an OrientDB document database and an OrientDB graph database.

To compile the connector, clone the master branch and update its build.sbt file with the Scala version and the Spark version you’re using. You may subsequently launch the package command on sbt:

sbt package

 

Upon performing these steps, you should find a jar file containing the compiled connector in your target directory. Be sure to have created the test database as well (as shown in the connector’s page).

The first step for creating our sample project is to create a build.sbt, where we have to define the library dependencies:

libraryDependencies ++= Seq(
 "com.orientechnologies" % "orientdb-core" % "2.2.3",
 "com.orientechnologies" % "orientdb-client" % "2.2.3",
 "com.orientechnologies" % "orientdb-graphdb" % "2.2.3",
 "com.orientechnologies" % "orientdb-distributed" % "2.2.3",
 "org.apache.spark" % "spark-core_2.11" % "1.6.1",
 "org.apache.spark" % "spark-graphx_2.11" % "1.6.1",
 "org.scala-lang" % "scala-compiler" % "2.11.4",
 "org.scala-lang" % "scala-library" % "2.11.4",
 "org.scala-lang" % "scala-reflect" % "2.11.4",
 "jline" % "jline" % "2.12",
 "com.tinkerpop.blueprints" % "blueprints-core" % "2.6.0",
 "com.fasterxml.jackson.core" % "jackson-databind" % "2.7.4",
 "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.7.4"
)

 

We must then configure Spark to attach to OrientDB, which we can do by defining the SparkConf in the following way:

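  import org.apache.spark.{SparkConf, SparkContext}

  // Spark configuration pointing the connector at a local OrientDB server.
  // admin/admin are the default credentials; replace them with your own.
  val conf = new SparkConf()
    .setMaster("local[*]")
    .setAppName("ConnectorSample")
    .set("spark.orientdb.clustermode", "remote")
    .set("spark.orientdb.connection.nodes", "127.0.0.1")
    .set("spark.orientdb.protocol", "remote")
    .set("spark.orientdb.dbname", "test")
    .set("spark.orientdb.port", "2424")
    .set("spark.orientdb.user", "admin")
    .set("spark.orientdb.password", "admin")

  // The Spark context referred to as sc in the following examples
  val sc = new SparkContext(conf)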

 

We can now share data between Spark and OrientDB.

Orient Documents to/from Spark RDDs
Let’s start reading some OrientDB documents as a Spark RDD:

val peopleRdd: RDD[OrientDocument] = sc.orientQuery("Person")

 

With the orientQuery() method, we can read the documents of a class from OrientDB as a Spark RDD, on which we can do the usual manipulations. We can then save them back to OrientDB:

peopleRdd
 .filter(person => person.getString("name") == "John")
 .map(person => new Person("Foo", "Bar"))
 .saveToOrient("Person")

 

In this example, after a bit of manipulation, we use the saveToOrient() method to save all the elements of the RDD as OrientDB documents. We can check the result either by querying OrientDB via Studio or by querying from the code:

sc.orientQuery("Person").foreach(p => println(s"Person: ${p.getString("surname")}, ${p.getString("name")}"))

 

We can also update the OrientDB documents using the upsertToOrient() method, as shown in this example where we update a property of the documents via the RDD and save them back to OrientDB:

peopleRdd
 .filter(person => !person.getString("surname").startsWith("New"))
 .map(person => new Person(person.getString("name"), "New " + person.getString("surname")))
 .upsertToOrient("Person")

 

Orient Graphs to/from Spark GraphX
When we deal with graphs, RDDs are not enough and so we must move to Spark’s API for graph computing: GraphX.

To access OrientDB vertices and edges, we must use the orientGraph() method as shown in this example:

val peopleGraph: Graph[OrientDocument, OrientDocument] = sc.orientGraph()

 

Since peopleGraph is an org.apache.spark.graphx.Graph object, we can use its methods to access OrientDB data, as in these examples:

val people: VertexRDD[OrientDocument] = peopleGraph.vertices
val relationships: EdgeRDD[OrientDocument] = peopleGraph.edges

println(s"The graph contains ${people.count()} vertices and ${relationships.count()} edges.\n")

 

We can also access triplets, as in this example where we print friendships among people:

peopleGraph
 .triplets
 .foreach(triplet => {
   val srcPerson: OrientDocument = triplet.srcAttr
   val dstPerson: OrientDocument = triplet.dstAttr
   println(s"Person: ${srcPerson.getString("surname")}, ${srcPerson.getString("name")} [${triplet.srcId}]. Friend: ${dstPerson.getString("surname")}, ${dstPerson.getString("name")} [${triplet.dstId}]")
 })

 

The built-in graph algorithms supplied by GraphX are also available, like the triangleCount() used here to show the triangles among people:

val triangles = peopleGraph.triangleCount()

// prints how many triangles every vertex participates in
triangles
 .vertices
 .foreach {
   case (vertexId, trianglesNumber) => println(s"Person [${vertexId}] participates in ${trianglesNumber} triangles.")
 }

 

When we have a GraphX graph and we want to save it as an OrientDB graph, we can use the saveGraphToOrient() method:

val gr: Graph[Person, String] = createSampleGraph(sc)
gr.saveGraphToOrient()

 

In this example, the createSampleGraph() method just creates a simple graph with three vertices and five edges as RDDs and then builds the graph upon them:

def createSampleGraph(sparkContext: SparkContext): Graph[Person, String] = {

 val people: RDD[(VertexId, Person)] =
   sparkContext.parallelize(
     Array(
       (1L, new Person("Alice", "Anderson")),
       (2L, new Person("Bob", "Brown")),
       (3L, new Person("Carol", "Clark"))
     )
   )


 val edges: RDD[Edge[String]] =
   sparkContext.parallelize(
     Array(
       Edge(1L, 2L, "Friendship"),
       Edge(1L, 3L, "Friendship"),
       Edge(2L, 1L, "Friendship"),
       Edge(3L, 1L, "Friendship"),
       Edge(3L, 2L, "Friendship")
     )
   )
 Graph(people, edges)
}

 

The full code of these examples is available on GitHub at https://github.com/andreaiacono/SparkOrientDbConnectorDemo.

London, June 27, 2016

We are proud to announce that Innov8tive and OrientDB have joined forces to develop the new .NET drivers for OrientDB.  This exciting venture means that OrientDB users will soon be provided with official .NET drivers and support; until now, the drivers were under active community development.

“Innov8tive is passionate about everything from mobile apps to data centers.  Furthermore, their strong focus on security makes them an ideal partner for OrientDB.”

– Luca Garulli, CEO, OrientDB

Innov8tive, experts in the field of high end software development and consulting services as well as certified Microsoft partners, have helped turn complex software ideas into reality and power IT departments with their experienced team of consultants.

“OrientDB has always been an integral part of our tool set. We are delighted to be able to deepen our partnership with OrientDB and enhance the capabilities of the .NET driver.”

– Gray Delacluyse, CPDO, Innov8tive

“This is exactly the type of company OrientDB seeks for their ‘Think Globally, Act Locally’ Partnership program,” says Luca Garulli, CEO of OrientDB.

Driver development is currently in progress and will be officially announced by OrientDB.  Users will soon be able to have improved cross-platform development, which will only enhance applications built using OrientDB.  This is especially true for OrientDB 2.2.x users who enjoy improved security features as well as an optimized core engine along with new configurable graph consistency.

If you’re interested in using OrientDB but haven’t downloaded it already, give it a try.  OrientDB community edition is Open Source and completely free.  Its Enterprise Edition may be used at no cost for development purposes and competitive support packages come with a commercial license for those hoping to use OrientDB Enterprise edition in production environments.

Paolo Puccini
Marketing Specialist
OrientDB LTD

On Wednesday June 1st, OrientDB’s CEO, Luca Garulli, hosted an Official OrientDB 2.2 Webinar, going through the exciting new features of the latest 2.2 version and answering questions from participants. We’d like to share this with everyone, so we’ve uploaded the video and slideshare slides, as well as included all attendee questions and answers below. Please take a look at the video and/or slides and remember, if you have any additional questions, please contact us here.

Lastly, stay tuned for upcoming posts about OrientDB’s MATCH function and SPATIAL support. Though we would have loved to discuss these features during the June 1st webinar, they deserve special attention and will be discussed in detail in our next posts.

UPDATE: check out our MATCH blog post, by Luigi Dell’Aquila

Video: https://www.youtube.com/watch?v=MVDcldr87KU

Webinar Q & A

Q: What about plugins to connect with mapping tools like geoserver?

Luca: OrientDB does not have a plugin for geoserver yet, but we would be happy to receive any contributions and help any contributors on this topic. If a large portion of the community demands this plugin, we’ll add it to the roadmap.

Q: Hot Alignment was not recommended in 2.1. Is this supported in 2.2?

Luca: Excellent question! The Hot Alignment configuration setting is no longer supported in v2.2, because when the node starts up in 2.2, it now always executes a Hot Alignment procedure. So if you’re migrating from v2.1, you can completely remove that setting from the configuration.

Q: Are you based in London?

Luca: The company is based in London, but the team is distributed across the globe.

Q: Is the Teleporter feature available in the Community Edition of 2.2.0?

Luca: Teleporter is part of Enterprise Edition, but you can use the Enterprise Edition for Development for FREE. This means that if you want to import a database from Oracle or any other Relational DBMS to OrientDB, you can use it. However, if you’d like to keep OrientDB synchronized with an RDBMS, where the RDBMS is the master of data, this will require a Production license (subscription).

Q: Just curious if OrientDB can use the same type of compression which is used in Lucene to make it faster?

Luca: OrientDB uses Lucene under the hood for Full-Text and Spatial indexes. For other indexes, OrientDB has its own compression algorithms.

Q: Is Encryption at rest only available in OrientDB Enterprise?

Luca: Encryption at rest is available in the Community Edition too.

Q: What is the plan to support spatial indexes via gremlin? Currently, sql is the only way to hit them.

Luca: Gremlin v3 supports custom functions, so even if it’s not in our short-term roadmap, this is definitely doable. If you’re interested in this feature, please open a new issue for that topic. Thanks!

Q: How fast is loading with the OrientDB ETL? What about the console, or other ways of loading data?

Luca: It depends on the format. For a plain CSV, I suggest that you use the OrientDB ETL. For GRAPHML files, use the console. If you’re importing from an RDBMS instead, please evaluate Teleporter or OrientDB ETL.

Q: OrientDB is at the same time very powerful and very frustrating to use. The frustration comes mostly from incomplete documentation and a general lack of examples. What are you doing to address this?

Luca: We’ve worked significantly over the last year on the documentation to make it more complete and clear. Every day we work on providing more examples to make the learning curve less steep, but there is always more we can do. If there is a specific topic or example that you need that lacks documentation, please let us know so that we can update the documentation.

Q: What geo-spatial filtering options are available in OrientDB?

Q: What geo-spatial filtering options are available (apart from distance()) in Graph / Document mode?

Q: What about the point-in-polygon filtering?

Luca: You can find all the supported Geo-Spatial functions here: https://orientdb.com/docs/last/Spatial-Index.html. OrientDB kept the syntax from PostGIS, so if you’re already familiar with it, then using these functions with OrientDB should be very similar.

Q: What is the progress on the .NET driver? Is it stable? Can features like Live Query be used?

Luca: I’ve got good news and bad news. The bad news is that the .NET driver doesn’t support the Live Query feature yet. The good news is that we found .NET experts who are currently working on the driver to align it soon with the latest version (v2.2).

Q: Is it possible to use Docker to set up an OrientDB cluster?

Luca: Sure! You can use our official image. Take a look at the OrientDB Download page.

Q: Can we enable command cache for selective queries? Also can we explicitly query the DB if command cache is enabled?

Luca: No, but these are very good ideas! Thank you. I’ve just opened an issue for that.

Q: What is the license cost for OrientDB Enterprise? 

Luca: We adopted a transparent policy for prices too. You can find all the prices on the official website under the Support and Subscriptions page.

Q: On a different topic: What is considered “a lot of” edges between two vertexes? In other words, should I be concerned about a database architecture that might have hundreds of edges on a vertex? What about thousands? More?

Luca: This is the classic “super-node” problem. OrientDB is able to manage many (thousands/millions) edges in an efficient manner. However, when you have a node with millions of edges and you have to traverse all of them via a query, this could be a very expensive task. In this case, I suggest you create an index on edge properties for a faster lookup. If you use the new SQL MATCH expression, it will be able to optimize the pattern matching query based on the available indexes.

Q: Where would I find more information about the SQL operations to perform keyword matching at the document level?

Luca: You can find more information in the documentation under the MATCH page.

Q: Live queries: what about deletions?

Luca: If the deletion impacts records that match your query, the event is sent to the subscriber with event type = deleted.

Q: Is there any plan to have an embedded version of OrientDB (for mobile app)?

Luca: Not in the short term, although there is an unofficial port for Android.

Q: Do you have or plan to have an official channel for questions and answers? I mean something like stack overflow.

Q: I like OrientDB a lot, but I’m wondering whether I can find free support in some way.

Luca: You can find FREE support through 2 channels: StackOverflow by using the OrientDB tag and the Community Group (+3,000 members).

Q: How can we compare OrientDB to ElasticSearch+Graph plugin?

Luca: The Graph plugin of ElasticSearch doesn’t actually turn ES into a Graph Database. It’s just a way for Kibana to render data in the form of a graph.

Q: Hi, I have read quite a lot of complaints mainly about the instability of OrientDB and the lack of responsive action towards the Community Edition users; what is your response to that?

Luca: With about 70k downloads/month, some users will surely encounter issues getting OrientDB to work on their own, and sometimes the problem is that OrientDB doesn’t fit their use case as well as another product would. As for the number of issues we currently have, I compared it with other Open Source DBMS projects and wrote this recent blog post about it. Since that post, we have resolved even more issues and currently have fewer bugs than even Cassandra. While the entire team puts significant effort into helping community users, providing detailed documentation and including enterprise functionality in the community version (free for commercial purposes too), there is always another avenue to get these issues prioritized/resolved quicker: Support or Consulting. Our prices are public and fair (significantly less than the competition). Our clients are able to chat and work directly with OrientDB Developers to get their questions answered and issues resolved. Also, there are SLAs in place to ensure that all inquiries are responded to and handled as quickly as possible.

OrientDB v2.2 Beta Going Live Next Week!

 

We’ve worked hard on it,  it’s about to go live, we can’t contain our excitement, and of course, we couldn’t have done it without you!

 

For the past few years all of us here at OrientDB have strived to deliver a fresh, innovative platform on which enterprises could work to enrich their business.  We’ve always been proud of our work, and it’s been a real pleasure to collaborate with and learn from an ever growing community of dedicated developers.

With the launch of OrientDB back in 2011 we breathed some fresh air into the DBMS landscape, making waves with our multi-model approach (to the delight of some and shock of others). From then on it’s been a rollercoaster of a ride and in 2015 we continued to disrupt the market and climb the DBMS charts.  We’re hoping 2016 will be no different, and in that spirit we are launching our 2.2 Beta version of OrientDB in the next few days **cue the drum roll**

We couldn’t have achieved this without the ever growing community of developers who worked tirelessly alongside our team.  Of course, this is a Beta release and so is not suitable for production environments, but we’re hoping our community is just as excited as we are with the announcement, and we’re looking forward to working together to test what we think will be our best release so far.

Our engineers have been working hard to make OrientDB even better and we’ve loaded version 2.2 full of new features which we’re sure you’ll love too.  The main focus of this version is on Security, APIs and performance, building on our distributed capabilities as well as graph enhancements.  We’ve added new features such as Teleporter, which introduces a simple, straightforward way to convert your relational databases to OrientDB.  Other new features include:

 

During the coming days, we’ll be discussing several of the new features here in our company blog so stay tuned for more on each of OrientDB’s new capabilities and enhancements.  Once we’re live, go ahead and take it for a test drive and let us know what you think.  For a complete list of features please refer to our OrientDB v2.2 documentation following the Beta launch.

Thank you to our community and looking forward to receiving your feedback!

 

Luca Garulli

Chief Executive Officer, Founder

February 8, 2016

By Andrey Lomakin, Lead Research & Development Engineer at OrientDB

In OrientDB v2.2 we’ve added tools which enable storage performance metrics to be gathered both for the whole system and for a particular command while it is being executed. This feature will not only be interesting for database support teams, but it will probably also be of interest to users who want to understand why a database is fast or slow for their use case and what lies behind the results attained in a benchmark.

But before we consider characteristics gathered during storage profiling, let’s take a look at OrientDB’s architecture.

All high level OrientDB components, exposed to the user as clusters or indexes, are implemented inside the storage engine as “durable components” and extend the ODurableComponent class. This class is part of the framework created to make components/data structure operations atomic and isolated in terms of ACID properties. Each durable component has to hold its data in direct memory, not in the Java heap. But whereas in Java we operate on variables to store/read application data, durable components operate on pages.

A page is a contiguous chunk of memory which always has the same fixed size and is mapped to a file placed on a disk. When data is written to the page, it is automatically written to the file, but not instantly: some time may pass between the moment when data is written to the page and the moment it is written to the file.

We separate write operations on pages and the file system because file system operations are slow and we try to decouple data operations and file system operations. When we change the page it is not written to the disk instantly, as I have already mentioned above, but is placed in the write cache. The write cache aggregates all changed pages and stores them to the disk in a background thread in order of their file position. So, if we have changed pages with positions 3, 2, 8, 4, they will be stored in the order 2, 3, 4, 8.
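
The ordering behaviour described above can be sketched with a sorted map. This is a toy model written for this post, not OrientDB's write cache: the class and method names are hypothetical, and the real cache flushes from a background thread rather than on demand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy write cache: remembers changed pages and hands them out sorted by file position.
final class WriteCacheSketch {
    // A TreeMap keeps its keys (page positions) in ascending order.
    private final TreeMap<Long, byte[]> dirtyPages = new TreeMap<>();

    // Registers a changed page; a later change to the same position replaces the earlier one.
    synchronized void put(long pagePosition, byte[] content) {
        dirtyPages.put(pagePosition, content.clone());
    }

    // Returns the positions in the order a flush would write them, then empties the cache.
    synchronized List<Long> drainInFileOrder() {
        List<Long> order = new ArrayList<>(dirtyPages.keySet());
        dirtyPages.clear();
        return order;
    }
}
```

Adding pages at positions 3, 2, 8 and 4 and then draining the cache yields the positions in the order 2, 3, 4, 8, which is the sequential order in which they would be written to the file.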

Pages are sorted by their file positions because it does not matter whether you use DDR, SSD or HDD to store your data; sequential IO operations are always faster than random IO operations. Because pages are stored in a separate background thread, disk write operation speed will be decoupled from data modification operation speed.

In the case of write operations, we may delay a data write and try to convert it to a sequential IO operation, but when we need to read data we need it instantly and cannot delay the read from the file. So in this case we use the well-known technique of caching frequently used pages in a read cache.
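
As an illustration of caching frequently used pages, here is a minimal least-recently-used (LRU) cache. LRU is chosen only because it is short to write down; it is not necessarily the eviction policy OrientDB's read cache uses, and all names below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Toy read cache with a fixed capacity and LRU eviction.
final class ReadCacheSketch {
    private final LinkedHashMap<Long, byte[]> pages;

    ReadCacheSketch(final int capacity) {
        // accessOrder = true moves an entry to the end of the map every time it is read.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacity; // evict the least recently used page
            }
        };
    }

    // Returns the cached page, or loads it through the supplied reader on a miss.
    byte[] load(long pageIndex, LongFunction<byte[]> readFromFile) {
        byte[] page = pages.get(pageIndex);
        if (page == null) {
            page = readFromFile.apply(pageIndex);
            pages.put(pageIndex, page);
        }
        return page;
    }

    boolean isCached(long pageIndex) {
        return pages.containsKey(pageIndex);
    }
}
```

With a capacity of two, loading pages 1, 2, 1 and then 3 evicts page 2, because page 1 was touched more recently.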

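So taking all of the above into account, you can see that OrientDB uses 2 caches: a read cache, which keeps frequently used pages in memory, and a write cache, which aggregates changed pages and stores them to the disk in the background.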

When we read a page from a file, the following steps are performed:

When we modify the page content, it is automatically placed in the write cache.

There is one big problem with all those caches: such a system is not durable. If the application crashes, data which has not yet been written to the disk will be lost.

To avoid this kind of problem we use a database journal, aka WAL (write-ahead log). This makes the whole process of writing data a bit more complex. When we modify the page, we do not put the page in the write cache right away. Instead, we record the change in a map whose keys consist of the file and the index of the changed page and whose values contain the diff between the original and the changed page.

When an operation on a cluster or index is completed without exceptions, we extract all changes from the map and log them inside the database journal, and only after that do we apply those changes to the file pages and put them in the write cache. The database journal may be treated as an “append only” log, so all write operations to the database journal are sequential and as a result are fast. The process of writing changes to the database journal and applying them to the “real” file page is called “atomic operation commit”.
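
The commit sequence can be sketched as follows. This is a simplified model under stated assumptions, not OrientDB's implementation: page content is a plain string standing in for a diff, the journal is an in-memory list, and the rollback method (discarding pending changes when an operation fails) is an assumption inferred from the "without exceptions" wording. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of "atomic operation commit": journal first, pages second.
final class AtomicOperationSketch {
    // Pending changes keyed by "fileId:pageIndex"; the value stands in for the page diff.
    private final Map<String, String> pending = new LinkedHashMap<>();
    // Append-only journal (WAL); entries are only ever added at the end.
    private final List<String> journal = new ArrayList<>();
    // The "real" pages, which would then go to the write cache.
    private final Map<String, String> pages = new HashMap<>();

    void change(long fileId, long pageIndex, String diff) {
        pending.put(fileId + ":" + pageIndex, diff);
    }

    void commit() {
        // 1. Log every change in the journal before touching any page.
        for (Map.Entry<String, String> e : pending.entrySet()) {
            journal.add(e.getKey() + "=" + e.getValue());
        }
        // 2. Only then apply the changes to the pages.
        pages.putAll(pending);
        pending.clear();
    }

    // Assumed behaviour on failure: pending changes are dropped, pages and journal stay untouched.
    void rollback() {
        pending.clear();
    }

    List<String> journal() {
        return new ArrayList<>(journal);
    }

    String page(long fileId, long pageIndex) {
        return pages.get(fileId + ":" + pageIndex);
    }
}
```

Until commit() is called, neither the journal nor the pages see the change, which is what keeps a failed operation from leaving half-applied modifications behind.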

What value does the database journal give us?

In both cases data consistency will not be compromised.

Taking all of the above into account, you have probably already concluded that the main performance characteristics of OrientDB’s storage engine (and not only OrientDB’s) are:

All those numbers show us the direction in which our project must evolve. For example, if we have good numbers for a disk cache hit rate and very few pages are read for a single component operation, we have to improve disk cache speed as a whole.  However, if we have a lot of page reads for a single component operation and very low numbers for page read speeds, we need to minimize the number of pages accessed for a single operation and convert data structures to ones which use more sequential rather than random IO operations.
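
To show how numbers such as a cache hit rate, pages accessed per component operation, and page read speed can be derived from simple counters, here is a small sketch. The class, its fields, and its units are assumptions made for illustration; they do not reproduce the statistics OrientDB actually reports.

```java
// Toy counters for the kind of metrics discussed above.
final class StorageStatsSketch {
    private long cacheHits;
    private long cacheMisses;
    private long operations;
    private long fileReadNanos;

    // Records one page access; readNanos is the time spent reading from the file on a miss.
    void recordPageAccess(boolean servedFromCache, long readNanos) {
        if (servedFromCache) {
            cacheHits++;
        } else {
            cacheMisses++;
            fileReadNanos += readNanos;
        }
    }

    void recordOperation() {
        operations++;
    }

    // Fraction of page accesses served from the cache (0.0 when nothing was accessed).
    double cacheHitRate() {
        long total = cacheHits + cacheMisses;
        return total == 0 ? 0.0 : (double) cacheHits / total;
    }

    // Average number of pages touched by a single component operation.
    double pagesPerOperation() {
        return operations == 0 ? 0.0 : (double) (cacheHits + cacheMisses) / operations;
    }

    // Pages read from the file per second of time spent reading.
    double pagesReadFromFilePerSecond() {
        return fileReadNanos == 0 ? 0.0 : cacheMisses * 1e9 / fileReadNanos;
    }
}
```

For example, three cache hits plus one miss that took 2 ms, spread over two operations, give a hit rate of 0.75, two pages per operation, and a read speed of 500 pages per second.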

Readers may ask: “Well, all of this is very good, but how is it related to us?”

The answer is: when you report performance issues, please provide not only a benchmark (we all have different hardware and sometimes simply cannot reproduce your issue) but also the performance numbers gathered through storage profiling.

Readers might also ask: “How is that done?”

It may be done in 2 ways: by using JMX or by using SQL commands.

The JMX console provides numbers gathered from the execution of all operations in storage, whereas SQL commands provide data gathered for a selected set of commands.

To gather performance data for a selected set of commands, you can execute a script that switches storage profiling on, runs the commands, and switches profiling off again.

At the end of the script, the result shows numbers for storage performance as a whole and numbers for the performance of each component.

Data from the atomic operation commit phase is presented as data from the component named “atomic operation”.

If you work with an embedded database you can start and stop storage profiling by calling the following methods:

OAbstractPaginatedStorage#startGatheringPerformanceStatisticForCurrentThread() to start storage profiling and OAbstractPaginatedStorage#completeGatheringPerformanceStatisticForCurrentThread() to stop profiling of storage.

You may also connect to the JMX server and read current performance numbers from MBean with name:

com.orientechnologies.orient.core.storage.impl.local.statistic:type=OStoragePerformanceStatisticMXBean,name=,id=
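
As a rough sketch of the JMX route, the code below looks up MBeans by name pattern on the platform MBean server of the current JVM, which fits the embedded case. The pattern simply appends a wildcard to the type shown above because the name and id parts are elided there; connecting to a remote server would need a JMX connector instead. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Looks up MBeans in the current JVM by object name pattern.
final class StorageMBeanLookupSketch {
    // Matches every storage statistic MBean regardless of its name and id properties.
    static final String STORAGE_STATS_PATTERN =
        "com.orientechnologies.orient.core.storage.impl.local.statistic:"
            + "type=OStoragePerformanceStatisticMXBean,*";

    // Returns the names of all registered MBeans matching the pattern (empty if none).
    static Set<ObjectName> find(String pattern) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.queryNames(new ObjectName(pattern), null);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Bad MBean name pattern: " + pattern, e);
        }
    }
}
```

In a JVM with no OrientDB storage open, the lookup returns an empty set; once a name is found, its attributes can be read with the MBean server's getAttribute method.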

I hope you found this overview of OrientDB’s architecture, and of the performance characteristics which are important to us, interesting. Please do not forget to send profiling results together with your performance reports.

If you have any questions about this blog entry or about any of OrientDB’s features, please post your question on Stack Overflow and we will answer it.
