Master Data Management – Five Things to Know

In the world of master data management, silos are a tremendous challenge.

When enterprises try to process information from disparate systems, they too often end up with sub-optimal applications and initiatives laden with errors and misinformation, not to mention blown timelines and budgets. But master data management (MDM) is actually more than just breaking down data silos. It’s about efficiency and service, innovation and security, clarity and perspective. It’s about getting the most out of your most valuable resource: your data.

Here are the five things you need to know about MDM:

The Challenge of Multiple Data Systems

For existing enterprises, one of the largest hurdles to developing an MDM system is the multiplicity of databases and applications usually involved. What’s more, Enterprise Resource Planning and Customer Relationship Management systems rely on structured data, whereas the proliferation of IoT devices has created exponential growth in unstructured data.

Take the example of Enel, one of the largest power utilities in Europe. Enel was struggling to provide analytics and reporting across all of its power generation plants and equipment. Data flows in from multiple systems, including IoT devices on power generation equipment, plant maintenance systems, scheduling applications, and other sources, each with its own data types. Enel was exporting data to CSV files and manually aligning it to generate reports and analytics.

Other companies in similar scenarios might invest in expensive integration bus systems to support a polyglot persistent environment.

Enel found a solution in a native multi-model database, which allowed them to bring all of their data into a single database: no more worrying about different data types or keeping the different systems in sync. The result was real-time data analysis across all sites and data systems, and no more month-long manual processes to generate reports.

Master Data Management Really is for Everyone

All companies are now digital enterprises. Since all systems rely on data, MDM is a discipline every organization needs in order to remain competitive. Master data powers everything from financial reporting to real estate transactions to fraud protection. The ultimate results are faster and better decisions, improved customer satisfaction, enhanced operational efficiency, and a better bottom line.

Redundancy Elimination is Only Part of It

Most people who’ve heard of MDM immediately link it to one of its primary objectives: the elimination of redundant data. Yes, having a central repository of data will eliminate data redundancies, as long as it’s done correctly. But the benefits of MDM extend beyond redundancy elimination, namely: data consistency, data flexibility, data usability, and data security (through role-based access).

Mergers and Acquisitions Don’t Have to Mean a Master Data Management Nightmare

Mergers and acquisitions can be rough on data consistency. Reconciling several master data systems brings headaches caused by differing taxonomies and structures. The usual result is that the two systems remain separate, linked only through a special reconciliation process.

As more acquisitions and mergers occur, the problem compounds into a labyrinth of siloed systems and data, bringing you back to the very problem that spurred you to invest in MDM in the first place.

The answer lies in the database management system and vendor you choose for your MDM system. Make sure to choose a vendor that offers a flexible, multi-model database that allows you to easily develop a single data taxonomy.

The Database that Backs Your Master Data Management System is Key

The most powerful and effective MDM systems run on databases that fit the business model in question.

As an example, Xima Software works with networks that are naturally modeled as graphs. For a telco, then, an MDM system built on a multi-model graph database is the most effective strategy, because the database uses the same graph model as the network itself and makes the network easy to visualize.

Master Data Management is Evolving

The fifth thing you need to know about MDM is that it’s rapidly evolving to meet the needs of today’s enterprises and their customers. Retailers are using it to improve time-to-market and address their customers’ growing expectations of a true omnichannel experience. The consumer packaged goods industry is using it to ensure the accuracy of nutritional information and comply with local disclosure regulations. And every industry is using it to break down data silos.

Gerard (Jerry) Grassi, P.E.
Senior Vice President – OrientDB
SAP

OrientDB Community Awards


We value and appreciate the hard work put in by the world-wide OrientDB community. That’s why, as a small token of appreciation, we’ve started sending out some gadgets and rewards to our community members.  

 


Code Contributors

Saeed Tabrizi

A special thank you to Saeed for his dedication to OrientDB. Among his numerous and valuable contributions, noteworthy examples include pull requests on the OrientJS repository in which, among several improvements, he implemented the IF NOT EXISTS clause for creating classes and properties and the IF EXISTS clause for dropping them.

Michael Pollmeier

Michael is the original author of the Apache TinkerPop 3 Graph Structure Implementation for OrientDB, which will be officially supported in upcoming major OrientDB releases!

 


Community Contributor

Scott Molinari

Not only has Scott provided detailed bug reports and documentation, he’s also helped countless community members by shedding light on new features and assisting others experiencing issues.

 


Thank You for Your Contributions

Thank you to Saeed, Michael and Scott, who, as a gesture of appreciation, will each be receiving a Raspberry Pi 3® Starter Kit along with some OrientDB merchandise (T-shirt, stickers and the like)**.


Next time – Bloggers and Writers

We’d also like to send out a special thank you to all the community members writing about OrientDB in their blogs, articles and papers. That’s why next time around we’ll be sending out some more gadgets to our top community bloggers.

So if you’re currently writing about @OrientDB, remember to use the #OrientDB and #Multimodel tags in your posts and head back to this page regularly. You might find your name on our Top Contributors list!

*All trademarks are the property of their respective owners.
**All OrientDB Community Award winners will be contacted individually in order to receive their prize.

 

London, April 4, 2016

The OrientDB Team has just released OrientDB v2.1.15, resolving 8 issues from v2.1.14. This is the latest stable release, so please upgrade your production environments to v2.1.15. For more information, take a look at the Change Log.

Download OrientDB v2.1.15 now: https://orientdb.com/download

A big thank you goes out to the OrientDB team and all the contributors who worked hard on this release, providing pull requests, tests, issues and comments.

Best regards,

Luigi Dell’Aquila
Director of Consulting
OrientDB LTD

OrientDB launches its Open Source NoSQL Graph-Document Database through CenturyLink’s Cloud Marketplace

 

LONDON, UK – March 17, 2016 – OrientDB, the pioneer behind the world’s first Open Source, NoSQL distributed graph-document database, today announced its certification under the CenturyLink Cloud Marketplace Provider Program. Through this partnership, CenturyLink Cloud users are now able to deploy and manage OrientDB’s community or enterprise edition databases via CenturyLink’s Blueprints library.

OrientDB is a second-generation distributed graph database with the flexibility of documents and an open source Apache 2 license. By treating every vertex and edge as a JSON (JavaScript Object Notation) document, OrientDB enables the creation of multi-directional property graphs, allowing large volumes of data to be traversed with ease. This new multi-model approach, with a polyglot engine, eliminates the need for multiple systems, ensures data consistency and optimizes the handling of complex relationships. Even though it is a document-based database, relationships are managed as in graph databases, with direct connections among records. Its versatility and rapid integration make OrientDB a perfect candidate for use cases ranging from recommendation engines and fraud detection to real-time analytics and content management. Fortune 500 companies, government entities and startups all use the technology to build large-scale innovative applications.

CenturyLink Cloud customers can now benefit from OrientDB’s features:

OrientDB Community Edition is free for any purpose (including commercial use). OrientDB Enterprise Edition serves as an extension of the Community Edition by providing enterprise-needed features such as: Query Profiler, distributed clustering configuration, metrics recording, and Live Monitor with configurable alerts.

The CenturyLink Cloud Marketplace Provider Program allows participating technology companies, like OrientDB, to integrate with the CenturyLink Cloud platform. These additional business-ready solutions are available to CenturyLink’s cloud, hosting and network customers.

“Companies hoping to leverage big data are getting tired of dealing with multiple systems and increasing infrastructural costs,” said Luca Garulli, CEO of OrientDB. “Customers choose OrientDB for its innovative Multi-model database capabilities and affordable nature. Expanding our capabilities to the cloud through CenturyLink provides the perfect accessible solution without the need for multiple database systems or costly servers.”

“The foundation of the big data revolution on our platform has been software innovation around unstructured data management,” said David Shacochis, vice president of platform enablement at CenturyLink. “OrientDB is a great example of this trend, allowing our customers to manage their unstructured data relations in a scalable model that drives insight out of their business workloads.”

To start using OrientDB on CenturyLink Cloud today, refer to the “Getting Started” guide on the CenturyLink Cloud Knowledge Base.

About OrientDB

OrientDB is an open source 2nd Generation Distributed Graph Database with the flexibility of Documents and a familiar SQL dialect. With downloads exceeding 70,000 per month, more than 100 community contributors and thousands of production users, OrientDB is experiencing tremendous growth in both community and Enterprise adoption. First-generation graph databases lack the features that Big Data demands: multi-master replication, sharding and more flexibility for modern complex use cases. See for yourself: download OrientDB and give it a try.

Editorial Contacts:
Paolo Puccini
OrientDB Ltd
+44 203 3971 609
info@orientdb.com

February 8, 2016

By Andrey Lomakin, Lead Research & Development Engineer at OrientDB

In OrientDB v2.2 we’ve added tools that gather storage performance metrics, both for the whole system and for a specific command being executed at that moment. This feature will be of interest not only to database support teams, but probably also to users who want to understand why a database is fast or slow for their use case and what lies behind the results attained in a benchmark.

But before we consider characteristics gathered during storage profiling, let’s take a look at OrientDB’s architecture.

All high-level OrientDB components, exposed to the user as clusters or indexes, are implemented inside the storage engine as “durable components” and extend the ODurableComponent class. This class is part of the framework created to make component/data structure operations atomic and isolated in terms of ACID properties. Each durable component has to hold its data in direct memory, not in the Java heap. And where in Java we operate on variables to store and read application data, durable components operate on pages.

A page is a contiguous chunk of memory which always has the same fixed size and is mapped to a file on disk. Data written to a page is eventually written to the file, but not instantly: some time may pass between the moment data is written to the page and the moment it is written to the file.

We separate write operations on pages from file system operations because file system operations are slow, and we try to decouple the two. When we change a page it is not written to disk instantly, as mentioned above, but is placed in the write cache. The write cache aggregates all changed pages and stores them to disk in a background thread, in order of their file position. So, if we have changed pages at positions 3, 2, 8 and 4, they will be stored in the order 2, 3, 4, 8.

Pages are sorted by their file positions because, whether you use DDR, SSD or HDD to store your data, sequential IO operations are always faster than random IO operations. And because pages are stored by a separate background thread, disk write speed is decoupled from data modification speed.
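To make this concrete, here is a minimal, purely illustrative sketch (hypothetical classes, not OrientDB’s internal code) of a write cache that collects changed pages and flushes them in ascending order of file position:

import java.util.Map;
import java.util.TreeMap;

// Purely illustrative sketch, not OrientDB's internal code: a toy write cache
// that collects changed pages keyed by their file position and flushes them in
// ascending order, turning random writes into mostly sequential I/O.
public class ToyWriteCache {

    /** Callback that actually writes a page to the underlying file. */
    public interface PageWriter {
        void write(long pagePosition, byte[] pageContent);
    }

    // TreeMap keeps the changed pages sorted by their file position.
    private final TreeMap<Long, byte[]> changedPages = new TreeMap<>();

    public synchronized void markChanged(long pagePosition, byte[] pageContent) {
        changedPages.put(pagePosition, pageContent);
    }

    // Called from a background thread: pages changed at positions 3, 2, 8, 4
    // are written in the order 2, 3, 4, 8.
    public synchronized void flush(PageWriter writer) {
        for (Map.Entry<Long, byte[]> entry : changedPages.entrySet()) {
            writer.write(entry.getKey(), entry.getValue());
        }
        changedPages.clear();
    }
}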

In the case of write operations we may delay a data write and try to convert it into a sequential IO operation, but if we need to read data we need it instantly and cannot delay the read from the file. So in this case we use the well-known technique of caching frequently used pages in a read cache.

So, taking all of the above into account, you can see that OrientDB uses two caches: a read cache and a write cache.

When we read a page from a file, the following steps are performed:

When we modify the page content, it is automatically placed in the write cache.

There is one big problem with all these caches: such a system is not durable. If the application crashes, any data which has not yet been written to disk will be lost.

To avoid this kind of problem we use a database journal, also known as a WAL (write-ahead log). This makes the whole process of writing data a bit more complex. When we modify a page we do not put it in the write cache right away. Instead, we record the change in a map whose keys consist of the file and index of the changed page and whose values contain the diff between the original and the changed page.

When an operation on a cluster or index completes without exceptions, we extract all changes from the map and log them in the database journal; only after that do we apply those changes to the file pages and put them in the write cache. The database journal can be treated as an append-only log, so all writes to it are sequential and, as a result, fast. The process of writing changes to the database journal and then applying them to the “real” file pages is called the “atomic operation commit”.
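In pseudo-Java, the commit path described above looks roughly like the following sketch (the types are hypothetical stand-ins, not OrientDB’s actual classes):

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the "atomic operation commit": page diffs are collected
// per changed page while the component operation runs; on commit they are first
// appended to the write-ahead log and only then applied to the real file pages,
// which are handed over to the write cache.
class AtomicOperationSketch {

    /** Identifies a changed page by its file and page index. */
    static final class PageKey {
        final long fileId;
        final long pageIndex;

        PageKey(long fileId, long pageIndex) {
            this.fileId = fileId;
            this.pageIndex = pageIndex;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof PageKey)) return false;
            PageKey k = (PageKey) o;
            return k.fileId == fileId && k.pageIndex == pageIndex;
        }

        @Override
        public int hashCode() {
            return 31 * Long.hashCode(fileId) + Long.hashCode(pageIndex);
        }
    }

    /** Diff between the original and the changed page content. */
    interface PageDiff {}

    /** Append-only database journal. */
    interface WriteAheadLog {
        void append(PageKey key, PageDiff diff);
    }

    /** Background-flushed write cache. */
    interface WriteCache {
        void applyAndSchedule(PageKey key, PageDiff diff);
    }

    private final Map<PageKey, PageDiff> changes = new LinkedHashMap<>();

    /** Called while a cluster/index operation modifies pages. */
    void recordChange(long fileId, long pageIndex, PageDiff diff) {
        // In reality successive diffs for the same page would be merged.
        changes.put(new PageKey(fileId, pageIndex), diff);
    }

    /** The "atomic operation commit": journal first, then apply to pages. */
    void commit(WriteAheadLog wal, WriteCache writeCache) {
        // 1. Log all diffs sequentially in the append-only journal.
        for (Map.Entry<PageKey, PageDiff> e : changes.entrySet()) {
            wal.append(e.getKey(), e.getValue());
        }
        // 2. Apply the diffs to the file pages and hand them to the write cache.
        for (Map.Entry<PageKey, PageDiff> e : changes.entrySet()) {
            writeCache.applyAndSchedule(e.getKey(), e.getValue());
        }
        changes.clear();
    }
}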

What value does the database journal give us?

In both cases data consistency will not be compromised.

Taking all of the above into account, you have probably already concluded that the main performance characteristics of OrientDB’s storage engine (and not only OrientDB’s) are:

All of these numbers show us the direction in which the project must evolve. For example, if we have good numbers for the disk cache hit rate and very few pages are read for a single component operation, we have to improve disk cache speed as a whole. However, if a single component operation reads many pages and page read speeds are very low, we need to minimize the number of pages accessed per operation and convert data structures to ones which use more sequential, rather than random, IO operations.

Readers may ask: “Well, all of this is very good, but how is it related to us?”

The answer is: when you report performance issues, please provide not only a benchmark (we all have different hardware and sometimes cannot simply reproduce your issue) but also the performance numbers gathered as the result of storage profiling.

Readers might also ask: “How is that done?”

It can be done in two ways: by using JMX or by using SQL commands.

The JMX console provides numbers gathered from the execution of all operations in storage, while the SQL commands provide data gathered for a selected set of commands.

To gather performance statistics for a selected set of commands you can execute a script such as the one shown below:

At the end of the script you will see the following result:

As you can see, the output contains numbers both for storage performance as a whole and for the performance of each component.

Data from the atomic operation commit phase is presented as data from the component named “atomic operation”.

If you work with an embedded database you can start and stop storage profiling by calling the following methods:

OAbstractPaginatedStorage#startGatheringPerformanceStatisticForCurrentThread() to start storage profiling and OAbstractPaginatedStorage#completeGatheringPerformanceStatisticForCurrentThread() to stop profiling of storage.
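For example, a minimal sketch for an embedded plocal database could look like the following (the database URL, credentials and the cast to OAbstractPaginatedStorage are illustrative assumptions):

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage;

public class StorageProfilingExample {
    public static void main(String[] args) {
        // Illustrative sketch: open (or create) an embedded plocal database.
        ODatabaseDocumentTx db = new ODatabaseDocumentTx("plocal:/tmp/profilingdb");
        if (db.exists()) {
            db.open("admin", "admin");
        } else {
            db.create();
        }
        try {
            // For an embedded plocal database the underlying storage is assumed
            // to be an OAbstractPaginatedStorage.
            OAbstractPaginatedStorage storage = (OAbstractPaginatedStorage) db.getStorage();

            storage.startGatheringPerformanceStatisticForCurrentThread();

            // ... execute the commands you want to profile on this thread ...

            storage.completeGatheringPerformanceStatisticForCurrentThread();
        } finally {
            db.close();
        }
    }
}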

You may also connect to the JMX server and read current performance numbers from MBean with name:

com.orientechnologies.orient.core.storage.impl.local.statistic:type=OStoragePerformanceStatisticMXBean,name=,id=
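As an illustration, those MBeans can be listed with a few lines of standard JMX client code (the JMX service URL and port below are assumptions and depend on how your server exposes JMX):

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Illustrative sketch: connect to a JMX server and list the storage performance
// MBeans, whatever their concrete "name" and "id" attributes are.
public class JmxStatisticsReader {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            ObjectName pattern = new ObjectName(
                "com.orientechnologies.orient.core.storage.impl.local.statistic:"
                    + "type=OStoragePerformanceStatisticMXBean,*");
            Set<ObjectName> beans = connection.queryNames(pattern, null);
            for (ObjectName bean : beans) {
                System.out.println(bean);
            }
        }
    }
}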

We hope this overview of OrientDB’s architecture and of the performance characteristics that matter to us has been an interesting read. Please do not forget to send the results of profiling together with your performance reports.

If you have any questions about this blog entry or about any of OrientDB’s features, please post your question on Stack Overflow and we will answer it.

OrientDB Takes Off in 2015 and Announces Remarkable Growth

 

Key highlights include senior management hires, community awards, coverage by leading analysts, record subscription growth and breakthrough product innovation

London, UK (February 2, 2016) – OrientDB Ltd, the company behind the first-ever distributed multi-model database, announces a year of record growth that paves the way for a transformative 2016.

Leading analysts continue to highlight the surge of multi-model databases and the tremendous opportunity for customers to accelerate the pace of innovation by using a few operational databases rather than many different technologies. Natively supporting graphs, documents and the familiar SQL dialect, OrientDB is a general-purpose solution for naturally processing today’s data, which is generated at unbelievable speed. That opens the door to a new class of applications, drastically reduces costs and removes the need to keep multiple DBMSes aligned.

“With the release of OrientDB 2.1 and breakthrough innovation coming in OrientDB 2.2, we are making the industry’s first distributed document-graph database even better,” said Luca Garulli, CEO and Founder. “When I started to work on OrientDB back in 2010, I could only imagine the pace of growth we’re seeing today thanks to our customers, users and advocates.”

Product and Ecosystem Milestones

 

“The multi-model market started to blossom in 2015 as enterprises shift towards a simplified architecture that marries connected, unstructured and structured data processed at a speed never possible before,” said Luca Olivari, President. “OrientDB pioneered the multi-model approach and is best positioned to capitalize on this massive change.”

Company Momentum

About OrientDB

With downloads exceeding 70,000 per month, more than 100 community contributors and thousands of production users, OrientDB is experiencing tremendous growth in both community and Enterprise adoption. The native multi-model database combines the connectedness of graphs, the agility of documents and the familiar SQL dialect. Fortune 500 companies, government entities and startups all use the technology to build large-scale innovative applications. Its clients include Ericsson, the United Nations, Pitney Bowes, Sky, CenturyLink and Sonatype. OrientDB recently won the prestigious 2015 InfoWorld Bossie Award.

Resources:

 

This is a guest post by OrientDB contributor Matan Shukry.

Hi,

My name is Matan Shukry, and I’m a Programmer, DBA, and a Big Data engineer.

Today I’ll talk about my contribution to OrientDB, with emphasis on sequences.

The concept of sequences should be familiar to most people who have used an RDBMS before. However, for those of you who aren’t familiar with it, here is a short description of the topic.

A sequence is a database object that generates numbers sequentially. It is mostly used for automatically incremented columns.

Sounds simple, right? Well, here comes the tricky part:

    1. Sequences do not necessarily generate numbers in an ordered fashion. Assuming A and B are retrieve operations, where B happens after A, A may return a number that is higher than the one returned by B.
    2. Sequences do not necessarily generate numbers in a continuous fashion. Assuming A and B are retrieve operations, where B happens after A, the difference between the results of B and A may be bigger than 1. That is, there may be “holes” between sequence values.

 

Both of the above points happen due to a caching mechanism, where a range of numbers is kept in memory and handed to the user on request. In some cases, however, such as a transaction rollback or a server shutdown, those numbers are lost. In many cases there is also an option to turn off caching in order to obtain a sequence that generates ordered and continuous numbers.

Starting from version 2.2, a sequence object has been introduced in OrientDB. It comes in two types (ordered and cached) and includes ‘start’ and ‘increment’ fields.

The sequence object in OrientDB uses optimistic transactions (MVCC). When the sequence needs to allocate more numbers (either a range of them for a cached sequence or a single one for an ordered sequence), it retrieves the document, changes its properties, and attempts to save it (commit). If the sequence document is too old, meaning another connection changed the document and committed it between our retrieve and save, the sequence will retry the operation. If the operation still fails after a certain number of retries, an exception is thrown back to the user, who then decides what to do next. The entire process happens at the database layer, and it is very fast.
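Conceptually, the allocation logic is a retry loop similar to the following sketch (the types below are hypothetical stand-ins, not OrientDB’s actual implementation):

// Conceptual sketch of the optimistic (MVCC) retry loop described above.
public class OptimisticSequenceSketch {

    /** Stand-in for the sequence document stored in the database. */
    static class SequenceDocument {
        long value;
        long increment;
    }

    /** Thrown by save() when another connection committed the document first. */
    static class StaleDocumentException extends RuntimeException {}

    private static final int MAX_RETRIES = 100;

    public long next() {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            SequenceDocument doc = load();               // read the current state
            long nextValue = doc.value + doc.increment;  // allocate the next value
            doc.value = nextValue;
            try {
                save(doc);                               // fails if the document is stale
                return nextValue;
            } catch (StaleDocumentException e) {
                // another connection updated the document between load and save: retry
            }
        }
        throw new IllegalStateException("Sequence update failed after " + MAX_RETRIES + " retries");
    }

    // Hypothetical persistence hooks, left unimplemented in this sketch.
    SequenceDocument load() {
        throw new UnsupportedOperationException("not part of the sketch");
    }

    void save(SequenceDocument doc) {
        throw new UnsupportedOperationException("not part of the sketch");
    }
}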

Also, an auto-increment column type (which will rely on the sequence object, together with the default value feature) will probably be added in a future release. As you have probably figured out by now, this will result in an automatically incremented number being inserted into a specific column.

Here are a few examples on how to use sequences. Consider a blog site where we would like each post to have a unique id. We would create the sequence as follows:

SQL

CREATE SEQUENCE postId TYPE CACHED START 101 INCREMENT 2 CACHE 20

Java

OSequence seq = database.getMetadata().getSequenceLibrary().createSequence("postId", OSequence.SEQUENCE_TYPE.CACHED, new OSequence.CreateParams().setStart(101).setIncrement(2).setCacheSize(20));

Each time we’ll want to insert a new post, we’ll use .next():

SQL

INSERT INTO Post SET id = sequence("postId").next(), title = "BTE – Best Title Ever", body = "…"

Java (Graph API)

OSequence seq = graphDB.getRawGraph().getMetadata().getSequenceLibrary().getSequence("postId");
graphDB.addVertex("class:Post",
    "id", seq.next(), "title", "BTE – Best Title Ever", "body", "…");

Java (Document API)

OSequence seq = database.getMetadata().getSequenceLibrary().getSequence("postId");
ODocument doc = new ODocument("Post");
doc.fields("id", seq.next(), "title", "BTE – Best Title Ever", "body", "…");
doc.save();

 

You can also change the sequence parameters (alter): 

SQL

ALTER SEQUENCE postId START 1001 INCREMENT 30 CACHE 40

Java

database.getMetadata().getSequenceLibrary().getSequence("postId").updateParams(new OSequence.CreateParams().setStart(1001).setIncrement(30).setCacheSize(40));

 

If at some point we would like to retrieve the current value without incrementing it, or reset it back to 0 (probably when playing around in your development environment):

SQL

SELECT sequence("postId").current()
SELECT sequence("postId").reset()

Java

OSequence seq = database.getMetadata().getSequenceLibrary().getSequence("postId");
long value = seq.current();
seq.reset();

P.S.

There is a workaround to create auto-increment fields in previous versions of OrientDB (< v2.2). Check out this page for more information.

 

Hope this comes in handy,

Matan Shukry

We’re constantly striving to improve the user experience and simplify OrientDB. Starting with OrientDB 2.1.x, we moved some monitoring features that were previously only available with the OrientDB Workbench to OrientDB Studio. Enterprise Edition is still needed, as it comes with the Enterprise Profiler that enables the monitoring features. Going forward, our plan is to move most of the features available in the OrientDB Workbench to OrientDB Studio. In this way, users will need only one tool (instead of two separate products) to query, manage and monitor OrientDB instances.

Getting Started

To download OrientDB Enterprise Edition, which is free for development, click here. Once you’ve filled out the form and received the download link, you can start OrientDB server by following the documentation.

OrientDB Studio is bundled with the distribution and can be accessed with a browser. If you launch OrientDB on your local machine, Studio is available at http://localhost:2480.

OrientDB Studio Features

Now let’s see which features are available with OrientDB 2.1.x. When Studio detects that the agent is installed on OrientDB, it will automatically enable the following features:

 

Server Statistics

The new monitoring dashboard provides a quick overview of the status of OrientDB server instances. Several metrics are available here, such as:


A realtime chart is also available, displaying the CRUD operations done by the monitored server.


If the OrientDB server is running in distributed mode and is a member of an OrientDB cluster, Studio will display monitoring information for all nodes connected to the cluster. Each node publishes its metrics to the Hazelcast cluster, making them available to the other nodes.

SQL Profiler

Another cool feature available with OrientDB 2.1.x is the SQL Profiler. The Enterprise agent records and collects all the commands executed by the OrientDB server instance, and makes them available to Studio/Workbench.

For each command, the agent tracks:



With this information, we can easily identify which queries are executed most often and which queries perform better or worse in terms of execution time.


What’s Next?

Server statistics and the SQL Profiler are the first features we implemented in OrientDB Studio 2.1.x in order to provide a single tool for managing data and monitoring OrientDB server instances. In upcoming releases we will bring over more features that are currently only available with the OrientDB Workbench.

At the top of our list is a new cluster management feature that allows users to change the distributed configuration without restarting the OrientDB server instance.

References:


Stay tuned,

Enrico Risa
Lead Enterprise Engineer

 

OrientDB Wins InfoWorld Bossie Award

InfoWorld recognizes OrientDB as one of the best open source infrastructure and management software platforms

London, UK (Sept 21, 2015) – Orient Technologies, the company behind OrientDB (www.orientdb.com), the graph-document database that pioneered the multi-model concept, announces that OrientDB is a winner of the prestigious InfoWorld 2015 Bossie Award.

InfoWorld editors and contributors pick the top open source software for data centers, clouds, developers, data crunchers, and IT professionals. They recognized OrientDB as having one of the best open source application development tools.

Steve Nunez explains: “OrientDB is an interesting hybrid in the NoSQL world, combining features from a document database, where individual documents can have multiple fields without necessarily defining a schema, and a graph database, which consists of a set of nodes and edges. At a basic level, OrientDB considers the document as a vertex, and relationships between fields as graph edges. Because the relationships between elements are part of the record, no costly joins are required when querying data.”

“It’s great to see OrientDB receiving the InfoWorld Bossie Award, as we’re more committed than ever to the open source and database communities,” said Luca Olivari, President of Orient Technologies. “We are seeing tremendous adoption and customer growth and, together with our ecosystem, are fulfilling our vision to become the operational data store of the modern enterprise.”

About OrientDB

With downloads exceeding 60,000 per month, more than 100 contributors and thousands of production users, OrientDB is the leading next-generation native multi-model enterprise database, combining the connectedness of graphs, the agility of documents and the familiar SQL dialect. Fortune 500 companies, government entities and startups use the technology to build large-scale innovative applications. Its clients include Ericsson, the United Nations, Pitney Bowes, Sky, CenturyLink and Sonatype.

Orient Technologies is the main sponsor and the commercial supporter of OrientDB.

Resources:

 

 

London, June 17th 2015

Less than one month after the last RC3, the OrientDB Team has released OrientDB 2.1-rc4. A total of 45 issues are resolved in 2.1-rc4; this is the complete list of issues.

We suggest using OrientDB 2.1-rc4 only if you are in development; users in production should wait for 2.1 GA before upgrading. If no critical issues are raised in the coming days, OrientDB 2.1 GA is scheduled for June 23, 2015.

Download it now: https://orientdb.com/download.

Thanks to all the contributors who worked hard on this release, providing pull requests, tests, issues and comments.

Best Regards,

Luca Garulli
CEO at Orient Technologies LTD
the Company behind OrientDB
http://about.me/luca.garulli

 

Take your enterprise to the next level with OrientDB