Import from Neo4j using GraphML

This section describes the process of importing data from Neo4j to OrientDB using GraphML. For general information on the possible Neo4j to OrientDB migration strategies, please refer to the Import from Neo4j section.

Neo4j can export to GraphML, an XML-based file format for graphs. Because OrientDB can read GraphML, you can use this file format to import data from Neo4j into OrientDB through the Console, Gremlin, or the Java API.

Note:

For large and complex datasets, the preferred way to migrate from Neo4j is using the Neo4j to OrientDB Importer.

Neo4j and Cypher are registered trademarks of Neo Technology, Inc.

Exporting GraphML

In order to export data from Neo4j into GraphML, you need to install the Neo4j Shell Tools plugin. Once you have this package installed, you can use the export-graphml utility to export the database.

  1. Change into the Neo4j home directory:

    $ cd /path/to/neo4j-community-2.3.2
    
  2. Download the Neo4j Shell Tools:

    $ curl http://dist.neo4j.org/jexp/shell/neo4j-shell-tools_2.3.2.zip \
          -o neo4j-shell-tools.zip
    
  3. Unzip the neo4j-shell-tools.zip file into the lib directory:

    $ unzip neo4j-shell-tools.zip -d lib
    
  4. Restart the Neo4j Server. In the event that it's not running, start it:

    $ ./bin/neo4j restart
    
  5. Once you have Neo4j restarted with the Neo4j Shell Tools, launch the Neo4j Shell tool, located in the bin/ directory:

    $ ./bin/neo4j-shell
    Welcome to the Neo4j Shell! Enter 'help' for a list of commands
    NOTE: Remote Neo4j graph database service 'shell' at port 1337
    
    neo4j-sh (0)$
    
  6. Export the database into GraphML:

    neo4j-sh (0)$ export-graphml -t -o /tmp/out.graphml
    Wrote to GraphML-file /tmp/out.graphml 0. 100%: nodes = 302 rels = 834
    properties = 4221 time 59 sec total 59 sec
    

This exports the database to the path /tmp/out.graphml.
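Before importing, it can be worth checking that the file contains the number of nodes and relationships reported by export-graphml. The following is a minimal sketch that uses only the JDK's StAX parser; the class name GraphMLCounts is hypothetical and is not part of Neo4j or OrientDB. It relies only on the standard GraphML element names node and edge.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of Neo4j or OrientDB.
final class GraphMLCounts {

    /** Returns {nodeCount, edgeCount} for the GraphML document read from in. */
    static long[] count(Reader in) {
        long nodes = 0;
        long edges = 0;
        try {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (xml.hasNext()) {
                // Stream through the file so that large exports are not loaded into memory.
                if (xml.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = xml.getLocalName();
                    if ("node".equals(name)) {
                        nodes++;
                    } else if ("edge".equals(name)) {
                        edges++;
                    }
                }
            }
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("The input is not well-formed XML", e);
        }
        return new long[] {nodes, edges};
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GraphMLCounts <file.graphml>");
            return;
        }
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            long[] counts = count(in);
            System.out.println("nodes = " + counts[0] + " rels = " + counts[1]);
        }
    }
}
```

Run it against /tmp/out.graphml and compare the two numbers with the nodes and rels values printed by the export.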

Importing GraphML

There are three methods available for importing the GraphML file into OrientDB: through the Console, through Gremlin, or through the Java API.

Importing through the OrientDB Console

For more recent versions of OrientDB, you can import data from GraphML through the OrientDB Console. If you have version 2.0 or greater, this is the recommended method given that it can automatically translate the Neo4j labels into classes.

  1. Log into the OrientDB Console.

    $ $ORIENTDB_HOME/bin/console.sh
    
  2. In OrientDB, create a database to receive the import:

    orientdb> CREATE DATABASE PLOCAL:/tmp/db/test
    Creating database [plocal:/tmp/db/test] using the storage type [plocal]...
    Database created successfully.
    
    Current database is: plocal:/tmp/db/test
    
  3. Import the data from the GraphML file:

    orientdb {db=test}> IMPORT DATABASE /tmp/out.graphml
    
    Importing GRAPHML database database from /tmp/out.graphml...
    Transaction 8 has been committed in 12ms
    

This imports the Neo4j database into OrientDB on the test database.

Importing through the Gremlin Console

For older versions of OrientDB, you can import data from GraphML through the Gremlin Console. If you have version 1.7 or earlier, this is the method to use. It is not recommended on more recent versions, because it doesn't consider the labels declared in Neo4j. Instead, everything imports as the base vertex and edge classes (that is, V and E). This means that, after importing through Gremlin, you need to refactor your graph elements to fit a more structured schema.

To import the GraphML file into OrientDB, complete the following steps:

  1. Launch the Gremlin Console:

    $ $ORIENTDB_HOME/bin/gremlin.sh
    
             \,,,/
             (o o)
    -----oOOo-(_)-oOOo-----
    
  2. From the Gremlin Console, create a new graph, specifying the path to your graph database (here, /tmp/db/test):

    gremlin> g = new OrientGraph("plocal:/tmp/db/test");
    ==>orientgraph[plocal:/tmp/db/test]
    
  3. Load the GraphML file into the graph object (that is, g):

    gremlin> g.loadGraphML("/tmp/out.graphml");
    ==>null
    
  4. Exit the Gremlin Console:

    gremlin> quit
    

This imports the GraphML file into your OrientDB database.
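Because a Gremlin import places every element in V and E, it can help to know which Neo4j labels the export contains before planning the refactoring. The sketch below counts label occurrences using only the JDK. It assumes that the export writes node labels into a labels attribute on each node element, in the form :Person:Actor; check your own file to confirm this before relying on it. The class name GraphMLLabels is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not part of Neo4j or OrientDB.
final class GraphMLLabels {

    /** Returns how many nodes carry each label, keyed by label name. */
    static Map<String, Integer> nodeLabelCounts(Reader in) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
            while (xml.hasNext()) {
                int event = xml.next();
                if (event != XMLStreamConstants.START_ELEMENT
                        || !"node".equals(xml.getLocalName())) {
                    continue;
                }
                // Assumption: labels are stored as labels=":A:B" on the node element.
                String labels = xml.getAttributeValue(null, "labels");
                if (labels == null) {
                    continue;
                }
                for (String label : labels.split(":")) {
                    if (!label.isEmpty()) {
                        counts.merge(label, 1, Integer::sum);
                    }
                }
            }
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("The input is not well-formed XML", e);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GraphMLLabels <file.graphml>");
            return;
        }
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            nodeLabelCounts(in).forEach((label, n) -> System.out.println(label + ": " + n));
        }
    }
}
```

Each label in the output is a candidate for a vertex class that extends V in the refactored schema.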

Importing through the Java API

OrientDB Console calls the Java API. Using the Java API directly allows you greater control over the import process. For instance,

new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).inputGraph("/temp/neo4j.graphml");

This line imports the GraphML file into OrientDB.

Defining Custom Strategies

Beginning in version 2.1, OrientDB allows you to modify the import process through custom strategies for vertex and edge attributes. It supports the following strategies:

  • com.orientechnologies.orient.graph.graphml.OIgnoreGraphMLImportStrategy Defines attributes to ignore.
  • com.orientechnologies.orient.graph.graphml.ORenameGraphMLImportStrategy Defines attributes to rename.

Examples

  • Ignore the vertex attribute __type__:

    new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).defineVertexAttributeStrategy("__type__", new OIgnoreGraphMLImportStrategy()).inputGraph("/temp/neo4j.graphml");
    
  • Ignore the edge attribute weight:

    new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).defineEdgeAttributeStrategy("weight", new OIgnoreGraphMLImportStrategy()).inputGraph("/temp/neo4j.graphml");
    
  • Rename the vertex attribute __type__ to type:

    new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).defineVertexAttributeStrategy("__type__", new ORenameGraphMLImportStrategy("type")).inputGraph("/temp/neo4j.graphml");
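
To make the effect of the two strategies concrete, the following plain-Java sketch mimics them on a map of attributes. It is an illustration only: it does not use or reproduce the OrientDB classes, the name AttributeStrategyDemo is hypothetical, and it does not model what happens when the same attribute is both ignored and renamed (here, ignoring simply wins).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration only; these are not OrientDB classes.
final class AttributeStrategyDemo {

    /**
     * Returns a copy of attrs in which ignored attributes are dropped and
     * renamed attributes are stored under their new name.
     */
    static Map<String, Object> apply(Map<String, Object> attrs,
                                     Set<String> ignored,
                                     Map<String, String> renamed) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : attrs.entrySet()) {
            String name = entry.getKey();
            if (ignored.contains(name)) {
                continue; // "ignore": the attribute is not imported at all
            }
            // "rename": keep the value, store it under the new name if one is defined
            result.put(renamed.getOrDefault(name, name), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("__type__", "Person");
        attrs.put("name", "Keanu");
        attrs.put("weight", 3);
        System.out.println(apply(attrs,
                Collections.singleton("weight"),
                Collections.singletonMap("__type__", "type")));
    }
}
```

In this sketch, the weight attribute disappears from the result and the value of __type__ is kept under the name type, which corresponds to the ignore and rename examples above.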
    

Import Tips and Tricks

Dealing with Memory Issues

If you experience memory issues while importing from Neo4j, consider reducing the batch size. By default, the batch size is set to 1000. A smaller value causes OrientDB to process the import in smaller units.

  • Import with adjusted batch size through the Console:

    orientdb {db=test}> IMPORT DATABASE /tmp/out.graphml batchSize=100
    
  • Import with adjusted batch size through the Java API:

    new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).setBatchSize(100).inputGraph("/temp/neo4j.graphml");
    

Storing the Vertex IDs

By default, OrientDB assigns its own IDs to the imported vertices. If you want to preserve the original vertex IDs from Neo4j, use the storeVertexIds option.

  • Import with the original vertex IDs through the Console:

    orientdb {db=test}> IMPORT DATABASE /tmp/out.graphml storeVertexIds=true
    
  • Import with the original vertex IDs through the Java API:

    new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).setStoreVertexIds(true).inputGraph("/temp/neo4j.graphml");
    

Example

A complete example of a migration from Neo4j to OrientDB using the GraphML method can be found in the section Tutorial: Importing the movie Database from Neo4j.
