ETL Transformers

When OrientDB runs the ETL module, transformer components execute in a pipeline to modify the data before it gets loaded into the OrientDB database. The operate on received input and return output.

Before execution, it always initializes the $input variable, so that if you need to you can access it at run-time.

CSV Transformer

Beginning with version 2.1.4, the CSV Transformer has been deprecated in favor of the CSV Extractor.

Converts a string in a Document, parsing it as CSV

Component description.

  • Component name: csv
  • Supported inputs types: [String]
  • Output: ODocument

Syntax

ParameterDescriptionTypeMandatoryDefault value
"separator"Defines the column separator.char,
"columnsOnFirstLine"Defines whether the first line contains column descriptions.booleantrue
"columns"Defines array containing column names, you can define types by postfixing the names with :<type>.string array
"nullValue"Defines the value to interpret as null.string
"stringCharacter"Defines string character delimiter.char"
"skipFrom"Defines the line number to skip from.integeryes
"skipTo"Defines the line number to skip to.integeryes

For the "columns" parameter, specifying type guarantees better performance.

Example

  • Transforms a row in CSV (as ODocument class), using commas as the separator, considering NULL as a null value and skipping rows two through four.

    { "csv": { "separator": ",", "nullValue": "NULL",
               "skipFrom": 1, "skipTo": 3 } }
    

Field Transformer

When the ETL module calls the Field Transformer, it executes an SQL transformer against the field.

Component description.

  • Component name: vertex
  • Supported inputs types: [ODocument]
  • Output: ODocument

Syntax

ParameterDescriptionTypeMandatoryDefault value
"fieldName"Defines the document field name to use.string
"expression"Defines the expression you want to evaluate, using OrientDB SQL.stringyes
"value"Defines the value to set. If the value is taken or computed at run-time, use "expression" instead.any
"operation"Defines the operation to execute against the fields: SET or REMOVE.stringSET
"save"Defines whether to save the vertex, edge or document right after setting the fields.booleanfalse

The "fieldName" parameter was introduced in version 2.1.

Examples

  • Transforms the field class into the ODocument class, by prefixing it with _:

    { "field": 
      { "fieldName": "@class", 
        "expression": "class.prefix('_')"
      } 
    }
    
  • Applies the class name, based on the value of another field:

    { "field": 
      { "fieldName": "@class", 
        "expression": "if( ( fileCount >= 0 ), 'D', 'F')"
      }
    }
    
  • Assigns the last part of a path to the name field:

    { "field": 
      { "fieldName": "name",
        "expression": "path.substring( eval( '$current.path.lastIndexOf(\"/\") + 1') )" 
      }
    }
    
  • Assigns the field a fixed value:

    { "field": 
      { "fieldName": "counter", 
        "value": 0
      }
    }
    
  • Renames the field from salary to remuneration:

    { "field": 
      { "fieldName": "remuneration", 
        "expression": "salary"
      } 
    },
    { "field": 
      { "fieldName": "salary", 
        "operation": "remove"
      } 
    }
    
  • Renames multiple fields in one call.

    { "field": 
      { "fieldNames": 
        [ "remuneration", "salary" ], 
        "operation": "remove"
      } 
    }
    

    This feature was introduced in version 2.1.

Load Transformer

When the ETL module calls the Load Transformer, it takes input from one ODocument instance to output into another, loaded by lookup. THe lookup can either be a lookup against an index or a SELECT query.

Component description.

  • Component name: load
  • Supported inputs types: [ODocument, OrientVertex]
  • Output: one or multiple ODocument instances

Syntax

ParameterDescriptionTypeMandatoryDefault value
"joinFieldName"Defines the field containing the join value.stringyes
"lookup"Defines the index on which to execute the lookup, or a SELECT query.stringyes
"unresolvedLinkAction"Defines the action to execute in the event that the join hasn't been resolved.stringNOTHING

For the "unresolvedLinkAction" parameter, the supported actions are:

ActionDescription
NOTHINGTells the transformer to do nothing.
WARNINGTells the transformer to increment warnings.
ERRORTells the transformer to increment errors.
HALTTells the transformer to interrupt the process.
SKIPTells the transformer to skip the current row.

Example

  • Loads the current record against the record returned by the lookup on index V.URI, with the value contained in the field URI of the input document:

    { "load": 
      { "joinFieldName": "URI", 
        "lookup":"V.URI" 
      } 
    }
    

Merge Transformer

When the ETL module calls the Merge Transformer, it takes input from one ODocument instance to output into another, loaded by lookup. THe lookup can either be a lookup against an index or a SELECT query.

Component description.

  • Component name: merge
  • Supported inputs types: [ODocument, OrientVertex]
  • Output: ODocument

Syntax

ParameterDescriptionTypeMandatoryDefault value
"joinFieldName"Defines the field containing the join value.stringyes
"lookup"Defines the index on which to execute the lookup, or a SELECT query.stringyes
"unresolvedLinkAction"Defines the action to execute in the event that the join hasn't been resolved.stringNOTHING

For the "unresolvedLinkAction" parameter, the supported actions are:

ActionDescription
NOTHINGTells the transformer to do nothing.
WARNINGTells the transformer to increment warnings.
ERRORTells the transformer to increment errors.
HALTTells the transformer to interrupt the process.
SKIPTells the transformer to skip the current row.

Example

  • Merges the current record against the record returned by the lookup on index V.URI, with the value contained in the field URI of the input document:

    { "merge": 
      { "joinFieldName": "URI", 
        "lookup":"V.URI" 
      } 
    }
    

Vertex Transformer

When the ETL module runs the Vertex Transformer, it transforms ODocument input to output OrientVertex.

Component description.

  • Component name: vertex
  • Supported inputs types: [ODocument, OrientVertex]
  • Output: OrientVertex

Syntax

ParameterDescriptionTypeMandatoryDefault value
"class"Defines the vertex class to use.stringV
"skipDuplicates"Defines whether it skips duplicates. When class has a UNIQUE constraint, ETL ignores duplicates.booleanfalse

The "skipDuplicates" parameter was introduced in version 2.1.

Example

  • Transforms ODocument input into a vertex, setting the class value to the $classname variable:

    { "vertex": 
      { "class": "$className", 
        "skipDuplicates": true 
      } 
    }
    

Edge Transformer

When the ETL modules calls the Edge Transformer, it converts join values in one or more edges between the current vertex and all vertices returned by the lookup. The lookup can either be made against an index or a SELECT.

Component description.

  • Component name: EDGE
  • Supported inputs types: [ODocument, OrientVertex]
  • Output: OrientVertex

Syntax

ParameterDescriptionTypeMandatoryDefault value
"joinFieldName"Defines the field containing the join value.stringyes
"direction"Defines the edge direction.stringout
"class"Defines the edge class.stringE
"lookup"Defines the index on which to execute the lookup or a SELECT.stringyes
"targetVertexFields"Defines the field on which to set the target vertex.object
"edgeFields"Defines the fields to set in the edge.object
"skipDuplicates"Defines whether to skip duplicate edges when the UNIQUE constraint is set on both the out and in properties.booleanfalse
"unresolvedLinkAction"Defines the action to execute in the event that the join hasn't been resolved.stringNOTHING

The "targetVertexFields" and "edgeFields" parameters were introduced in version 2.1.

For the "unresolvedLinkAction" parameter, the following actions are supported:

ActionDescription
NOTHINGTells the transformer to do nothing.
CREATETells the transformer to create an instance of OrientVertex, setting the primary key to the join value.
WARNINGTells the transformer to increment warnings.
ERRORTells the transformer to increment errors.
HALTTells the transformer to interrupt the process.
SKIPTells the transformer to skip the current row.

Examples

  • Creates an edge from the current vertex, with the class set to Parent, to all vertices returned by the lookup on the D.inode index with the value contained in the filed inode_parent of the input's vertex:

    { "edge": 
      { "class": "Parent", 
        "joinFieldName": "inode_parent",
        "lookup":"D.inode",    
        "unresolvedLinkAction": "CREATE"
      } 
    }
    
  • Transformer a single-line CSV that contains both vertices and edges:

    { "source": 
      { "content": 
        { "value": "id,name,surname,friendSince,friendId,friendName,friendSurname\n0,Jay,Miner,1996,1,Luca,Garulli"
        }
      },
      "extractor": 
        { "row": {} },
      "transformers": 
        [ 
          { "csv": {} },
          { "vertex": 
      	  { "class": "V1" }
          },
          { "edge": 
      	  { "unresolvedLinkAction": "CREATE",
              "class": "Friend",
              "joinFieldName": "friendId",
              "lookup": "V2.fid",
              "targetVertexFields": 
      		  { "name": "${input.friendName}",
                   "surname": "${input.friendSurname}"
                },
                "edgeFields": 
      		    { "since": "${input.friendSince}" }
                  }
            },
            { "field": 
      	    { "fieldNames": 
      		  [ "friendSince",
                  "friendId",
                  "friendName",
                  "friendSurname"
                ],
                "operation": "remove"
              }
            }
          ],
          "loader": 
      	  { "orientdb": 
      	    { "dbURL": "memory:ETLBaseTest",
                "dbType": "graph",
                "useLightweightEdges": false
              }
          }
      }
    

Flow Transformer

When the ETL module calls the Flow Transformer, it modifies the flow through the pipeline. Supported operations are skip and halt. Typically, this transformer operates with the if attribute.

Component description.

  • Component name: flow
  • Supported inputs types: Any
  • Output: same type as input

Syntax

ParameterDescriptionTypeMandatoryDefault value
"operation"Defines the flow operation: skip or halt.stringyes

Example

  • Skips the current record if name is null:

    { "flow": 
      { "if": "name is null", 
        "operation" : "skip" 
      } 
    }
    

Code Transformer

When the ETL module calls the Code Transformer, it executes a snippet of code in any JVM supported language. The default is JavaScript. The last object in the code is returned as output.

In the execution context:

  • input The input object received.
  • record The record extracted from the input object, when possible. In the event that input object is a vertex or edge, it assigns the underlying ODocument to the variable.

Component description.

  • Component name: code
  • Supported inputs types: [Object]
  • Output: Object

Syntax

| Parameter | Description | Type | Mandatory | Default value | |-----------|-------------|-----------|-----------| | "language" | Defines the programming language to use. | string | | JavaScript | | "code" | Defines the code to execute. | string | yes | |

Example

  • Displays the current record and return the parent:

    { "code": 
      { "language": "Javascript",
        "code": "print('Current record: ' + record); record.field('parent');"
      }
    }
    

When the ETL module calls the Link Transformer, it converts join values into links within the current record, using the result of the lookup. The lookup can be made against an index or a SELECT.

Component description.

  • Component name: link
  • Supported inputs types: [ODocument, OrientVertex]
  • Output: ODocument

Syntax

ParameterDescriptionTypeMandatoryDefault value
"joinFieldName"Defines the field containing hte join value.string
"joinValue"Defines the value to look up.string
"linkFieldName"Defines the field containing the link to set.stringyes
"linkFieldType"Defines the link type.stringyes
"lookup"Defines the index on which to execute the lookup or a SELECT query.stringyes
"unresolvedLinkAction"Defines the action to execute in the event that the join doesn't resolve.stringNOTHING

For the "linkFieldType" parameter, supported link types are: LINK, LINKSET and LINKLIST.

For the "unresolvedLinkAction" parameter the following actions are supported:

ActionDescription
NOTHINGTells the transformer to do nothing.
CREATETells the transformer to create an ODocument instance, setting the primary key as the join value.
WARNINGTells the transformer to increment warnings.
ERRORTells the transformer to increment errors.
HALTTells the transformer to interrupt the process.
SKIPTells the transformer to skip the current row.

Example

  • Transforms a JSON value into a link within the current record, set as parent of the type LINK, with the result of the lookup on the index D.node with the value contained in the field inode_parent on the input document.

    { "link": 
      { "linkFieldName": "parent", 
        "linkFieldType": "LINK",
        "joinFieldName": "inode_parent", 
        "lookup":"D.inode",  
        "unresolvedLinkAction":"CREATE"
      } 
    }
    

Log Transformer

When the ETL module uses the Log Transformer, it logs the input object to System.out.

Component description.

  • Component name: log
  • Supported inputs types: Any
  • Output: Any

Syntax

ParameterDescriptionTypeMandatoryDefault value
"prefix"Defines what it writes before the content.string
"postfix"Defines what it writes after the content.string

Examples

  • Logs the current value:

    { "log": {} }
    
  • Logs the current value with -> as the prefix:

    { "log": 
      { "prefix" : "-> " } 
    }
    

Block Transformer

When the ETL module calls the Block Transformer, it executes an ETL Block component as a transformation step.

Component description.

  • Component name: block
  • Supported inputs types: [Any]
  • Output: Any

Syntax

ParameterDescriptionTypeMandatoryDefault value
"block"Defines the block to execute.documentyes

Example

  • Log the current value:

    { "block": 
      { "let": 
        { "name": "id",
          "value": "={eval('$input.amount * 2')}"
        }
      }
    }
    

Command Transformer

When the ETL module calls the Command Transformer, it executes the given command.

Component description.

  • Component name: command
  • Supported inputs types: [ODocument]
  • Output: ODocument

Syntax

ParameterDescriptionTypeMandatoryDefault value
"language"Defines the command language: SQL or Gremlin.stringsql
"command"Defines the command to execute.stringyes

Example

  • Executes a SELECT and output an edge:

    { "command" : 
      { "command" : "SELECT FROM E WHERE id = ${input.edgeid}",
        "output" : "edge"
      }
    }