ETL - Sources

When OrientDB executes the ETL module, source components define the source of the data you want to extract. Some extractors, such as JDBCExtractor, work without a source, which makes this component optional. The ETL module in OrientDB supports the following types of sources:

File Sources

In the file source component, the parameters describe a source file containing the data you want the ETL module to read. You can use text files or files compressed as tar.gz.

  • Component name: file

Syntax

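| Parameter  | Description                                                   | Type    | Mandatory | Default value |
|------------|---------------------------------------------------------------|---------|-----------|---------------|
| "path"     | Defines the path to the file.                                 | string  | yes       |               |
| "lock"     | Defines whether to lock the file during the extraction phase. | boolean |           | false         |
| "encoding" | Defines the encoding for the file.                            | string  |           | UTF-8         |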

Examples

  • Extract data from the file at /tmp/actor.tar.gz:

    { 
       "file": { 
          "path": "/tmp/actor.tar.gz", 
          "lock": true,
          "encoding": "UTF-8"
       }
    }
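
  • A minimal sketch that sets only the mandatory "path" parameter, so that "lock" falls back to false and "encoding" to UTF-8, per the defaults listed in the syntax table. The path /tmp/actor.csv is a hypothetical placeholder, not a file from the original example:

```json
{
   "file": {
      "path": "/tmp/actor.csv"
   }
}
```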
    

Input Sources

In the input source component, the ETL module extracts data from console input. You may find this useful in cases where the ETL module operates in a pipe with other tools.

  • Component name: input

Syntax

oetl.sh "<input>"

Example

  • Cat a file, piping its output into the ETL module:

    $ cat /etc/csv | $ORIENTDB_HOME/bin/oetl.sh \
          "{transformers:[{csv:{}}]}"
    

HTTP Sources

In the HTTP source component, the ETL module extracts data from an HTTP address.

  • Component name: http

Syntax

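| Parameter | Description                                                                                                              | Type     | Mandatory | Default value |
|-----------|--------------------------------------------------------------------------------------------------------------------------|----------|-----------|---------------|
| "url"     | Defines the URL to look to for source data.                                                                              | string   | yes       |               |
| "method"  | Defines the HTTP method to use in extracting data. Supported methods are: GET, POST, PUT, DELETE, HEAD, OPTIONS, and TRACE. | string   |           | GET           |
| "headers" | Defines the request headers as an inner document key/value.                                                              | document |           |               |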

Examples

  • Execute an HTTP GET request, setting the User-Agent header:

    { 
       "http": {
          "url": "http://ip.jsontest.com/",
          "method": "GET",
          "headers": {
             "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36"
          }
       }
    }
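
  • A minimal sketch that sets the mandatory "url" parameter and a single request header. With "method" omitted, the default GET from the syntax table should apply. The URL and the header value are hypothetical placeholders chosen for illustration:

```json
{
   "http": {
      "url": "http://example.com/data.json",
      "headers": {
         "Accept": "application/json"
      }
   }
}
```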