DataX NebulaGraphWriter

1 Quick Introduction

The NebulaGraphWriter plugin writes data to the tags (labels) or edge types of a graph space in a NebulaGraph database. Under the hood, NebulaGraphWriter connects to NebulaGraph through JDBC and executes INSERT statements written in NebulaGraph's nGQL syntax to write the data.

DBAs can use NebulaGraphWriter as a data migration tool to import data from relational databases into NebulaGraph for offline synchronization.

2 Implementation Principle

NebulaGraphWriter receives the protocol data (in Record format) produced by the Reader through the DataX framework, connects to NebulaGraph through nebula-jdbc (the JDBC driver), and executes INSERT statements to write the data to NebulaGraph.
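
The following is a minimal sketch of the Record-to-nGQL step, not the plugin's actual code. It assumes a record arrives as a list of column values in the same order as the column parameter and that the VID has already been derived; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: turn one DataX-style record (column values in order)
// into an nGQL INSERT VERTEX statement. Not the plugin's real implementation.
final class VertexInsertSketch {

    // Render a Java value as an nGQL literal: strings are double-quoted with
    // backslashes and quotes escaped; numbers and booleans use toString().
    static String toLiteral(Object value) {
        if (value == null) {
            return "NULL";
        }
        if (value instanceof String) {
            String s = (String) value;
            return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        throw new IllegalArgumentException("unsupported type: " + value.getClass());
    }

    // Pair column names with record values by position and build the statement.
    static String buildVertexInsert(String tag, List<String> columns, String vid, List<Object> record) {
        if (columns.size() != record.size()) {
            throw new IllegalArgumentException("column count does not match record length");
        }
        StringJoiner values = new StringJoiner(", ");
        for (Object v : record) {
            values.add(toLiteral(v));
        }
        return "INSERT VERTEX " + tag + "(" + String.join(", ", columns) + ") VALUES "
                + toLiteral(vid) + ":(" + values + ");";
    }
}
```

For the record in the configuration example, `buildVertexInsert("player", List.of("name", "age"), "zhangsan", List.<Object>of("zhangsan", 25L))` returns `INSERT VERTEX player(name, age) VALUES "zhangsan":("zhangsan", 25);`.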

In addition to nebula-jdbc, the writer uses nebula-java to fetch system-level metadata from NebulaGraph, which it uses to match tags, edge types, and their fields against the job configuration.
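
A minimal sketch of the field-matching idea follows. It assumes the tag or edge-type schema has already been fetched and is available as a map from property name to type; the nebula-java calls that would fetch it are deliberately left out, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of field matching. The schema map (property name ->
// NebulaGraph type) stands in for metadata fetched through nebula-java.
final class SchemaMatchSketch {

    // Return the configured columns that the tag or edge type does not define,
    // in the order they appear in the column parameter.
    static List<String> missingColumns(Map<String, String> schema, List<String> columns) {
        List<String> missing = new ArrayList<>();
        for (String column : columns) {
            if (!schema.containsKey(column)) {
                missing.add(column);
            }
        }
        return missing;
    }
}
```

Checking this once before writing lets a misconfigured job fail early instead of on the first rejected INSERT.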

3 Function Description

3.1 Configuration Example

Before configuring a job that writes to NebulaGraph, create the graph space, tag, and edge type in NebulaGraph:

CREATE SPACE IF NOT EXISTS cba(vid_type = FIXED_STRING(30));
USE cba;
CREATE TAG IF NOT EXISTS player(name string, age int);
CREATE EDGE IF NOT EXISTS follow(degree int);

The following job writes data generated in memory by streamreader into NebulaGraph:

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              {
                "type": "string",
                "value": "zhangsan"
              },
              {
                "type": "long",
                "value": 25
              }
            ],
            "sliceRecordCount": 1
          }
        },
        "writer": {
          "name": "nebulagraphwriter",
          "parameter": {
            "username": "root",
            "password": "nebula",
            "column": [
              "name",
              "age"
            ],
            "connection": [
              {
                "table": [
                  "player"
                ],
                "edgeType": [
                  {
                    "srcTag": "player", "srcPrimaryKey": "srcPlayerName",
                    "dstTag": "player", "dstPrimaryKey": "dstPlayerName"
                  }
                ],
                "jdbcUrl": "jdbc:nebula://cba"
              }
            ],
            "batchSize": 100
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}

3.2 Parameter Description

jdbcUrl
- Description: JDBC connection information for the target data source. For the URL format, refer to the NebulaGraph JDBC documentation: Use of the nebula-jdbc connector
- Required: Yes
- Default: None
username
- Description: database username
- Required: Yes
- Default: None
password
- Description: password of the database user
- Required: Yes
- Default: None
table
- Description: A collection of table names. In the DataX synchronization context, a "table" in the graph database NebulaGraph corresponds to a tag (label) or an edge type. The table must contain all columns listed in the column parameter. Note that the reader-side primary key combined with the table name is used as the VID of the vertex in the tag. The reader side should specify the primary key; otherwise, the first column is used as the primary key by default.
- Required: Yes
- Default: None
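
The VID rule above can be sketched as follows. The document states only that the primary key and the table name together form the VID and that the first column is the default primary key; the concatenation order and the "_" separator below are assumptions, and the class name is hypothetical.

```java
import java.util.List;

// Hypothetical sketch of VID construction. The format "<pk value>_<table>"
// is an assumption, not the plugin's documented format.
final class VidSketch {

    // Index of the primary-key column: the configured one if given, else column 0.
    static int primaryKeyIndex(List<String> columns, String primaryKey) {
        if (primaryKey == null) {
            return 0;
        }
        int idx = columns.indexOf(primaryKey);
        if (idx < 0) {
            throw new IllegalArgumentException("primary key not in column list: " + primaryKey);
        }
        return idx;
    }

    // Combine the primary-key value of the record with the table name.
    static String buildVid(String table, List<String> columns, String primaryKey, List<Object> record) {
        Object key = record.get(primaryKeyIndex(columns, primaryKey));
        return key + "_" + table;
    }
}
```

With the example space created as `FIXED_STRING(30)`, whatever VID format is used must fit within that length.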
column
- Description: A collection of fields to synchronize. The field order must match the order of the columns in the record; that is, both the names and the order must correspond to the column fields on the reader side.
- Required: Yes
- Default: None
edgeType
- Description: Required when synchronizing edge data (that is, an edge table on the reader side). Set srcTag and dstTag to the source and destination tags of the edge type, and srcPrimaryKey and dstPrimaryKey to the primary keys of those tags, which correspond to the source and destination foreign keys of the edge table being synchronized.
- Required: No
- Default: None
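
For edges, the written statement uses nGQL's `INSERT EDGE <type>(<props>) VALUES "<src>"->"<dst>":(<values>);` form. The sketch below assumes the source and destination VIDs have already been resolved from the columns named by srcPrimaryKey and dstPrimaryKey and that property values are already rendered as nGQL literals; the class name is hypothetical.

```java
import java.util.List;

// Hypothetical sketch: build an nGQL INSERT EDGE statement for one edge-table row.
final class EdgeInsertSketch {

    // Double-quote a VID string, escaping backslashes and quotes.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // propLiterals must already be nGQL literals (e.g. "90" or "\"text\"").
    static String buildEdgeInsert(String edgeType, List<String> props,
                                  String srcVid, String dstVid, List<String> propLiterals) {
        if (props.size() != propLiterals.size()) {
            throw new IllegalArgumentException("property count does not match value count");
        }
        return "INSERT EDGE " + edgeType + "(" + String.join(", ", props) + ") VALUES "
                + quote(srcVid) + "->" + quote(dstVid)
                + ":(" + String.join(", ", propLiterals) + ");";
    }
}
```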
batchSize
- Description: The number of records written in one batch. Buffering records this way keeps DataX from sending too many I/O requests to NebulaGraph, which would degrade synchronization performance.
- Required: No
- Default: 1
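
A minimal sketch of how batchSize buffering can work, assuming rows are already rendered as `"vid":(v1, v2)` fragments and that one multi-row INSERT is issued per batch. The executor callback stands in for executing the statement over JDBC; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: collect row fragments and emit one multi-row INSERT
// per batchSize rows, so n rows cost about ceil(n / batchSize) requests.
final class BatchingSketch {
    private final String header;              // e.g. "INSERT VERTEX player(name, age) VALUES "
    private final int batchSize;
    private final Consumer<String> executor;  // stands in for executing over JDBC
    private final List<String> buffer = new ArrayList<>();

    BatchingSketch(String header, int batchSize, Consumer<String> executor) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.header = header;
        this.batchSize = batchSize;
        this.executor = executor;
    }

    // Buffer one row and flush as soon as the batch is full.
    void add(String rowFragment) {
        buffer.add(rowFragment);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Write whatever is buffered; must also be called once when the task ends.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        executor.accept(header + String.join(", ", buffer) + ";");
        buffer.clear();
    }
}
```

With the default batchSize of 1, every record becomes its own statement; larger values trade memory for fewer round trips.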

3.3 Type Conversion

DataX Internal Types	NebulaGraph Data Types
LONG	INT INT64 INT32 INT16 INT8
DOUBLE	FLOAT DOUBLE
STRING	FIXED_STRING(N) STRING
BOOLEAN	BOOL
BYTES	No corresponding data type
DATE	DATE TIME DATETIME (not yet supported)
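
The conversion table can be expressed as a lookup, sketched below. It encodes only the rows above; BYTES and DATE map to no writable type because the table lists them as unsupported. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Sketch of the type-conversion table: which NebulaGraph property types each
// DataX internal type may be written to.
final class TypeMappingSketch {
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "LONG", Set.of("INT", "INT64", "INT32", "INT16", "INT8"),
            "DOUBLE", Set.of("FLOAT", "DOUBLE"),
            "STRING", Set.of("FIXED_STRING", "STRING"),
            "BOOLEAN", Set.of("BOOL"),
            "BYTES", Set.<String>of(),   // no corresponding NebulaGraph type
            "DATE", Set.<String>of());   // not yet supported

    // nebulaType may carry a length, e.g. "FIXED_STRING(30)"; only the base name is compared.
    static boolean isWritable(String dataxType, String nebulaType) {
        String base = nebulaType.toUpperCase();
        int paren = base.indexOf('(');
        if (paren >= 0) {
            base = base.substring(0, paren).trim();
        }
        return ALLOWED.getOrDefault(dataxType.toUpperCase(), Set.of()).contains(base);
    }
}
```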

3.4 Reference Example from Relational Database to NebulaGraph

Data Migration Example	Configuration Example
MySQL to NebulaGraph	Relational database MySQL to NebulaGraph: vertex table -> tag
To be added

4 Performance Report

4.1 Environment Preparation

4.1.1 Data Features

Create table statement:

A sample record looks like:

4.1.2 Machine Parameters

The machine parameters for executing DataX are:
1. CPU:
2. Memory:
3. Network: dual Gigabit NICs
4. Disk: not counted, because DataX does not write data to disk
NebulaGraph database machine parameters are:
1. CPU:
2. Memory:
3. Network: dual Gigabit NICs
4. Disk:

4.1.3 DataX JVM parameters

-Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError

4.2 Test report

4.2.1 Single table test report

Channels	DataX Speed (Rec/s)	DataX Traffic (MB/s)	DataX Machine NIC Outgoing Traffic (MB/s)	DataX Machine Running Load	DB NIC Incoming Traffic (MB/s)	DB Running Load	DB TPS
1
4
8
16
32

Notes:
