DataX NebulaGraphWriter

简体中文|English

1 Quick Introduction

The NebulaGraphWriter plugin writes data to a target tag (vertex label) or edge type in a NebulaGraph graph space. Under the hood, it connects to NebulaGraph through JDBC and executes INSERT statements written in nGQL, NebulaGraph's query language.

DBAs can use NebulaGraphWriter as a data migration tool to import data from relational databases into NebulaGraph for offline synchronization.

2 Implementation Principle

NebulaGraphWriter receives the protocol data (Record format) produced by the Reader through the DataX framework, connects to NebulaGraph through nebula-jdbc (the JDBC driver), and executes INSERT statements to write the data to NebulaGraph.

In addition to nebula-jdbc, the plugin uses nebula-java to fetch system-level metadata from NebulaGraph. This metadata is used to synchronize tags and edge types and to match fields.
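As a rough illustration of the write path described above, the sketch below renders one record as an nGQL `INSERT VERTEX` or `INSERT EDGE` statement. It is not the plugin's actual code: the function names (`quote`, `insert_vertex`, `insert_edge`) are hypothetical, and the VID values are passed in by the caller because the exact way the plugin derives them is not shown here.

```python
def quote(value):
    """Render a Python value as an nGQL literal (bool, number, or string)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes and double quotes inside string literals.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def insert_vertex(tag, columns, vid, row):
    """Build an INSERT VERTEX statement for one record (hypothetical helper)."""
    props = ", ".join(columns)
    values = ", ".join(quote(v) for v in row)
    return f"INSERT VERTEX {tag}({props}) VALUES {quote(vid)}:({values});"


def insert_edge(edge_type, columns, src_vid, dst_vid, row):
    """Build an INSERT EDGE statement for one record (hypothetical helper)."""
    props = ", ".join(columns)
    values = ", ".join(quote(v) for v in row)
    return (
        f"INSERT EDGE {edge_type}({props}) "
        f"VALUES {quote(src_vid)}->{quote(dst_vid)}:({values});"
    )


# Illustrative VIDs only; the real plugin derives them from the primary key.
print(insert_vertex("player", ["name", "age"], "zhangsan", ["zhangsan", 25]))
print(insert_edge("follow", ["degree"], "zhangsan", "lisi", [95]))
```

Building statements by string formatting keeps the sketch short; a production writer would more likely rely on the driver's parameter binding where available.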

3 Function Description

3.1 Configuration Example

  • To configure a job that writes to NebulaGraph, first create a graph space, a tag, and an edge type in NebulaGraph:
CREATE SPACE IF NOT EXISTS cba(vid_type = FIXED_STRING(30));
CREATE TAG IF NOT EXISTS player(name string, age int);
CREATE EDGE IF NOT EXISTS follow(degree int);
  • Then configure a job that writes data generated in memory (streamreader) to NebulaGraph:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              {
                "type": "string",
                "value": "zhangsan"
              },
              {
                "type": "long",
                "value": 25
              }
            ],
            "sliceRecordCount": 1
          }
        },
        "writer": {
          "name": "nebulagraphwriter",
          "parameter": {
            "username": "root",
            "password": "nebula",
            "column": [
              "name",
              "age"
            ],
            "connection": [
              {
                "table": [
                  "player"
                ],
                "edgeType": [
                  {
                    "srcTag": "player", "srcPrimaryKey": "srcPlayerName",
                    "dstTag": "player", "dstPrimaryKey": "dstPlayerName"
                  }
                ],
                "jdbcUrl": "jdbc:nebula://cba"
              }
            ],
            "batchSize": 100
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
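Before submitting a job like the one above, it can help to check that the writer block carries every parameter that section 3.2 marks as required. The following is a minimal sketch of such a check, not part of DataX itself; the helper name `missing_writer_params` and the choice to look for `table` and `jdbcUrl` inside each `connection` entry are assumptions based on the layout of the example configuration.

```python
import json

# Parameters marked "Required: Yes" in section 3.2 (assumed placement:
# table and jdbcUrl live inside each connection entry, as in the example).
REQUIRED_WRITER_KEYS = ("username", "password", "column", "connection")
REQUIRED_CONNECTION_KEYS = ("table", "jdbcUrl")


def missing_writer_params(job):
    """Return the names of required writer parameters absent from a job dict."""
    problems = []
    for content in job["job"]["content"]:
        params = content["writer"]["parameter"]
        for key in REQUIRED_WRITER_KEYS:
            if key not in params:
                problems.append(key)
        for i, conn in enumerate(params.get("connection", [])):
            for key in REQUIRED_CONNECTION_KEYS:
                if key not in conn:
                    problems.append(f"connection[{i}].{key}")
    return problems


# A trimmed-down job with the password and jdbcUrl deliberately left out.
sample = json.loads("""
{
  "job": {
    "content": [
      {
        "writer": {
          "name": "nebulagraphwriter",
          "parameter": {
            "username": "root",
            "column": ["name", "age"],
            "connection": [{"table": ["player"]}]
          }
        }
      }
    ]
  }
}
""")
print(missing_writer_params(sample))
```

An empty list means all required writer parameters are present; the check says nothing about whether their values are valid.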

3.2 Parameter Description

  • jdbcUrl

    • Description: JDBC connection information for the target data source. For details, see the NebulaGraph JDBC documentation: Use of the nebula-jdbc connector
    • Required: Yes
    • Default: None
  • username

    • Description: database username
    • Required: Yes
    • Default: None
  • password

    • Description: Password for the specified username

    • Required: Yes

    • Default: None

  • table

    • Description: A collection of table names. In the DataX synchronization context, a "table" in the graph database NebulaGraph corresponds to a tag or an edge type. Each table should contain all the columns listed in the column parameter. Note that the reader-side primary key combined with the table name is used as the VID of the vertex in the tag. If the reader does not specify a primary key, the first column is used as the primary key by default.
    • Required: Yes
    • Default: None
  • column

    • Description: A collection of fields to be synchronized. The order of the fields should be consistent with the order of the columns in the record, that is, it needs to correspond to the order and name of the column fields on the reader side.

    • Required: Yes

    • Default: None

  • edgeType

    • Description: Required when synchronizing edge data (that is, when the reader-side table is an edge table). Specify srcTag and dstTag in edgeType to identify the source and destination tag types of the edge, along with srcPrimaryKey and dstPrimaryKey, the primary keys of those two tags, which correspond to the source and destination foreign keys in the edge table being synchronized.
    • Required: No
    • Default: None
  • batchSize

    • Description: The number of records written in a single batch. Records are buffered so that DataX does not issue too many I/O requests to NebulaGraph, which would hurt synchronization performance.
    • Required: No
    • Default: 1
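The buffering behavior behind batchSize can be pictured with a short sketch. This is a generic batching generator under the assumption that records are flushed whenever the buffer reaches batchSize and once more at the end of the input; the name `batches` is hypothetical and the plugin's real flushing logic may differ.

```python
def batches(records, batch_size=1):
    """Yield lists of up to batch_size records, preserving input order.

    batch_size defaults to 1, matching the documented default of batchSize.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) == batch_size:
            yield buffer
            buffer = []
    # Flush whatever is left once the input is exhausted.
    if buffer:
        yield buffer


# Each yielded batch would correspond to one write request to NebulaGraph.
for batch in batches([("zhangsan", 25), ("lisi", 30), ("wangwu", 28)], batch_size=2):
    print(len(batch), batch)
```

With the default of 1, every record becomes its own write request, so raising batchSize is the usual way to cut down on round trips.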

3.3 Type Conversion

| DataX Internal Types | NebulaGraph Data Types |
| --- | --- |
| LONG | INT, INT64, INT32, INT16, INT8 |
| DOUBLE | FLOAT, DOUBLE |
| STRING | FIXED_STRING(N), STRING |
| BOOLEAN | BOOL |
| BYTES | No corresponding data type |
| DATE | DATE, TIME, DATETIME (not currently supported, will be improved in the future) |
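The mapping above can be expressed as a small lookup, which is handy when checking whether a tag's schema can be written by the plugin. This is a sketch only: the names `NEBULA_TO_DATAX`, `UNSUPPORTED`, and `datax_type_for` are hypothetical, and the lookup covers only the types listed in this section.

```python
# NebulaGraph property type -> DataX internal type, as listed in section 3.3.
NEBULA_TO_DATAX = {
    "INT": "LONG",
    "INT64": "LONG",
    "INT32": "LONG",
    "INT16": "LONG",
    "INT8": "LONG",
    "FLOAT": "DOUBLE",
    "DOUBLE": "DOUBLE",
    "FIXED_STRING": "STRING",
    "STRING": "STRING",
    "BOOL": "BOOLEAN",
}

# Date and time types are documented as not currently supported.
UNSUPPORTED = {"DATE", "TIME", "DATETIME"}


def datax_type_for(nebula_type):
    """Return the DataX internal type for a NebulaGraph type name."""
    # Normalize case and drop a length suffix such as FIXED_STRING(30).
    base = nebula_type.strip().upper().split("(", 1)[0]
    if base in UNSUPPORTED:
        raise NotImplementedError(f"{base} is not currently supported")
    if base not in NEBULA_TO_DATAX:
        raise ValueError(f"no DataX mapping listed for {nebula_type!r}")
    return NEBULA_TO_DATAX[base]


print(datax_type_for("fixed_string(30)"))
print(datax_type_for("int"))
```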

3.4 Reference Example from Relational Database to NebulaGraph

| Data Migration Example | Configuration Example |
| --- | --- |
| MySQL to NebulaGraph | Relational database MySQL to NebulaGraph: vertex table -> tag |
| To be added | |

4 Performance Report

4.1 Environment Preparation

4.1.1 Data Features

Create table statement:

A single record looks like:

4.1.2 Machine Parameters

  • Parameters of the machine running DataX:
    1. CPU:
    2. mem:
    3. net: Gigabit dual network card
    4. disk: DataX does not write data to disk, so this item is not measured
  • Parameters of the NebulaGraph database machine:
    1. CPU:
    2. mem:
    3. net: Gigabit dual network card
    4. disk:

4.1.3 DataX JVM Parameters

-Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError

4.2 Test Report

4.2.1 Single Table Test Report

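| Channels | DataX Speed (Rec/s) | DataX Traffic (MB/s) | DataX Machine NIC Outgoing Traffic (MB/s) | DataX Machine Running Load | DB NIC Incoming Traffic (MB/s) | DB Running Load | DB TPS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | | | | | | | |
| 4 | | | | | | | |
| 8 | | | | | | | |
| 16 | | | | | | | |
| 32 | | | | | | | |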

Notes:

4.2.4 Performance Test Summary

5 Constraints

FAQ