The NebulaGraphWriter plugin writes data to a target tag or edge type in a NebulaGraph graph space. Under the hood, NebulaGraphWriter connects to NebulaGraph through JDBC and writes data by executing INSERT statements in NebulaGraph's nGQL syntax.
NebulaGraphWriter can serve as a data migration tool for DBAs to import relational database data into NebulaGraph for offline synchronization.
NebulaGraphWriter receives the protocol data (in Record format) produced by the Reader through the DataX framework, connects to NebulaGraph through nebula-jdbc (the JDBC driver), and executes INSERT statements to write the data to NebulaGraph.
In addition to nebula-jdbc, the plugin uses nebula-java to obtain system-level metadata from NebulaGraph, which it needs to synchronize tags and edge types and to match fields.
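To make the write path concrete, here is a minimal Python sketch of how one record could be turned into an nGQL INSERT statement. It is illustrative only and is not the plugin's Java implementation: the helper names (`ngql_literal`, `insert_vertex`, `insert_edge`) are hypothetical, the VID values are placeholders, and the `player` tag and `follow` edge type are the ones created in the setup statements of this document.

```python
def ngql_literal(value):
    """Render a Python value as an nGQL literal (hypothetical helper)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Quote strings, escaping backslashes and double quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def insert_vertex(tag, columns, vid, values):
    """Build an INSERT VERTEX statement for a single record."""
    cols = ", ".join(columns)
    vals = ", ".join(ngql_literal(v) for v in values)
    return f"INSERT VERTEX {tag}({cols}) VALUES {ngql_literal(vid)}:({vals});"


def insert_edge(edge_type, columns, src_vid, dst_vid, values):
    """Build an INSERT EDGE statement for a single record."""
    cols = ", ".join(columns)
    vals = ", ".join(ngql_literal(v) for v in values)
    src, dst = ngql_literal(src_vid), ngql_literal(dst_vid)
    return f"INSERT EDGE {edge_type}({cols}) VALUES {src}->{dst}:({vals});"


# Placeholder VIDs; how the plugin actually derives VIDs is not shown here.
print(insert_vertex("player", ["name", "age"], "zhangsan", ["zhangsan", 25]))
# INSERT VERTEX player(name, age) VALUES "zhangsan":("zhangsan", 25);
print(insert_edge("follow", ["degree"], "zhangsan", "lisi", [95]))
# INSERT EDGE follow(degree) VALUES "zhangsan"->"lisi":(95);
```

The sketch builds statements by string formatting purely for readability; the real plugin executes its statements through nebula-jdbc.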
- To configure a job that writes to NebulaGraph, first create a graph space, a tag, and an edge type in NebulaGraph:
CREATE SPACE IF NOT EXISTS cba(vid_type = FIXED_STRING(30));
CREATE TAG IF NOT EXISTS player(name string, age int);
CREATE EDGE IF NOT EXISTS follow(degree int);
- The following job writes data generated in memory (streamreader) into NebulaGraph:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              {
                "type": "string",
                "value": "zhangsan"
              },
              {
                "type": "long",
                "value": 25
              }
            ],
            "sliceRecordCount": 1
          }
        },
        "writer": {
          "name": "nebulagraphwriter",
          "parameter": {
            "username": "root",
            "password": "nebula",
            "column": [
              "name",
              "age"
            ],
            "connection": [
              {
                "table": [
                  "player"
                ],
                "edgeType": [
                  {
                    "srcTag": "player",
                    "srcPrimaryKey": "srcPlayerName",
                    "dstTag": "player",
                    "dstPrimaryKey": "dstPlayerName"
                  }
                ],
                "jdbcUrl": "jdbc:nebula://cba"
              }
            ],
            "batchSize": 100
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
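Before submitting a job like the one above, it can help to check that the writer block carries every parameter this document marks as required (jdbcUrl, username, password, table, column). The following Python sketch is one way to do that. It is not part of DataX or the plugin, `missing_writer_params` is a hypothetical helper, and it assumes a parameter may appear either at the top level of `parameter` or inside a `connection` entry, as in the example job.

```python
import json

# Parameters marked "Required: Yes" in this document's parameter reference.
REQUIRED = ("jdbcUrl", "username", "password", "table", "column")

# A trimmed copy of the writer section from the example job.
JOB_JSON = """
{
  "job": {
    "content": [
      {
        "writer": {
          "name": "nebulagraphwriter",
          "parameter": {
            "username": "root",
            "password": "nebula",
            "column": ["name", "age"],
            "connection": [
              {"table": ["player"], "jdbcUrl": "jdbc:nebula://cba"}
            ]
          }
        }
      }
    ]
  }
}
"""


def missing_writer_params(job):
    """Return the required writer parameters that are absent or empty."""
    missing = []
    for item in job["job"]["content"]:
        params = item["writer"]["parameter"]
        # Merge top-level parameters with those nested in connection entries.
        merged = dict(params)
        for conn in params.get("connection", []):
            merged.update(conn)
        missing.extend(key for key in REQUIRED if not merged.get(key))
    return missing


job = json.loads(JOB_JSON)
print(missing_writer_params(job))  # prints []
```

An empty list means all required parameters are present; any names in the list are missing or empty.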
- jdbcUrl
  - Description: JDBC connection information for the target data source. For details, see the NebulaGraph JDBC documentation: Use of the nebula-jdbc connector
  - Required: Yes
  - Default: None
- username
  - Description: The database username.
  - Required: Yes
  - Default: None
- password
  - Description: The password for the database user.
  - Required: Yes
  - Default: None
- table
  - Description: A collection of table names. In the DataX data synchronization context, a "table" in the graph database NebulaGraph corresponds to a tag or an edge type. Each table should contain all the columns listed in the column parameter. Note that the reader-side primary key together with the table name is used as the VID of the vertex in the tag. The reader side should specify the primary key; otherwise the first column is treated as the primary key by default.
  - Required: Yes
  - Default: None
- column
  - Description: The collection of fields to synchronize. The field order must match the column order in the record, that is, it must correspond to the order and names of the column fields on the reader side.
  - Required: Yes
  - Default: None
- edgeType
  - Description: Needed only when synchronizing edge type data (that is, when the reader-side table is an edge table). Specify srcTag and dstTag to identify the source and destination tags of the edge type, along with srcPrimaryKey and dstPrimaryKey, the primary keys in those two tags, which correspond to the source and destination foreign keys in the edge table being synchronized.
  - Required: No
  - Default: None
- batchSize
  - Description: The number of records written in one batch. Buffering records this way keeps DataX from issuing too many I/O requests to NebulaGraph, which would hurt synchronization performance.
  - Required: No
  - Default: 1
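The effect of batchSize can be illustrated with a short Python sketch that groups records into batches, so that one write request is issued per batch instead of per record. This is a generic buffering pattern, not the plugin's actual code, and the `batches` helper is hypothetical.

```python
def batches(records, batch_size=1):
    """Yield lists of at most batch_size records, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) == batch_size:
            yield buffer
            buffer = []
    if buffer:  # flush the final, possibly smaller, batch
        yield buffer


records = list(range(250))
print(len(list(batches(records))))       # default batchSize of 1: prints 250
print(len(list(batches(records, 100))))  # batchSize of 100: prints 3
```

With the default of 1, every record triggers its own request; with a batchSize of 100, the same 250 records need only three.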
DataX Internal Types | NebulaGraph Data Types |
---|---|
LONG | INT, INT64, INT32, INT16, INT8 |
DOUBLE | FLOAT, DOUBLE |
STRING | FIXED_STRING(N), STRING |
BOOLEAN | BOOL |
BYTES | No corresponding data type |
DATE | DATE, TIME, DATETIME (not currently supported; to be improved in the future) |
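The mapping in the table above can be expressed as a lookup, which is handy when checking whether a NebulaGraph property type has a DataX counterpart. The Python sketch below is illustrative only; the names `NEBULA_TO_DATAX` and `datax_type_for` are hypothetical, and the date and time types raise an error because the table lists them as not currently supported.

```python
# NebulaGraph data type -> DataX internal type, taken from the table above.
NEBULA_TO_DATAX = {
    "INT": "LONG",
    "INT64": "LONG",
    "INT32": "LONG",
    "INT16": "LONG",
    "INT8": "LONG",
    "FLOAT": "DOUBLE",
    "DOUBLE": "DOUBLE",
    "FIXED_STRING": "STRING",
    "STRING": "STRING",
    "BOOL": "BOOLEAN",
}

# Listed in the table, but marked as not currently supported.
UNSUPPORTED = {"DATE", "TIME", "DATETIME"}


def datax_type_for(nebula_type):
    """Return the DataX internal type for a NebulaGraph type name."""
    # Normalize case and drop a length suffix such as "(30)".
    base = nebula_type.strip().upper().split("(")[0]
    if base in UNSUPPORTED:
        raise NotImplementedError(f"{base} is not currently supported")
    if base not in NEBULA_TO_DATAX:
        raise ValueError(f"no DataX type corresponds to {nebula_type!r}")
    return NEBULA_TO_DATAX[base]


print(datax_type_for("FIXED_STRING(30)"))  # prints STRING
print(datax_type_for("int"))               # prints LONG
```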
Data Migration Example | Configuration Example |
---|---|
MySQL to NebulaGraph | Relational database MySQL to NebulaGraph: vertex table -> tag |
To be added | |
Create table statement:
A single record looks like:
- Machine parameters for running DataX:
  - CPU:
  - mem:
  - net: Gigabit dual network card
  - disk: DataX does not write data to disk, so this item is not measured
- NebulaGraph database machine parameters:
  - CPU:
  - mem:
  - net: Gigabit dual network card
  - disk:
-Xms1024m -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError
Channels | DataX Speed (Rec/s) | DataX Traffic (MB/s) | DataX Machine NIC Outgoing Traffic (MB/s) | DataX Machine Running Load | DB NIC Incoming Traffic (MB/s) | DB Running Load | DB TPS |
---|---|---|---|---|---|---|---|
1 | | | | | | | |
4 | | | | | | | |
8 | | | | | | | |
16 | | | | | | | |
32 | | | | | | | |
Notes: