-
Notifications
You must be signed in to change notification settings - Fork 339
SPL:Data dump
Applications sometimes dump some data from one database to another, and the DUMP of database can be used if the data type and data structure are the same. A text file is typically used as a medium for a heterogeneous database, but the text file may lose the information of data type and the speed is slow.
SPL provides a binary data format for high-performance data dump, which can also support the situation where the data structure is not completely consistent.
There is an ORDERS table in the ORACLE database which needs be exported as SPL bin files. The SPL script is as follows:
A | B | |
---|---|---|
1 | =connect("oracle") | //connect to the database |
2 | =A1.query@x("SELECT ORDERKEY, ORDERDATE, CUSTKEY, PRICE, AMOUNT FROM ORDERS WHERE ORDERDATE>=DATE ’2020-01-01’") | //load the data in the table that needs to be exported and close the database connection |
3 | =file("orders.btx").export@b(A2) | //export the loaded data of A2 to a file |
It may be exported as a whole table, as partial columns, or as data that meets certain conditions, determined by the SQL statement in cell A2.
Bin files are exclusive SPL binary data files that automatically record the data type of the data value, and can be read much faster than TXT files without parsing the data type. Some information is lost when the data table is exported as a TXT file, such as long numeric text strings like ID number, which may be parsed into numeric values when imported.
The exported data is driven by the JDBC of database, and when the data is big, the JDBC performance issues are revealed with taking too much time. This is when the parallel selection of SPL can be used to improve the efficiency of loading data. For more information, see SPL: parallel data retrieval skill.
Use cursor to load data when the data to be exported is too big to be fully loaded into memory.
A | B | |
---|---|---|
1 | =connect("oracle") | //connect to the database |
2 | =A1.cursor@x("SELECT ORDERKEY, ORDERDATE, CUSTKEY, PRICE, AMOUNT FROM ORDERS WHERE ORDERDATE>=DATE ’2020-01-01’") | //create a cursor, the option @x means to close the database connection when the cursor is closed |
3 | =file("orders.btx").export@b(A2) | //export the data from the A2 cursor to a file |
Import the data of bin files into another database.
When the field structure in the file is completely identical to that in the data table, for example, import orders.btx into the ORDERS table in the database, and the SPL script is as follows:
A | B | |
---|---|---|
1 | =file("orders.btx").cursor@b() | //creates a cursor that reads data from the file |
2 | =connect("oracle") | //connect to the database |
3 | =A2.update@i(A1,ORDERS) | //insert the data of A1 into the table with the same field name as its name in the file |
4 | >A2.close() | //close the database connection |
The option @i in A3 means that all the data is newly added, and importing with INSERT will speed it up. If the data to be imported is duplicated by the primary key of the data table, it is modified by updating the original data. In this case, the update function with no option can be directly used to compare with the data in the table, and determine to import the data whether in INSERT or UPDATE mode.
Sometimes when importing data, we may need to copy the full table of data in the file after deleting the original data in the table, for example, copy the full table of customer.btx file to the CUSTOMER table, and the SPL script is as follows:
A | B | |
---|---|---|
1 | =file("customer.btx").cursor@b() | //creates a cursor to load data from the file |
2 | =connect("oracle") | //connect to the database |
3 | =A2.update@a(A1,CUSTOMER) | //the option @a means that the data in the CUSTOMER table will be deleted before execution |
4 | >A2.close() | //close the database connection |
Some fields in data table have been renamed in the file, or new fields are added. The SPL script reads as follows:
A | B | |
---|---|---|
1 | =file("orders.btx").cursor@b() | //creates a cursor to load data from the file |
2 | =connect("oracle") | //connect to the database |
3 | =A2.update@i(A1,ORDERS,ORDERKEY,ORDERDATE,CUSTID:CUSTKEY,PRICE,AMOUNT,REMARK:null) | //import the A1 data into the ORDERS table |
4 | >A2.close() | //close the database connection |
The names of ORDERKEY, ORDERDATE, PRICE, AMOUNT fields remain the same. The CUSTKEY field in the file is renamed to CUSTID, REMARK is an added field, which is assigned to null.
For more information, see the db.update function help in the SPL document.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code