-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support MERGE on cloned table in Delta Lake #24756
base: master
Are you sure you want to change the base?
Conversation
37ff93a
to
60e9e73
Compare
60e9e73
to
6432fab
Compare
6432fab
to
9bc68fc
Compare
9bc68fc
to
0cb664f
Compare
row(2, "B", Date.valueOf("2024-01-01")), | ||
row(3, "xxx", Date.valueOf("2024-02-02")), | ||
row(4, "D", Date.valueOf("2024-02-02"))); | ||
assertThat(onTrino().executeQuery("SELECT * FROM delta.default." + clonedTable)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add check onDelta()
for base and cloned table after UPADTE
row(2, "B", Date.valueOf("2024-01-01")), | ||
row(3, "yyy", Date.valueOf("2024-02-02")), | ||
row(5, "kkk", Date.valueOf("2025-03-03"))); | ||
assertThat(onTrino().executeQuery("SELECT * FROM delta.default." + clonedTable)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add check that base table did not change and similar checks for onDelta()
@@ -250,6 +251,75 @@ public void testReadFromSchemaChangedDeepCloneTable() | |||
testReadSchemaChangedCloneTable("DEEP", false); | |||
} | |||
|
|||
@Test(groups = {DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS}) | |||
public void testShallowCloneTableMerge() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have similar test for partitioned table and table with deletion vectors
@@ -420,14 +424,14 @@ private Slice writeDeletionVector( | |||
|
|||
try { | |||
DataFileInfo newFileInfo = new DataFileInfo( | |||
sourceRelativePath, | |||
sourceReferencePath, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this cloned path used to write new deletion vector files?
} | ||
finally { | ||
onTrino().executeQuery("DROP TABLE IF EXISTS delta.default." + baseTable); | ||
onTrino().executeQuery("DROP TABLE IF EXISTS delta.default." + clonedTable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add check that after doping cloned table data can be accessed from baseTable
|
||
Location targetLocation = sourceLocation.sibling(session.getQueryId() + "_" + randomUUID()); | ||
String targetRelativePath = relativePath(tablePath, targetLocation.toString()); | ||
String targetReferencePath = getReferencedPath(tablePath, targetLocation.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this rewrite file in base table in case of CLONE?
throw new RuntimeException(e); | ||
} | ||
|
||
checkArgument(sourceTableName != null && sourceTableName.contains(".") && sourceTableName.split("\\.").length == 3, "Unexpected source table in operation_parameters: %s", sourceTableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add test with resources like: https://github.com/trinodb/trino/blob/6ae6646eacf50b440b83544fcdaf4ecda82a2127/plugin/trino-delta-lake/src/test/resources/deltalake/allow_column_defaults
to showcase how _delta_log
entries look like?
Description
Fix problem that fail update on cloned table, reproduce steps:
testing/bin/ptl env up --environment singlenode-delta-lake-oss
In Trino:
create schema delta.tiny with (location='s3://test-bucket/tiny/');
In Spark-sql:
CREATE TABLE tiny.t1 (id int, v string, part date) USING DELTA PARTITIONED BY (part);
In Trino:
insert into delta.tiny.t1 values (1, 'A', TIMESTAMP '2024-01-01'), (2, 'B', TIMESTAMP '2024-01-01'), (3, 'C', TIMESTAMP '2024-02-02'), (4, 'D', TIMESTAMP '2024-02-02');
In Spark-sql:
CREATE TABLE tiny.t1clone SHALLOW CLONE tiny.t1;
In Trino:
update delta.tiny.t1clone set v = 'update1' where id in (1,3);
It fails with:Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: