Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable merge write disposition for athena Iceberg #1315

Merged

Conversation

jorritsandbrink
Copy link
Collaborator

@jorritsandbrink jorritsandbrink commented May 3, 2024

Description

This PR enables the merge write disposition (both delete-insert and scd2 strategies) for the athena destination with iceberg table format.

for delete-insert:

  • uses regular tables instead of temporary tables because Athena does not support those
    • these tables are created in the staging schema
    • each run, the previous' run table(s) get dropped and new table(s) for that run are created—hence, a version of these tables will always be visible in the staging schema (not super elegant, but not a big problem either I'd say)

Related Issues

Additional Context

Discussion on how we end up using this approach on this PR: #1294

@jorritsandbrink jorritsandbrink self-assigned this May 3, 2024
@jorritsandbrink jorritsandbrink linked an issue May 3, 2024 that may be closed by this pull request
Copy link

netlify bot commented May 3, 2024

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 549a4cd
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/66351f1bca83a90008bbd2ad
😎 Deploy Preview https://deploy-preview-1315--dlt-hub-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@jorritsandbrink jorritsandbrink changed the title Enable delete-insert merge strategy for athena Iceberg Enable merge write disposition for athena Iceberg May 3, 2024
@jorritsandbrink jorritsandbrink requested review from rudolfix and sh-rp May 3, 2024 20:43
condition_columns,
)
# DROP needs backtick as escape identifier
sql.insert(0, f"""DROP TABLE IF EXISTS {temp_table_name.replace('"', '`')};""")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, and there's PR (#998 ) that allows to escape for DDL and DML separately

Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rudolfix rudolfix merged commit cd74a97 into devel May 6, 2024
49 of 50 checks passed
@rudolfix rudolfix deleted the 633-enable-delete-insert-merge-strategy-athena-iceberg branch May 6, 2024 11:43
The `merge` write disposition is supported for Athena when using iceberg tables.

> Note that:
> 1. there is a risk of tables ending up in inconsistent state in case a pipeline run fails mid flight, because Athena doesn't support transactions, and `dlt` uses multiple DELETE/UPDATE/INSERT statements to implement `merge`,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the merge syntax that is available in Athena?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full discussion is here: #1294.

TLDR: we're planning to add a new merge strategy based on the MERGE statement, but this isn't properly specced yet. We want consistent behavior accross different destinations, hence the implementation of delete-insert for athena now. MERGE-based merge for athena will come together with the other destinations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

enable merge on ICEBERG via Athena
3 participants