
Iceberg/sink: Iceberg Sink with Snapshot for Append-Only data #199

Open
laskoviymishka opened this issue Feb 4, 2025 · 1 comment
Labels
enhancement New feature or request

Comments


laskoviymishka commented Feb 4, 2025

Implement Iceberg Sink with Snapshot and Append-Only Support

Feature Request

Develop an Iceberg sink that writes data into Iceberg tables in append-only mode with snapshot isolation. Replication and upserts are not part of the initial scope.
The architecture and implementation draw on Databricks Iceberg Kafka Connect and kafka-delta-ingest.


Scope and Goals

Support appending new data files to Iceberg tables.
Ensure snapshot isolation when writing new data.
Use the Iceberg commit protocol to guarantee atomic writes.
Implement Parquet file writing as the primary format.
Incorporate basic data transformations during ingestion.
Implement metrics reporting for monitoring purposes.

🚫 Replication and CDC handling are out of scope.
🚫 No upsert or merge-on-read logic in this phase.


Implementation Details

  • Commit Model:

    • Each write operation should create a new snapshot in Iceberg
    • Transactions should ensure atomic addition of new data files
  • Data Format & Partitioning:

    • Support Parquet format for writing data
    • Implement basic partitioning support based on the Iceberg schema
  • Data Transformation:

    • Incorporate simple transformations during ingestion, such as deriving partition columns or adding metadata
    • Consider using transformation
  • Configuration Options:

    • Target Iceberg table
    • File format (initially Parquet)
    • Partitioning strategy
    • Transformation rules
  • Error Handling & Retries:

    • Ensure robust failure recovery mechanisms
    • Implement logging and monitoring for writes
    • Consider implementing a dead letter queue for problematic messages
  • Metrics Reporting:

    • Integrate metrics reporting to monitor ingestion performance

References & Inspiration

@laskoviymishka laskoviymishka changed the title Iceberg/sink: Basic iceberg sink Iceberg/sink: Implement Iceberg Sink with Snapshot and Append-Only Support Feb 4, 2025
@laskoviymishka laskoviymishka changed the title Iceberg/sink: Implement Iceberg Sink with Snapshot and Append-Only Support Iceberg/sink: Iceberg Sink with Snapshot and Append-Only Support Feb 4, 2025
@laskoviymishka laskoviymishka changed the title Iceberg/sink: Iceberg Sink with Snapshot and Append-Only Support Iceberg/sink: Iceberg Sink with Snapshot for Append-Only data Feb 4, 2025
@laskoviymishka
Contributor Author

Blocked by apache/iceberg-go#287
