chore: update readme for write support #552

Merged: 6 commits, Dec 4, 2024
25 changes: 14 additions & 11 deletions README.md
@@ -1,12 +1,12 @@
# delta-kernel-rs

Delta-kernel-rs is an experimental [Delta][delta] implementation focused on
interoperability with a wide range of query engines. It currently only supports
reads.
Delta-kernel-rs is an experimental [Delta][delta] implementation focused on interoperability with a
wide range of query engines. It currently supports reads and (experimental) writes. Only blind
appends are currently supported in the write path.

The Delta Kernel project is a Rust and C library for building Delta connectors that can read (and
soon, write) Delta tables without needing to understand the Delta [protocol
details][delta-protocol]. This is the Rust/C equivalent of [Java Delta Kernel][java-kernel].
The Delta Kernel project is a Rust and C library for building Delta connectors that can read and
write Delta tables without needing to understand the Delta [protocol details][delta-protocol]. This
is the Rust/C equivalent of [Java Delta Kernel][java-kernel].

## Crates

@@ -33,10 +33,12 @@ the acceptance tests against it.

In general, you will want to depend on `delta-kernel-rs` by adding it as a dependency to your
`Cargo.toml` (that is, for Rust projects using cargo); for other projects, please see the [FFI]
module. The core kernel includes facilities for reading delta tables, but requires the consumer
to implement the `Engine` trait in order to use the table-reading APIs. If there is no need to
implement the consumer's own `Engine` trait, the kernel has a feature flag to enable a default,
asynchronous `Engine` implementation built with [Arrow] and [Tokio].
module. The core kernel includes facilities for reading and writing delta tables, and allows the
consumer to implement their own `Engine` trait in order to build engine-specific implementations of
the various `Engine` APIs that the kernel relies on (e.g. implement an engine-specific
`read_json_files()` using the native engine JSON reader). If there is no need to implement the
consumer's own `Engine` trait, the kernel has a feature flag to enable a default, asynchronous
`Engine` implementation built with [Arrow] and [Tokio].
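
As a rough illustration of that split, the sketch below shows kernel-side logic written only against a trait, with the engine supplying the JSON reading. All names here (`JsonReader`, `read_json_file`, `InMemoryReader`, `sample_reader`, `count_add_actions`) are hypothetical and are not the `delta_kernel` API; the real `Engine` trait is richer and its signatures differ. The prefix match on `add` is a deliberately naive stand-in for real JSON parsing.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one slice of engine functionality.
/// This is NOT the delta_kernel `Engine` trait.
trait JsonReader {
    /// Return the lines of one newline-delimited JSON log file.
    fn read_json_file(&self, path: &str) -> Result<Vec<String>, String>;
}

/// An "engine-specific" reader; here it is backed by an in-memory map
/// instead of a real file system or native JSON reader.
struct InMemoryReader {
    files: HashMap<String, String>,
}

impl JsonReader for InMemoryReader {
    fn read_json_file(&self, path: &str) -> Result<Vec<String>, String> {
        self.files
            .get(path)
            .map(|body| body.lines().map(str::to_owned).collect())
            .ok_or_else(|| format!("no such file: {}", path))
    }
}

/// "Kernel-side" logic: it only knows about the trait, never the concrete engine.
/// Counts lines that look like `add` actions (naive prefix check, assumption).
fn count_add_actions(engine: &dyn JsonReader, path: &str) -> Result<usize, String> {
    let lines = engine.read_json_file(path)?;
    Ok(lines
        .iter()
        .filter(|line| line.trim_start().starts_with("{\"add\""))
        .count())
}

/// Build a reader holding one made-up commit file with two `add` actions.
fn sample_reader() -> InMemoryReader {
    let log = [
        r#"{"commitInfo":{"operation":"WRITE"}}"#,
        r#"{"add":{"path":"part-0.parquet"}}"#,
        r#"{"add":{"path":"part-1.parquet"}}"#,
    ]
    .join("\n");
    let mut files = HashMap::new();
    files.insert("00000000000000000000.json".to_string(), log);
    InMemoryReader { files }
}
```

Calling `count_add_actions(&sample_reader(), "00000000000000000000.json")` returns `Ok(2)`. Swapping in a different `JsonReader` implementation leaves `count_add_actions` untouched, which is the same shape as handing a custom `Engine` to the kernel instead of enabling the default one.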

```toml
# fewer dependencies, requires consumer to implement Engine trait.
@@ -126,12 +128,13 @@ projects.
There are a few key concepts that will help in understanding kernel:

1. The `Engine` trait encapsulates all the functionality an engine or connector needs to provide to
the Delta Kernel in order to read the Delta table.
the Delta Kernel in order to read/write the Delta table.
2. The `DefaultEngine` is our default implementation of the above trait. It lives in
`engine/default`, and provides a reference implementation for all `Engine`
functionality. `DefaultEngine` uses [arrow](https://docs.rs/arrow/latest/arrow/) as its in-memory
data format.
3. A `Scan` is the entrypoint for reading data from a table.
4. A `Transaction` is the entrypoint for writing data to a table.

### Design Principles

48 changes: 28 additions & 20 deletions kernel/src/lib.rs
@@ -1,45 +1,53 @@
//! # Delta Kernel
//!
//! Delta-kernel-rs is an experimental [Delta](https://github.com/delta-io/delta/) implementation
//! focused on interoperability with a wide range of query engines. It currently only supports
//! reads. This library defines a number of traits which must be implemented to provide a
//! working "delta reader". They are detailed below. There is a provided "default engine" that
//! implements all these traits and can be used to ease integration work. See
//! [`DefaultEngine`](engine/default/index.html) for more information.
//! focused on interoperability with a wide range of query engines. It supports reads and
//! (experimental) writes (only blind appends in the write path currently). This library defines a
//! number of traits which must be implemented to provide a working delta implementation. They are
//! detailed below. There is a provided "default engine" that implements all these traits and can
//! be used to ease integration work. See [`DefaultEngine`](engine/default/index.html) for more
//! information.
//!
//! A full `rust` example for reading table data using the default engine can be found in the
//! [read-table-single-threaded] example (and for a more complex multi-threaded reader see the
//! [read-table-multi-threaded] example).
//!
//! [read-table-single-threaded]: https://github.com/delta-io/delta-kernel-rs/tree/main/kernel/examples/read-table-single-threaded
//! [read-table-multi-threaded]: https://github.com/delta-io/delta-kernel-rs/tree/main/kernel/examples/read-table-multi-threaded
//! [read-table-single-threaded]:
//! https://github.com/delta-io/delta-kernel-rs/tree/main/kernel/examples/read-table-single-threaded
//! [read-table-multi-threaded]:
//! https://github.com/delta-io/delta-kernel-rs/tree/main/kernel/examples/read-table-multi-threaded
//!
//! Simple write examples can be found in the [`write.rs`] integration tests. Standalone write
//! examples are coming soon!
//!
//! [`write.rs`]: https://github.com/delta-io/delta-kernel-rs/tree/main/kernel/tests/write.rs
//!
//! # Engine traits
//!
//! The [`Engine`] trait allows connectors to bring their own implementation of functionality such as
//! reading parquet files, listing files in a file system, parsing a JSON string etc. This trait
//! exposes methods to get sub-engines which expose the core functionalities customizable by
//! The [`Engine`] trait allows connectors to bring their own implementation of functionality such
//! as reading parquet files, listing files in a file system, parsing a JSON string etc. This
//! trait exposes methods to get sub-engines which expose the core functionalities customizable by
//! connectors.
//!
//! ## Expression handling
//!
//! Expression handling is done via the [`ExpressionHandler`], which in turn allows the creation
//! of [`ExpressionEvaluator`]s. These evaluators are created for a specific predicate [`Expression`]
//! Expression handling is done via the [`ExpressionHandler`], which in turn allows the creation of
//! [`ExpressionEvaluator`]s. These evaluators are created for a specific predicate [`Expression`]
//! and allow evaluation of that predicate for specific batches of data.
//!
//! ## File system interactions
//!
//! Delta Kernel needs to perform some basic operations against file systems like listing and reading files.
//! These interactions are encapsulated in the [`FileSystemClient`] trait. Implementors must take
//! care that all assumptions on the behavior of the functions - like sorted results - are respected.
//! Delta Kernel needs to perform some basic operations against file systems like listing and
//! reading files. These interactions are encapsulated in the [`FileSystemClient`] trait.
//! Implementors must take care that all assumptions on the behavior of the functions - like sorted
//! results - are respected.
//!
//! ## Reading log and data files
//!
//! Delta Kernel requires the capability to read json and parquet files, which is exposed via the
//! [`JsonHandler`] and [`ParquetHandler`] respectively. When reading files, connectors are asked to
//! provide the context information it requires to execute the actual read. This is done by invoking
//! methods on the [`FileSystemClient`] trait.
//!
//! Delta Kernel requires the capability to read and write json files and read parquet files, which
//! is exposed via the [`JsonHandler`] and [`ParquetHandler`] respectively. When reading files,
//! connectors are asked to provide the context information it requires to execute the actual
//! operation. This is done by invoking methods on the [`FileSystemClient`] trait.

#![cfg_attr(all(doc, NIGHTLY_CHANNEL), feature(doc_auto_cfg))]
#![warn(