refactor: move fusio-object-store to another crate #32

Merged
merged 8 commits into from
Oct 2, 2024
86 changes: 14 additions & 72 deletions .github/workflows/ci.yml
@@ -11,8 +11,8 @@ env:

jobs:
# 1
tokio_check:
name: Rust project check on tokio
check:
name: Rust project check
runs-on: ${{ matrix.os }}
strategy:
matrix:
@@ -29,99 +29,41 @@ jobs:

# `cargo check` command here will use installed `nightly`
# as it is set as an "override" for current directory

- name: Run cargo clippy on tokio
uses: actions-rs/cargo@v1
with:
command: check
args: --package fusio --features "tokio, futures"

- name: Run cargo build on tokio
uses: actions-rs/cargo@v1
with:
command: build
args: --package fusio --features "tokio, futures"

- name: Run cargo test on tokio
uses: actions-rs/cargo@v1
with:
command: test
args: --package fusio --features "tokio, futures"

monoio_check:
name: Rust project check on monoio
runs-on: ${{ matrix.os }}
strategy:
matrix:
os:
- ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install latest
uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
components: rustfmt, clippy

# `cargo check` command here will use installed `nightly`
# as it is set as an "override" for current directory

- name: Run cargo clippy on monoio
uses: actions-rs/cargo@v1
with:
command: check
args: --package fusio --features "monoio, futures"
args: --package fusio --features=tokio,aws,tokio-http

- name: Run cargo build on monoio
uses: actions-rs/cargo@v1
with:
command: build
args: --package fusio --features "monoio, futures"
args: --package fusio --features=monoio

- name: Run cargo test on monoio
- name: Run cargo build on tokio-uring
uses: actions-rs/cargo@v1
with:
command: test
args: --package fusio --features "monoio, futures"

tokio_uring_check:
name: Rust project check on tokio_uring
runs-on: ${{ matrix.os }}
strategy:
matrix:
os:
- ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install latest
uses: actions-rs/toolchain@v1
with:
toolchain: stable
override: true
components: rustfmt, clippy

# `cargo check` command here will use installed `nightly`
# as it is set as an "override" for current directory
command: build
args: --package fusio --features=tokio-uring

- name: Run cargo clippy on tokio-uring
- name: Run cargo test on tokio
uses: actions-rs/cargo@v1
with:
command: check
args: --package fusio --features "tokio-uring, futures"
command: test
args: --package fusio --features=tokio,aws,tokio-http

- name: Run cargo build on tokio-uring
- name: Run cargo test on monoio
uses: actions-rs/cargo@v1
with:
command: build
args: --package fusio --features "tokio-uring, futures"
command: test
args: --package fusio --features=monoio

- name: Run cargo test on tokio-uring
uses: actions-rs/cargo@v1
with:
command: test
args: --package fusio --features "tokio-uring, futures"

args: --package fusio --features=tokio-uring
# 2
fmt:
name: Rust fmt
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,5 +1,5 @@
[workspace]
members = ["fusio", "fusio-parquet"]
members = ["examples", "fusio", "fusio-object-store", "fusio-parquet"]
resolver = "2"

[workspace.package]
90 changes: 90 additions & 0 deletions README.md
@@ -0,0 +1,90 @@
# Fusio

Fusio provides [Read](https://github.com/tonbo-io/fusio/blob/main/fusio/src/lib.rs#L81) and [Write](https://github.com/tonbo-io/fusio/blob/main/fusio/src/lib.rs#L63) traits to operate on multiple storage backends (e.g., local disk, Amazon S3) across various asynchronous runtimes—both poll-based ([tokio](https://github.com/tokio-rs/tokio)) and completion-based ([tokio-uring](https://github.com/tokio-rs/tokio-uring), [monoio](https://github.com/bytedance/monoio))—with:

- Lean: Binary size is at least 14× smaller than others.
- Minimal-cost abstraction: Compared to bare storage backends, trait definitions allow dispatching file operations without extra overhead.
- Extensible: Exposes traits to support implementing storage backends as third-party crates.

> **Fusio is currently in preview. Please join our [community](https://discord.gg/j27XVFVmJM) to take part in its development and in discussions of its semantics and behavior.**

## Why do we need Fusio?

Since we started integrating object storage into [Tonbo](https://github.com/tonbo-io/tonbo), we realized the need for file and file system abstractions to dispatch read and write operations to multiple storage backends: memory, local disk, remote object storage, and so on. We found that existing solutions have the following limitations:
- Their access to local or remote file systems is not usable across various kinds of asynchronous runtimes (not only completion-based runtimes but also Python / JavaScript event loops).
- Most VFS implementations are designed for backend server scenarios. As an embedded database, Tonbo requires a lean implementation suitable for embedding, along with a set of traits that allow extending asynchronous file and file system approaches as third-party crates.

For more context, please check [apache/arrow-rs#6051](https://github.com/apache/arrow-rs/issues/6051).

## How to use it?

### Installation
```toml
fusio = { version = "*", features = ["tokio"] }
```

### Examples

#### [Runtime agnostic](https://github.com/tonbo-io/fusio/blob/main/examples/src/multi_runtime.rs)

`fusio` supports switching the async runtime at compile time. Middleware libraries can build runtime-agnostic implementations, allowing the top-level application to choose the runtime.
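
The idea can be sketched without any async runtime at all. The following is a simplified, synchronous illustration that uses only the standard library; `SketchStorage`, `BackendA`, `BackendB`, `round_trip`, and `Chosen` are hypothetical names, not part of `fusio`. The middleware function knows only the trait, and the application picks the concrete type at compile time (here with a type alias; a real crate would more likely use a Cargo feature and `#[cfg(feature = "...")]`).

```rust
// Hypothetical stand-in for a runtime-specific file type (not a fusio trait).
trait SketchStorage {
    fn write_all(&mut self, data: &[u8]);
    fn read_all(&self) -> Vec<u8>;
}

// Two backends with different internals, standing in for two runtimes.
#[derive(Default)]
struct BackendA {
    data: Vec<u8>,
}

#[derive(Default)]
struct BackendB {
    chunks: Vec<Vec<u8>>,
}

impl SketchStorage for BackendA {
    fn write_all(&mut self, data: &[u8]) {
        self.data.extend_from_slice(data);
    }
    fn read_all(&self) -> Vec<u8> {
        self.data.clone()
    }
}

impl SketchStorage for BackendB {
    fn write_all(&mut self, data: &[u8]) {
        self.chunks.push(data.to_vec());
    }
    fn read_all(&self) -> Vec<u8> {
        self.chunks.concat()
    }
}

// "Middleware": written once against the trait, unaware of the backend.
fn round_trip<S: SketchStorage>(storage: &mut S, payload: &[u8]) -> Vec<u8> {
    storage.write_all(payload);
    storage.read_all()
}

// "Application": the only place that names a concrete backend.
type Chosen = BackendA;

pub fn demo() -> Vec<u8> {
    let mut storage = Chosen::default();
    round_trip(&mut storage, b"hello, world")
}
```

Changing `Chosen` to `BackendB` leaves `round_trip` untouched, which is the property the linked example relies on.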

#### [Object safety](https://github.com/tonbo-io/fusio/blob/main/examples/src/object.rs)

`fusio` provides two sets of traits:
- `Read` / `Write` / `Seek` / `Fs` are not object-safe.
- `DynRead` / `DynWrite` / `DynSeek` / `DynFs` are object-safe.

You can freely convert between them.
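
As a rough illustration of why two sets of traits are useful, here is a minimal, synchronous sketch using only the standard library. `SketchRead`, `SketchDynRead`, and `Cursor` are hypothetical and are not `fusio`'s definitions. A generic method makes the first trait unusable as `dyn`, while the second fixes the buffer type and is bridged by a blanket impl. Only the generic-to-dynamic direction is shown.

```rust
// Not object-safe: the generic method rules out `dyn SketchRead`.
trait SketchRead {
    fn read_into<B: AsMut<[u8]>>(&mut self, buf: B) -> (usize, B);
}

// Object-safe counterpart: the buffer type is fixed, so `dyn` works.
trait SketchDynRead {
    fn read_into_boxed(&mut self, buf: Box<[u8]>) -> (usize, Box<[u8]>);
}

// Blanket impl: every `SketchRead` is automatically a `SketchDynRead`.
impl<R: SketchRead> SketchDynRead for R {
    fn read_into_boxed(&mut self, buf: Box<[u8]>) -> (usize, Box<[u8]>) {
        self.read_into(buf)
    }
}

// A trivial in-memory reader used to exercise both traits.
struct Cursor {
    data: Vec<u8>,
    pos: usize,
}

impl SketchRead for Cursor {
    fn read_into<B: AsMut<[u8]>>(&mut self, mut buf: B) -> (usize, B) {
        let dst = buf.as_mut();
        // Copy as many bytes as fit into the buffer and remain in the source.
        let n = dst.len().min(self.data.len() - self.pos);
        dst[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        (n, buf)
    }
}

pub fn demo() -> (usize, Vec<u8>) {
    // The concrete type is erased behind a trait object.
    let mut reader: Box<dyn SketchDynRead> = Box::new(Cursor {
        data: b"hello".to_vec(),
        pos: 0,
    });
    let (n, buf) = reader.read_into_boxed(vec![0u8; 8].into_boxed_slice());
    (n, buf[..n].to_vec())
}
```

The trade-off in this sketch is that the dynamic version commits to one buffer type, whereas the generic version accepts anything that can be viewed as a mutable byte slice.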

#### [File system traits](https://github.com/tonbo-io/fusio/blob/main/examples/src/fs.rs)

`fusio` has an optional Fs trait (use `default-features = false` to disable it). It dispatches common file system operations (open, remove, list, etc.) to specific storage backends (local disk, Amazon S3).

#### [S3 support](https://github.com/tonbo-io/fusio/blob/main/examples/src/s3.rs)

`fusio` has optional Amazon S3 support (enable it with `features = ["tokio-http", "aws"]`); the behavior of S3 operations and credentials does not depend on `tokio`.

## When to choose fusio?

Overall, `fusio` carefully selects a subset of semantics and behaviors from multiple storage backends and async runtimes to ensure native performance in most scenarios. For example, `fusio` adopts a completion-based API (inspired by [monoio](https://docs.rs/monoio/latest/monoio/io/trait.AsyncReadRent.html)) so that file operations on `tokio` and `tokio-uring` perform the same as they would without `fusio`.
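
The shape of a completion-based call can be sketched synchronously with the standard library alone. `read_owned` below is a hypothetical helper, not a `fusio` or `monoio` function: the caller moves the buffer into the call and gets it back together with the result, which is what lets a completion-based runtime keep the buffer alive while the operation is in flight.

```rust
use std::io;

// Hypothetical helper: takes ownership of `buf` and always returns it,
// on success and on failure alike.
fn read_owned(src: &[u8], offset: usize, mut buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>) {
    if offset > src.len() {
        let err = io::Error::new(io::ErrorKind::UnexpectedEof, "offset past end");
        return (Err(err), buf);
    }
    // Copy as many bytes as the buffer can hold.
    let n = buf.len().min(src.len() - offset);
    buf[..n].copy_from_slice(&src[offset..offset + n]);
    (Ok(n), buf)
}

pub fn demo() -> Vec<u8> {
    let src = b"hello, world";
    let buf = vec![0u8; 5];
    // The buffer is moved in, then rebound from the returned tuple.
    let (result, buf) = read_owned(src, 7, buf);
    let n = result.expect("in-range read should succeed");
    buf[..n].to_vec()
}
```

The same `(result, buffer)` return shape appears in the `write_without_runtime_awareness` example in this repository.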

### Compared with `object_store`

`object_store` is locked to tokio and also depends on `bytes`. `fusio` uses `IoBuf` / `IoBufMut` so that `&[u8]` and `Vec<u8>` can be passed directly as buffers, avoiding potential runtime costs. If you do not need to consider other async runtimes, try `object_store`; as the official implementation, it integrates well with arrow and parquet.
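
To show how one buffer trait can cover both borrowed and owned byte buffers, here is a minimal standard-library sketch. `SketchBuf` and `checksum` are hypothetical; `fusio`'s real `IoBuf` may be defined differently.

```rust
// Hypothetical buffer trait (not fusio's `IoBuf`).
trait SketchBuf {
    fn as_bytes(&self) -> &[u8];
}

// Borrowed buffers work without copying into another container.
impl SketchBuf for &[u8] {
    fn as_bytes(&self) -> &[u8] {
        self
    }
}

// Owned buffers work too, and are handed back to the caller afterwards.
impl SketchBuf for Vec<u8> {
    fn as_bytes(&self) -> &[u8] {
        self.as_slice()
    }
}

// Generic over the buffer: takes it by value and returns it with the result.
fn checksum<B: SketchBuf>(buf: B) -> (u32, B) {
    let sum: u32 = buf.as_bytes().iter().map(|&b| u32::from(b)).sum();
    (sum, buf)
}

pub fn demo() -> (u32, u32) {
    let borrowed: &[u8] = &[1, 2, 3];
    let (a, _) = checksum(borrowed);
    let (b, owned) = checksum(vec![4u8, 5, 6]);
    // The owned buffer comes back intact and can be reused.
    assert_eq!(owned.len(), 3);
    (a, b)
}
```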

### Compared with `opendal`

`fusio` does not aim to be a full data access layer like `opendal`. `fusio` keeps features lean, and you can enable features and their dependencies one by one. The default binary size of `fusio` is 245KB, which is much smaller than that of `opendal` (8.9MB). If you need a full DAL ecosystem (tracing, cache, etc.), try `opendal`.

Also, compared with `opendal::Operator`, fusio exposes core traits and allows them to be implemented in third-party crates.

## Roadmap
- abstractions
  - [x] file operations
  - [x] (partial) file system operations
- storage backend implementations
  - disk
    - [x] tokio
    - [x] tokio-uring
    - [x] monoio
  - [x] network
    - [x] HTTP client trait wi
    - [x] network storage runtime support
      - [x] tokio (over reqwest)
      - [ ] monoio (over hyper-tls)
      - [ ] tokio-uring (over hyper-tls)
    - [x] Amazon S3
    - [ ] Azure Blob Storage
    - [ ] Cloudflare R2
  - [ ] in-memory
- [ ] [conditional operations](https://aws.amazon.com/cn/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/)
- extensions
  - [x] parquet support
  - [x] object_store support

## Credits
- `monoio`: all core traits (buffer, read, and write) are heavily inspired by it
- `futures`: its design of abstractions and its organization into several crates (core, util, etc.) to avoid coupling have influenced `fusio`'s design
- `opendal`: its compile-time switching between poll-based and completion-based runtimes inspired `fusio`
- `object_store`: `fusio` adopts S3 credential and path behaviors from it
16 changes: 16 additions & 0 deletions examples/Cargo.toml
@@ -0,0 +1,16 @@
[package]
edition.workspace = true
license.workspace = true
name = "examples"
repository.workspace = true
version = "0.1.0"

[features]
default = ["fusio/aws", "tokio"]
monoio = ["dep:monoio", "fusio/monoio"]
tokio = ["dep:tokio", "fusio/tokio"]

[dependencies]
fusio = { path = "../fusio" }
monoio = { version = "0.2", optional = true }
tokio = { version = "1.0", features = ["full"], optional = true }
19 changes: 19 additions & 0 deletions examples/src/fs.rs
@@ -0,0 +1,19 @@
use std::sync::Arc;

use fusio::{disk::LocalFs, dynamic::DynFile, DynFs};

#[allow(unused)]
async fn use_fs() {
    let fs: Arc<dyn DynFs> = Arc::new(LocalFs {});

    let mut file: Box<dyn DynFile> = Box::new(fs.open(&"foo.txt".into()).await.unwrap());

    let write_buf = "hello, world".as_bytes();
    let mut read_buf = [0; 12];

    let (result, _, read_buf) =
        crate::write_without_runtime_awareness(&mut file, write_buf, &mut read_buf[..]).await;
    result.unwrap();

    assert_eq!(&read_buf, b"hello, world");
}
33 changes: 33 additions & 0 deletions examples/src/lib.rs
@@ -0,0 +1,33 @@
mod fs;
mod multi_runtime;
mod object;
mod s3;

use fusio::{Error, IoBuf, IoBufMut, Read, Seek, Write};

#[allow(unused)]
async fn write_without_runtime_awareness<F, B, BM>(
    file: &mut F,
    write_buf: B,
    read_buf: BM,
) -> (Result<(), Error>, B, BM)
where
    F: Read + Write + Seek,
    B: IoBuf,
    BM: IoBufMut,
{
    let (result, write_buf) = file.write_all(write_buf).await;
    if result.is_err() {
        return (result, write_buf, read_buf);
    }

    file.sync_all().await.unwrap();
    file.seek(0).await.unwrap();

    let (result, read_buf) = file.read(read_buf).await;
    if result.is_err() {
        return (result.map(|_| ()), write_buf, read_buf);
    }

    (Ok(()), write_buf, read_buf)
}
31 changes: 31 additions & 0 deletions examples/src/multi_runtime.rs
@@ -0,0 +1,31 @@
use crate::write_without_runtime_awareness;

#[allow(unused)]
#[cfg(feature = "tokio")]
async fn use_tokio_file() {
    use tokio::fs::File;

    let mut file = File::open("foo.txt").await.unwrap();
    let write_buf = "hello, world".as_bytes();
    let mut read_buf = [0; 12];
    let (result, _, read_buf) =
        write_without_runtime_awareness(&mut file, write_buf, &mut read_buf[..]).await;
    result.unwrap();
    assert_eq!(&read_buf, b"hello, world");
}

#[allow(unused)]
#[cfg(feature = "monoio")]
async fn use_monoio_file() {
    use fusio::disk::MonoioFile;
    use monoio::fs::File;

    let mut file: MonoioFile = File::open("foo.txt").await.unwrap().into();
    let write_buf = "hello, world".as_bytes();
    let read_buf = vec![0; 12];
    // completion-based runtime has to pass owned buffer to the function
    let (result, _, read_buf) =
        write_without_runtime_awareness(&mut file, write_buf, read_buf).await;
    result.unwrap();
    assert_eq!(&read_buf, b"hello, world");
}
38 changes: 38 additions & 0 deletions examples/src/object.rs
@@ -0,0 +1,38 @@
use fusio::{dynamic::DynFile, Error, IoBuf, IoBufMut, Read, Write};

#[allow(unused)]
#[cfg(feature = "tokio")]
async fn use_tokio_file() {
    use tokio::fs::File;

    let mut file: Box<dyn DynFile> = Box::new(File::open("foo.txt").await.unwrap());
    let write_buf = "hello, world".as_bytes();
    let mut read_buf = [0; 12];
    let (result, _, read_buf) =
        object_safe_file_trait(&mut file, write_buf, &mut read_buf[..]).await;
    result.unwrap();
    assert_eq!(&read_buf, b"hello, world");
}

#[allow(unused)]
async fn object_safe_file_trait<B, BM>(
    mut file: &mut Box<dyn DynFile>,
    write_buf: B,
    read_buf: BM,
) -> (Result<(), Error>, B, BM)
where
    B: IoBuf,
    BM: IoBufMut,
{
    let (result, write_buf) = file.write_all(write_buf).await;
    if result.is_err() {
        return (result, write_buf, read_buf);
    }

    let (result, read_buf) = file.read(read_buf).await;
    if result.is_err() {
        return (result.map(|_| ()), write_buf, read_buf);
    }

    (Ok(()), write_buf, read_buf)
}
33 changes: 33 additions & 0 deletions examples/src/s3.rs
@@ -0,0 +1,33 @@
use std::{env, sync::Arc};

use fusio::{
    remotes::aws::{fs::AmazonS3Builder, AwsCredential},
    DynFs,
};

use crate::write_without_runtime_awareness;

#[allow(unused)]
async fn use_fs() {
    let key_id = env::var("AWS_ACCESS_KEY_ID").unwrap();
    let secret_key = env::var("AWS_SECRET_ACCESS_KEY").unwrap();

    let s3: Arc<dyn DynFs> = Arc::new(
        AmazonS3Builder::new("fusio-test".into())
            .credential(AwsCredential {
                key_id,
                secret_key,
                token: None,
            })
            .region("ap-southeast-1".into())
            .sign_payload(true)
            .build(),
    );

    let _ = write_without_runtime_awareness(
        &mut s3.open(&"foo.txt".into()).await.unwrap(),
        "hello, world".as_bytes(),
        &mut [0; 12][..],
    )
    .await;
}