[Draft] Icechunk CLI Design Document #714
@@ -12,3 +12,4 @@
/devel

.ipynb_checkpoints
.vscode

@@ -0,0 +1,143 @@
# Icechunk Command Line Interface

This document outlines the design of the Icechunk command line interface.

## Functionality

Here is a list of tasks a user might want to do with Icechunk:

- List my repositories

> Review comment: What should this list? Probably all repositories in the repositories config? It would also be great to be able to point at a location and auto-discover repos.
>
> Review comment: Auto-discovery would be cool, but it seems like an advanced feature we may not need for a while.

- List the history of a repo
- List branches in a repo
- List tags in a repo
- Create a new repository
- Check configuration
- Diff between two commits
- Invoke administrative tasks (garbage collection, compaction, etc.)

> Review comment: I'd add:
>
> Review comment: This is not yet reflected in the current

## Interface

General command structure:

```bash
icechunk <object> <action> <args>
```

Examples:

```bash
icechunk repo list

icechunk repo create <repo>
icechunk repo info <repo>
icechunk repo tree <repo>
icechunk repo delete <repo>

icechunk branch list <repo>
icechunk branch create <repo> <branch_name>
icechunk snapshot list <repo>
icechunk snapshot diff <repo> <snapshot_id_1> <snapshot_id_2>
icechunk ref list <repo>

icechunk config init  # init: interactive setup
icechunk config list
icechunk config get <key>
icechunk config set <key> <value>
```

> Review comment (on `icechunk config init`): Nice! Interactive could be very useful.
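
The `<object> <action> <args>` structure maps naturally onto a two-level match. The sketch below uses only the standard library to illustrate that dispatch. It is not the planned implementation, which this document says will use `clap`. The `Command` enum, its variants and field names, and `parse_command` are hypothetical, and only a few of the commands listed above are covered.

```rust
/// Hypothetical subset of the proposed command tree.
#[derive(Debug, PartialEq)]
enum Command {
    RepoList,
    RepoCreate { repo: String },
    BranchList { repo: String },
    BranchCreate { repo: String, branch_name: String },
    SnapshotDiff { repo: String, from: String, to: String },
}

/// Parse `<object> <action> <args>`, i.e. everything after the program name.
fn parse_command(args: &[&str]) -> Result<Command, String> {
    match args {
        ["repo", "list"] => Ok(Command::RepoList),
        ["repo", "create", repo] => Ok(Command::RepoCreate {
            repo: repo.to_string(),
        }),
        ["branch", "list", repo] => Ok(Command::BranchList {
            repo: repo.to_string(),
        }),
        ["branch", "create", repo, name] => Ok(Command::BranchCreate {
            repo: repo.to_string(),
            branch_name: name.to_string(),
        }),
        ["snapshot", "diff", repo, from, to] => Ok(Command::SnapshotDiff {
            repo: repo.to_string(),
            from: from.to_string(),
            to: to.to_string(),
        }),
        // An object and action were given, but no arm above matched them
        // (unknown pair, or wrong number of arguments).
        [object, action, ..] => Err(format!(
            "unsupported or incomplete command: {object} {action}"
        )),
        // Fewer than two words: show the general structure.
        _ => Err("usage: icechunk <object> <action> <args>".to_string()),
    }
}
```

Adding a new object or action means adding a variant and a match arm, which is the extensibility argument for this layout. With `clap`, the same shape would typically be expressed as nested subcommand enums rather than a hand-written match.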

### Git-like interface

An alternative would be a more git-like structure (`git diff`, `git show`, ...).

The git interface is familiar, but:

- The differences between git and Icechunk can be deceptive to new users
- The git interface is (arguably) not very user-friendly if you're not familiar with it
- The `<object> <action>` structure proposed above is more extensible
  - Example: Docker has been [adopting](https://www.docker.com/blog/whats-new-in-docker-1-13/) this structure over time (`docker ps` -> `docker container ls`)

## Configuration

Two guiding use cases:

- The user just wants to run `icechunk repo create s3://bucket/path`, pick up credentials from the environment or AWS config, and use the default repo settings.
- The user wants to manage multiple repositories stored in different locations, with different credentials and settings.

Following Icechunk's config module, four types of information are needed to work with a repository:

- Location: `bucket`, `path`
- Credentials: `access_key_id`, `secret_access_key`, ...
- Options: `region`, `endpoint_url`, ...
- Repo configuration: `compression`, `caching`, `virtual_chunk_containers`, ...

There are three ways to provide this information, listed from highest to lowest precedence:

1. Command line arguments
2. Environment variables
3. Configuration file
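
The precedence order above can be sketched as a chain of fallbacks for a single setting. This is a minimal illustration, not the planned implementation: `Source` and `resolve_setting` are hypothetical names, and each input is assumed to have already been read from its source.

```rust
/// Where a resolved value came from (hypothetical, useful for diagnostics).
#[derive(Debug, PartialEq)]
enum Source {
    CommandLine,
    Environment,
    ConfigFile,
}

/// Resolve one setting, trying the sources from highest to lowest precedence.
/// Returns `None` when no source provides a value.
fn resolve_setting<'a>(
    cli_arg: Option<&'a str>,
    env_var: Option<&'a str>,
    config_file: Option<&'a str>,
) -> Option<(&'a str, Source)> {
    cli_arg
        .map(|v| (v, Source::CommandLine))
        .or_else(|| env_var.map(|v| (v, Source::Environment)))
        .or_else(|| config_file.map(|v| (v, Source::ConfigFile)))
}
```

Returning the source alongside the value is a design choice that would let a command such as `icechunk config get <key>` report where a setting was picked up from. The document does not specify that behaviour.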

### Repositories configuration

This section describes the repositories configuration file used by the CLI.

> Note: This configuration could also be used by the library.

A first draft of the structure:

```rust
use std::collections::HashMap;

use crate::config::{Credentials, ObjectStoreConfig, RepositoryConfig};

/// Where a repository lives in object storage.
pub struct RepoLocation {
    bucket: String,
    prefix: String,
}

/// Everything needed to open one repository.
pub struct RepositoryDefinition {
    location: RepoLocation,
    object_store_config: ObjectStoreConfig,
    credentials: Credentials,
    config: RepositoryConfig,
}

/// Name under which a repository is registered; hashable so it can key the map below.
#[derive(PartialEq, Eq, Hash)]
pub struct RepositoryAlias(String);

/// All repositories known to the CLI, keyed by alias.
pub struct Repositories {
    repos: HashMap<RepositoryAlias, RepositoryDefinition>,
}
```
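
To serve both guiding use cases, a `<repo>` argument could be either a configured alias or a literal location such as `s3://bucket/path`. The sketch below shows one possible resolution rule and makes several assumptions the document does not state: aliases are tried before URLs, only the `s3://` scheme is handled, and a simplified standalone `RepoLocation` stands in for the full `RepositoryDefinition`. `parse_s3_url` and `resolve_repo` are hypothetical names.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the draft `RepoLocation` above.
#[derive(Debug, Clone, PartialEq)]
struct RepoLocation {
    bucket: String,
    prefix: String,
}

/// Parse `s3://bucket/prefix` into a location.
/// Returns `None` for other schemes or an empty bucket name.
fn parse_s3_url(url: &str) -> Option<RepoLocation> {
    let rest = url.strip_prefix("s3://")?;
    // Everything before the first '/' is the bucket; the remainder is the prefix.
    let (bucket, prefix) = rest.split_once('/').unwrap_or((rest, ""));
    if bucket.is_empty() {
        return None;
    }
    Some(RepoLocation {
        bucket: bucket.to_string(),
        prefix: prefix.to_string(),
    })
}

/// Resolve a `<repo>` argument: configured alias first, then literal URL.
fn resolve_repo(
    arg: &str,
    aliases: &HashMap<String, RepoLocation>,
) -> Option<RepoLocation> {
    aliases.get(arg).cloned().or_else(|| parse_s3_url(arg))
}
```

Under this rule, `icechunk repo info prod` would use the location stored under the `prod` alias, while `icechunk repo create s3://bucket/path` would work with no configuration file at all.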

## Python packaging

The packaging follows the [Python entrypoint](https://www.maturin.rs/bindings#both-binary-and-library) approach:

- The CLI is implemented in `icechunk/src/cli/`
- The CLI is exposed to Rust in `icechunk/src/bin/icechunk/`
- The CLI is exposed to Python through an entrypoint function, registered in `pyproject.toml`

```toml
[project.scripts]
icechunk = "icechunk._icechunk_python:cli_entrypoint"
```

The disadvantage is that Python users launch the CLI through the Python interpreter, which adds hundreds of milliseconds of latency.

Users can also install the Rust binary directly through `cargo install`.

## Implementation details

Implemented with:

- `clap` for the CLI
- `clap_complete` for shell completion
- `anyhow` for error handling
- `serde_yaml_ng` for configuration

## Optional features

- Structured output option (e.g. JSON)
- Short version of the command (e.g. `ic`)
- Support for tab completion

> Review comment: Listing repositories is something Icechunk cannot do today. It can only verify whether a repository exists at a given location. We don't expect to add this functionality; we tend to treat anything outside the repo prefix as "unknown" to Icechunk.
>
> Review comment: How about listing the repositories defined in the proposed repo config?
>
> Review comment: +1