-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Icechunk could use a command line interface #461
Comments
This sounds like a fun project to work on while getting acquainted with |
Very happy if you want to take this @hendrikmakait ! We want it in Rust. No preference of CLI toolkit. I'd say start small, with one operation in a PR that we can look at, and then we keep adding. I think a challenge is going to be how to pass all the arguments needed to define a repo. What object store, location, credentials, etc. Should be fun! |
clap is the best cli toolkit for rust IMO, so that may be a good place to start! It can also start out in the |
+1 We'll need to see how to distribute it easily for people who install the python library, but that seems like an easy problem. |
@hendrikmakait I would love to join this effort. Have you been able to pick it up yet? If not, I could get us started with an initial PR in the next few days and we could divide the work from there. |
Amazing news @DahnJ ! Feel free to ping me in slack if you want pointers. I guess I would start with a short design document explaining what functionality we want to include, how users are going to pass arguments via command line, how it's going to be used from python, etc. One tricky thing, as I mention above, is that you need a bunch of information to "point" to an icechunk repo. You need the object store details (potentially including things like region, endpoint_url, etc), you need credentials, you need a prefix. It sounds like a lot to pass in the command line. Maybe we should thing if it's worth having a config file for the CLI, where you can alias all this set of parameters with a single name. So I can say For reference, this is the list of arguments needed to create an instance of S3 Storage: def s3_storage(
*,
bucket: str,
prefix: str | None,
region: str | None = None,
endpoint_url: str | None = None,
allow_http: bool = False,
access_key_id: str | None = None,
secret_access_key: str | None = None,
session_token: str | None = None,
expires_after: datetime | None = None,
anonymous: bool | None = None,
from_env: bool | None = None,
get_credentials: Callable[[], S3StaticCredentials] | None = None,
) -> Storage: And that doesn't include the repository config. |
Thanks for the pointers @paraseba. I'm not feeling ready for a full design document yet, but I'll try to work iteratively and report here so as not to block this completely. Hopefully this writeup helps make headway. Packaging with PythonSo far I tried to look into the question of distributing the CLI with Python. I see two ways. Separate binaryHere we write the CLI code in (for example) I didn't actually get this to work. I attempted to use the # Cargo.toml
[[bin]]
name = "icechunk"
path = "src/bin/icechunk/main.rs" # pyproject.toml
[tool.maturin]
bindings = "bin" This only adds the CLI binary into wheel and not Advantages/disadvantagesAssuming we could get this to work:
Alternative: two wheelsAs mentioned in PyO3/maturin#368 (comment), this could be "solved" by simply having two wheels that the users would install separately. Python entrypointMaturin docs suggest wrapping the CLI in a Python entrypoint. To make this work, I
[project.scripts]
icechunk = "icechunk._icechunk_python:cli_entrypoint" This works for me, but has the major disadvantage that Python users would need to call Python to run the CLI. On my machine this is around 250ms of delay when calling the CLI. |
@DahnJ I feel the "Python entrypoint" approach is simple and good enough. The CLI is not meant for "low latency" operations anyway. And, we can always:
Thank you for the doing the analysis and posting the details! |
@DahnJ, I've unfortunately gotten distracted by some more pressing stuff before being able to put together something meaningful. Great to see your work on this! |
+1, this was the first thing that I stumbled over. Having a config akin to |
One more feature idea:
|
Examples:
The text was updated successfully, but these errors were encountered: