Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: object store registry for custom object store providers #2513

Merged
merged 10 commits into from
Jul 30, 2024

Conversation

maxburke
Copy link
Contributor

This change lays the groundwork for LanceDB to support custom implementations of object stores, selected by a custom URL scheme.

Ref: https://discord.com/channels/1030247538198061086/1197630564254101618/1208915798425477181

@github-actions github-actions bot added the java label Jun 22, 2024
Copy link

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@maxburke maxburke changed the title Add an object store registry for custom object store providers feat: Add an object store registry for custom object store providers Jun 22, 2024
@github-actions github-actions bot added the enhancement New feature or request label Jun 22, 2024
@eddyxu eddyxu requested review from wjones127 and westonpace and removed request for wjones127 June 22, 2024 18:13
@maxburke maxburke force-pushed the custom-object-store branch from 19aea15 to afb3e82 Compare June 22, 2024 18:19
Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making a PR. I'm in favor of making the registry extensible. I think a good property of an extensible registry is that the default values are able to be inserted using the extension point. That ensures that third-parties have all the APIs they need to properly extend Lance. I don't think you have to do that refactor immediately in this PR. Instead, I've made some suggestions about some tweaks we would likely need to make to be able to use the registry for our default stores.

rust/lance-io/src/object_store.rs Outdated Show resolved Hide resolved
rust/lance-io/src/object_store.rs Outdated Show resolved Hide resolved
@maxburke
Copy link
Contributor Author

Thanks for making a PR. I'm in favor of making the registry extensible. I think a good property of an extensible registry is that the default values are able to be inserted using the extension point. That ensures that third-parties have all the APIs they need to properly extend Lance. I don't think you have to do that refactor immediately in this PR. Instead, I've made some suggestions about some tweaks we would likely need to make to be able to use the registry for our default stores.

Thanks for your comments, @wjones127 !

One thought I had was that the new_store trait member should also be accompanied by a new_commit_handler trait member, which would help make the changes to lance-table neater. However, the CommitHandler trait is defined in lance-table, and the ObjectStoreRegistry is defined in lance-io, so something would have to move (probably CommitHandler to lance-io). Is this an OK move to make as part of this change?

@maxburke maxburke changed the title feat: Add an object store registry for custom object store providers feat: object store registry for custom object store providers Jun 25, 2024
Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks close to ready to merge. I think we can remove the changes to optimize since the registry won't actually be used by those code paths. And it will simplify the optimize APIs.

rust/lance-io/src/object_store.rs Outdated Show resolved Hide resolved
rust/lance/src/dataset/optimize.rs Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jun 28, 2024

Codecov Report

Attention: Patch coverage is 56.04396% with 40 lines in your changes missing coverage. Please review.

Project coverage is 79.41%. Comparing base (9535186) to head (8264b10).

Files Patch % Lines
rust/lance-io/src/object_store.rs 65.62% 10 Missing and 1 partial ⚠️
java/core/lance-jni/src/blocking_dataset.rs 0.00% 9 Missing ⚠️
rust/lance/src/dataset.rs 55.00% 8 Missing and 1 partial ⚠️
rust/lance/src/dataset/builder.rs 53.33% 7 Missing ⚠️
rust/lance/src/dataset/write.rs 33.33% 2 Missing ⚠️
rust/lance-table/src/io/commit.rs 0.00% 1 Missing ⚠️
rust/lance/src/dataset/cleanup.rs 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2513      +/-   ##
==========================================
- Coverage   79.44%   79.41%   -0.04%     
==========================================
  Files         221      221              
  Lines       64683    64744      +61     
  Branches    64683    64744      +61     
==========================================
+ Hits        51388    51415      +27     
- Misses      10335    10366      +31     
- Partials     2960     2963       +3     
Flag Coverage Δ
unittests 79.41% <56.04%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wjones127
Copy link
Contributor

However, the CommitHandler trait is defined in lance-table, and the ObjectStoreRegistry is defined in lance-io, so something would have to move (probably CommitHandler to lance-io).

Given CommitHandler is a table-level concept, I don't think it should move into the low-level IO library. I think then I would prefer ObjectStoreRegistry be pulled out into lance or lance-table.

There is a commit_handler_from_url function that maybe should be put on the registry.

@maxburke
Copy link
Contributor Author

maxburke commented Jul 24, 2024

Given CommitHandler is a table-level concept, I don't think it should move into the low-level IO library. I think then I would prefer ObjectStoreRegistry be pulled out into lance or lance-table.

There is a commit_handler_from_url function that maybe should be put on the registry.

Would you like me to do this as part of this MR or would you like me to follow up in another MR?

edit: personally I'd be more comfortable doing it in a follow-up MR because I think it would have a big knock-on how the default object stores are constructed; I think it would need moving ObjectStore::from_uri_and_params / ObjectStore::new_from_url out of lance_io and into lance

@wjones127
Copy link
Contributor

Given CommitHandler is a table-level concept, I don't think it should move into the low-level IO library. I think then I would prefer ObjectStoreRegistry be pulled out into lance or lance-table.
There is a commit_handler_from_url function that maybe should be put on the registry.

Would you like me to do this as part of this MR or would you like me to follow up in another MR?

edit: personally I'd be more comfortable doing it in a follow-up MR because I think it would have a big knock-on how the default object stores are constructed; I think it would need moving ObjectStore::from_uri_and_params / ObjectStore::new_from_url out of lance_io and into lance

You can do that in a follow up PR.

@maxburke
Copy link
Contributor Author

Given CommitHandler is a table-level concept, I don't think it should move into the low-level IO library. I think then I would prefer ObjectStoreRegistry be pulled out into lance or lance-table.
There is a commit_handler_from_url function that maybe should be put on the registry.

Would you like me to do this as part of this MR or would you like me to follow up in another MR?
edit: personally I'd be more comfortable doing it in a follow-up MR because I think it would have a big knock-on how the default object stores are constructed; I think it would need moving ObjectStore::from_uri_and_params / ObjectStore::new_from_url out of lance_io and into lance

You can do that in a follow up PR.

Awesome. Is there anything else that needs to be taken care of in this MR?

Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

@wjones127 wjones127 merged commit bf767be into lancedb:main Jul 30, 2024
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request java python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants