-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-45873: Initial query_all_datasets implementation #1109
Conversation
35f0fc7
to
b181aa3
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1109 +/- ##
==========================================
+ Coverage 89.37% 89.41% +0.03%
==========================================
Files 362 363 +1
Lines 48312 48423 +111
Branches 5872 5879 +7
==========================================
+ Hits 43179 43296 +117
+ Misses 3718 3717 -1
+ Partials 1415 1410 -5 ☔ View full report in Codecov by Sentry. |
e2f48d5
to
16374f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I've tested it and it does seem to work as expected. It looks like the restriction of order-by only working for a single dataset type is not part of this PR yet.
find_first=self._find_first, | ||
name=self._dataset_type_glob, | ||
with_dimension_records=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following on from a Slack comment, it looks like there are other scripts using the QueryDatasets
class that do need dimensions so if you want to be able to turn this off for query-datasets
but turn it on for transfer-datasets
then you will need to add a parameter higher up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks -- I failed to notice that QueryDatasets
was shared by multiple scripts. I had removed order-by and the dimension records in the follow-up PR to this one but I'll have to put the dimension records back.
tests/test_cliCmdQueryDatasets.py
Outdated
# Same as previous test, but with positive limit so no warning is | ||
# issued. | ||
tables = self._queryDatasets( | ||
repo=testRepo.butler, limit=1, order_by=("visit"), collections="*", glob="*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe test that no warning was issued (by issuing your own warning and then making sure you only got the one warning)?
Pull out the "query all datasets" logic from the query-datasets CLI command to a separate function. In an upcoming commit this will be used to implement `Butler.query_all_datasets`.
Add a method for querying multiple dataset types simultaneously, currently hidden as `Butler._query_all_datasets`. This implementation uses the existing logic from the query-datasets CLI for doing the search.
The dataset type is supposed to default to '*' if no dataset types are provided by the user.
Fix an issue where the limit was being issued repeatedly because the caller was modifying the results array before query_all_datasets checked its length.
Add a description of which columns are invalid and which are allowed when an order by expression is not legal.
2cd8e3e
to
c077b29
Compare
Checklist
Extracted the query logic from the
query-datasets
CLI and used it to provide an initial implementation of query_all_datasets.This gets the interface and unit tests in place for a future PR that will run the whole query on the server side for RemoteButler.
doc/changes
configs/old_dimensions