DM-45873: Initial query_all_datasets implementation #1109

dhirving · 2024-10-29T23:40:26Z

Extracted the query logic from the query-datasets CLI and used it to provide an initial implementation of query_all_datasets.

This gets the interface and unit tests in place for a future PR that will run the whole query on the server side for RemoteButler.

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes
(if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

codecov · 2024-10-30T00:07:22Z

Codecov Report

Attention: Patch coverage is 98.70968% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.41%. Comparing base (e75d1d5) to head (c077b29).
Report is 8 commits behind head on main.

Files with missing lines	Patch %	Lines
python/lsst/daf/butler/script/queryDatasets.py	92.30%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1109      +/-   ##
==========================================
+ Coverage   89.37%   89.41%   +0.03%     
==========================================
  Files         362      363       +1     
  Lines       48312    48423     +111     
  Branches     5872     5879       +7     
==========================================
+ Hits        43179    43296     +117     
+ Misses       3718     3717       -1     
+ Partials     1415     1410       -5


timj

Thanks. I've tested it and it does seem to work as expected. It looks like the restriction of order-by only working for a single dataset type is not part of this PR yet.

timj · 2024-11-07T21:47:57Z

python/lsst/daf/butler/script/queryDatasets.py

                find_first=self._find_first,
+                name=self._dataset_type_glob,
                with_dimension_records=True,


Following on from a Slack comment, it looks like other scripts that use the QueryDatasets class do need dimension records. If you want to turn this off for query-datasets but keep it on for transfer-datasets, you will need to add a parameter higher up.

dhirving · 2024-11-08

Thanks -- I failed to notice that QueryDatasets was shared by multiple scripts. I had removed order-by and the dimension records in the follow-up PR to this one, but I'll have to put the dimension records back.

timj · 2024-11-07T22:17:04Z

tests/test_cliCmdQueryDatasets.py

+        # Same as previous test, but with positive limit so no warning is
+        # issued.
+        tables = self._queryDatasets(
+            repo=testRepo.butler, limit=1, order_by=("visit"), collections="*", glob="*"


Maybe test that no warning was issued (by issuing your own warning and then making sure you only got the one warning)?
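A minimal sketch of the approach suggested here, using only the standard-library `warnings` module: record everything raised inside a block, emit a known sentinel first, and check that the sentinel is the only entry recorded. The helper `collect_warnings` and the two stand-in query functions are hypothetical and are not part of daf_butler; in the test above, the call under test would be the `self._queryDatasets(...)` invocation.

```python
import warnings


def collect_warnings(func):
    """Call ``func`` and return the list of warnings recorded while it ran.

    A sentinel warning is issued first, so the recorded list is never empty.
    A result of length 1 therefore means ``func`` itself warned about nothing.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones normally shown only once.
        warnings.simplefilter("always")
        warnings.warn("sentinel", UserWarning)
        func()
    return caught


# Hypothetical stand-ins for the call under test.
def quiet_query():
    return ["dataset"]


def noisy_query():
    warnings.warn("More datasets are available than the requested limit.", UserWarning)
    return ["dataset"]


quiet = collect_warnings(quiet_query)
noisy = collect_warnings(noisy_query)
```

The sentinel matters most with `unittest`'s `assertWarns`, which fails when nothing is raised inside its block; issuing one known warning keeps the context manager satisfied while still letting the test count what was recorded.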

dhirving force-pushed the tickets/DM-45873 branch from 35f0fc7 to b181aa3 Compare October 29, 2024 23:54

dhirving force-pushed the tickets/DM-45873 branch 5 times, most recently from e2f48d5 to 16374f1 Compare November 4, 2024 19:03

dhirving marked this pull request as ready for review November 4, 2024 21:20

timj approved these changes Nov 7, 2024


dhirving added 7 commits November 8, 2024 14:23

Extract dataset query logic from CLI

e8fe878

Pull out the "query all datasets" logic from the query-datasets CLI command to a separate function. In an upcoming commit this will be used to implement `Butler.query_all_datasets`.

Pull out collection filtering logic to a helper function

a6b87e1

Add initial implementation of query_all_datasets

21ac180

Add a method for querying multiple dataset types simultaneously, currently hidden as `Butler._query_all_datasets`. This implementation uses the existing logic from the query-datasets CLI for doing the search.

Fix missing default for query-datasets type

7fb5b69

The dataset type is supposed to default to '*' if no dataset types are provided by the user.

Fix repeated limit warnings

27a70a1

Fix an issue where the limit warning was being issued repeatedly because the caller was modifying the results array before query_all_datasets checked its length.
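The commit message does not show the code, so the following is only a generic sketch of this class of bug, with hypothetical names (`pages_buggy`, `pages_fixed`, `consume`) that are not taken from daf_butler. A generator yields a list, the caller empties it, and the generator then reads a length that no longer reflects what it produced. The symptom in the sketch (overrunning the limit) is chosen for illustration and may differ from the one fixed in the PR; the remedy is the same idea of capturing the count before handing the list to the caller.

```python
def pages_buggy(pages, limit):
    """Yield pages until ``limit`` rows are produced (flawed accounting)."""
    remaining = limit
    for page in pages:
        page = page[:remaining]
        yield page
        # BUG: the caller may have mutated ``page`` while this generator was
        # suspended, so this length can differ from the number of rows yielded.
        remaining -= len(page)
        if remaining <= 0:
            return


def pages_fixed(pages, limit):
    """Same as above, but the row count is captured before yielding."""
    remaining = limit
    for page in pages:
        page = page[:remaining]
        count = len(page)  # captured before the caller can touch the list
        yield page
        remaining -= count
        if remaining <= 0:
            return


def consume(generator):
    """A caller that drains each page in place, as a mutating consumer might."""
    total = 0
    for page in generator:
        total += len(page)
        page.clear()
    return total


data = [[1, 2], [3, 4], [5, 6]]
buggy_total = consume(pages_buggy([list(p) for p in data], limit=3))
fixed_total = consume(pages_fixed([list(p) for p in data], limit=3))
```

With a limit of 3, the flawed version never sees its budget shrink and hands out all six rows, while the fixed version stops after three.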

Add test for no warnings issued with positive limit

14ba778

Make order-by error more informative

c077b29

Add a description of which columns are invalid and which are allowed when an order by expression is not legal.
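As an illustration of what such an error message might look like, here is a small sketch under stated assumptions: `validate_order_by`, its arguments, and the message wording are all hypothetical, and the actual check in daf_butler may be structured differently. The point is only that the exception names both the rejected columns and the permitted ones.

```python
def validate_order_by(requested, allowed):
    """Raise ValueError naming invalid and allowed columns (hypothetical helper)."""
    invalid = [column for column in requested if column not in allowed]
    if invalid:
        raise ValueError(
            f"Cannot order by {', '.join(invalid)}. "
            f"Allowed columns: {', '.join(sorted(allowed))}."
        )


# Column names here are example values only.
allowed_columns = {"visit", "detector"}

# A legal expression passes silently.
validate_order_by(["visit"], allowed_columns)

# An illegal one produces a message listing both sets of columns.
try:
    validate_order_by(["visit", "flux"], allowed_columns)
except ValueError as exc:
    message = str(exc)
```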

dhirving force-pushed the tickets/DM-45873 branch from 2cd8e3e to c077b29 Compare November 8, 2024 21:23

dhirving merged commit 1de58f9 into main Nov 8, 2024
19 checks passed

dhirving deleted the tickets/DM-45873 branch November 8, 2024 23:31
