Merge pull request #903 from lsst/tickets/DM-34340
DM-34340: implement RFC-834 deprecations
TallJimbo authored Nov 22, 2023
2 parents f22234c + 4d005f7 commit 8ca025c
Showing 75 changed files with 2,837 additions and 1,597 deletions.
19 changes: 19 additions & 0 deletions doc/changes/DM-34340.api.md
@@ -0,0 +1,19 @@
Deprecate most public APIs that use Dimension or DimensionElement objects.

This implements [RFC-834](https://jira.lsstcorp.org/browse/RFC-834), deprecating the `DimensionGraph` class (in favor of the new, similar `DimensionGroup`) and a large number of `DataCoordinate` methods and attributes, including its `collections.abc.Mapping` interface.

This includes:

- use `DataCoordinate.dimensions` instead of `DataCoordinate.graph` (likewise for arguments to `DataCoordinate.standardize`);
- use `dict(DataCoordinate.required)` as a drop-in replacement for `DataCoordinate.byName()`, but consider whether you want `DataCoordinate.required` (a `Mapping` view rather than a `dict`) or `DataCoordinate.mapping` (a `Mapping` with all *available* key-value pairs, not just the required ones);
- also use `DataCoordinate.mapping` or `DataCoordinate.required` instead of treating `DataCoordinate` itself as a `Mapping`, *except* square-bracket indexing, which is still very much supported;
- use `DataCoordinate.dimensions.required.names` or `DataCoordinate.required.keys()` as a drop-in replacement for `DataCoordinate.keys().names` or `DataCoordinate.names`, but consider whether you actually want `DataCoordinate.dimensions.names` or `DataCoordinate.mapping.keys` instead.
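
As a minimal migration sketch (assuming the default dimension configuration; the data ID values are made up):

```python
from lsst.daf.butler import DataCoordinate, DimensionUniverse

universe = DimensionUniverse()  # default dimension configuration
data_id = DataCoordinate.standardize(
    {"instrument": "HSC", "detector": 42},
    dimensions=universe.conform(["instrument", "detector"]),
)

required = dict(data_id.required)          # replacement for data_id.byName()
available = dict(data_id.mapping)          # all available key-value pairs
names = data_id.dimensions.required.names  # replacement for data_id.names

instrument = data_id["instrument"]         # square-bracket indexing is unchanged
```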

`DimensionGroup` is almost identical to `DimensionGraph`, but it and its subset attributes are not directly iterable (since those iterate over `Dimension` and `DimensionElement` objects); use the `.names` attribute to iterate over names instead (just as names could be iterated over in `DimensionGraph`).
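
For example (a sketch assuming the default dimension configuration):

```python
from lsst.daf.butler import DimensionUniverse

universe = DimensionUniverse()  # default dimension configuration
group = universe.conform(["visit"])  # dependencies such as instrument are added automatically

for name in group.names:  # DimensionGroup itself is not iterable
    print(name)

print(list(group.required.names))  # subset attributes also expose .names
print(list(group.implied.names))
```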

`DimensionGraph` is still used in some `lsst.daf.butler` APIs (most prominently `DatasetType.dimensions`) that may be accessed without deprecation warnings being emitted, but iterating over that object or its subset attributes *will* yield deprecation warnings.
And `DimensionGraph` is still accepted along with `DimensionGroup` without warning in most public APIs.
When `DimensionGraph` is removed, methods and properties that return `DimensionGraph` will start returning `DimensionGroup` instead.

Rare code (mostly in downstream middleware packages) that does need access to `Dimension` or `DimensionElement` objects should obtain them directly from the `DimensionUniverse`.
For the pattern of checking whether a dimension is a skypix level, test whether its name is in `DimensionUniverse.skypix_dimensions` or `DimensionGroup.skypix` instead of obtaining a `Dimension` instance and calling `isinstance(dimension, SkyPixDimension)`.
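
For example (a sketch assuming the default dimension configuration, in which `htm7` is one of the configured skypix levels):

```python
from lsst.daf.butler import DimensionUniverse

universe = DimensionUniverse()  # default dimension configuration

# Membership tests replace isinstance(dimension, SkyPixDimension) checks.
print("htm7" in universe.skypix_dimensions)      # True with the default configuration
print("detector" in universe.skypix_dimensions)  # False: not a skypix level

group = universe.conform(["htm7", "instrument"])
print("htm7" in group.skypix)                    # per-group skypix dimension names
```
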
18 changes: 7 additions & 11 deletions doc/lsst.daf.butler/dimensions.rst
@@ -31,15 +31,11 @@ There are two kinds of relationships:
For example, the visit dimension has an implied dependency on the physical filter dimension, because a visit is observed through exactly one filter and hence each visit ID determines a filter name.
When both dimensions are associated with database tables, an implied dependency is represented by a foreign key field in the dependent table that is not part of that table's primary key.

A `DimensionGraph` is an immutable, set-like container of dimensions that is guaranteed to (recursively) include all dependencies of any dimension in the graph.
It also categorizes those dimensions into `~DimensionGraph.required` and `~DimensionGraph.implied` subsets, which have roughly the same meaning for a set of dimensions as they do for a single dimension: once the primary key values of all of the required dimensions are known, the primary key values of all implied dimensions are known as well.
`DimensionGraph` also guarantees a deterministic and topological sort order for its elements.
A `DimensionGroup` is an immutable, set-like container of dimensions that is guaranteed to (recursively) include all dependencies of any dimension in the set.
It also categorizes those dimensions into `~DimensionGroup.required` and `~DimensionGroup.implied` subsets, which have roughly the same meaning for a set of dimensions as they do for a single dimension: once the primary key values of all of the required dimensions are known, the primary key values of all implied dimensions are known as well.
`DimensionGroup` also guarantees a deterministic and topological sort order for its elements.
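
As an illustrative sketch (assuming the default dimension configuration), dependencies are filled in automatically and the result is deterministically ordered:

```python
from lsst.daf.butler import DimensionUniverse

universe = DimensionUniverse()  # default dimension configuration
group = universe.conform(["detector"])

print(list(group.names))                     # deterministic, topologically sorted names
print("instrument" in group.required.names)  # True: detector depends on instrument
```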

Because `Dimension` instances have a `~Dimension.name` attribute, we typically
use `~lsst.daf.butler.NamedValueSet` and `~lsst.daf.butler.NamedKeyDict` as containers when immutability is needed or the guarantees of `DimensionGraph`.
This allows the string names of dimensions to be used as well in most places where `Dimension` instances are expected.

The complete set of all compatible dimensions is held by a special subclass of `DimensionGraph`, `DimensionUniverse`.
The complete set of all recognized dimensions is managed by a `DimensionUniverse`.
A dimension universe is constructed from configuration, and is responsible for constructing all `Dimension` and `DimensionElement` instances; within a universe, there is exactly one `Dimension` instance that is always used to represent a particular dimension.

`DimensionUniverse` instances themselves are held in a global map keyed by the version number in the configuration used for construction, so they behave somewhat like singletons.
@@ -50,15 +46,15 @@ Data IDs
--------

The most common way butler users encounter dimensions is as the keys in a *data ID*, a dictionary that maps dimensions to their primary key values.
Different datasets with the same `DatasetType` are always identified by the same set of dimensions (i.e. the same set of data ID keys), and hence a `DatasetType` instance holds a `DimensionGraph` that contains exactly those keys.
Different datasets with the same `DatasetType` are always identified by the same set of dimensions (i.e. the same set of data ID keys), and hence a `DatasetType` instance holds a `DimensionGroup` that contains exactly those keys.

Many data IDs are simply Python dictionaries that use the string names of dimensions or actual `Dimension` instances as keys.
Most `Butler` and `Registry` APIs that accept data IDs as input accept both dictionaries and keyword arguments that are added to these dictionaries automatically.
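
For example (a sketch only; the repository path, collection, dataset type, and data ID values are assumptions):

```python
from lsst.daf.butler import Butler

butler = Butler("/path/to/repo", collections="HSC/defaults")  # assumed repository

# The same data ID passed as a plain dictionary and as keyword arguments:
raw = butler.get("raw", {"instrument": "HSC", "exposure": 903334, "detector": 10})
raw = butler.get("raw", instrument="HSC", exposure=903334, detector=10)
```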

The data IDs returned by the `Butler` or `Registry` (and most of those used internally) are usually instances of the `DataCoordinate` class.
`DataCoordinate` instances can have different states of knowledge about the dimensions they identify.
They always contain at least the key-value pairs that correspond to its `DimensionGraph`\ 's `~DimensionGraph.required` subset -- that is, the minimal set of keys needed to fully identify all other dimensions in the graph.
They can also contain key-value pairs for the `~DimensionGraph.implied` subset (a state indicated by `DataCoordinate.hasFull()` returning `True`).
They always contain at least the key-value pairs that correspond to its `DimensionGroup`\ 's `~DimensionGroup.required` subset -- that is, the minimal set of keys needed to fully identify all other dimensions in the set.
They can also contain key-value pairs for the `~DimensionGroup.implied` subset (a state indicated by `DataCoordinate.hasFull()` returning `True`).
And if `DataCoordinate.hasRecords` returns `True`, the data ID also holds all of the metadata records associated with its dimensions, both as a mapping in
the `~DataCoordinate.records` attribute and via dynamic attribute access, e.g.
``data_id.exposure.day_obs``.
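
A sketch of those states in practice (assuming a configured repository; the path and values are made up):

```python
from lsst.daf.butler import Butler

butler = Butler("/path/to/repo")  # assumed repository
data_id = butler.registry.expandDataId(instrument="HSC", exposure=903334, detector=10)

print(data_id.hasFull())           # True: implied values (e.g. physical_filter) are known
print(data_id.hasRecords())        # True: dimension records are attached
print(data_id["physical_filter"])  # implied value, available via indexing
print(data_id.exposure.day_obs)    # dynamic attribute access to a record field
```
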
4 changes: 1 addition & 3 deletions python/lsst/daf/butler/_column_categorization.py
@@ -63,9 +63,7 @@ def from_iterable(cls, iterable: Iterable[Any]) -> ColumnCategorization:

def filter_skypix(self, universe: DimensionUniverse) -> Iterator[SkyPixDimension]:
return (
dimension
for name in self.dimension_keys
if isinstance(dimension := universe[name], SkyPixDimension)
dimension for name in self.dimension_keys if (dimension := universe.skypix_dimensions.get(name))
)

def filter_governors(self, universe: DimensionUniverse) -> Iterator[GovernorDimension]:
32 changes: 12 additions & 20 deletions python/lsst/daf/butler/_config_support.py
@@ -36,11 +36,11 @@
from collections.abc import Iterable, Mapping
from typing import TYPE_CHECKING, Any

from .dimensions import DimensionGraph
from .dimensions import DimensionGroup

if TYPE_CHECKING:
from ._config import Config
from .dimensions import Dimension, DimensionUniverse
from .dimensions import DimensionUniverse

log = logging.getLogger(__name__)

@@ -58,23 +58,23 @@ class LookupKey:
name : `str`, optional
Primary index string for lookup. If this string looks like it
represents dimensions (via ``dim1+dim2+dim3`` syntax) the name
is converted to a `DimensionGraph` and stored in ``dimensions``
is converted to a `DimensionGroup` and stored in ``dimensions``
property.
dimensions : `DimensionGraph`, optional
dimensions : `DimensionGroup`, optional
Dimensions that are relevant for lookup. Should not be specified
if ``name`` is also specified.
dataId : `dict`, optional
Keys and values from a dataId that should control lookups.
universe : `DimensionUniverse`, optional
Set of all known dimensions, used to expand and validate ``name`` or
``dimensions``. Required if the key represents dimensions and a
full `DimensionGraph` is not provided.
full `DimensionGroup` is not provided.
"""

def __init__(
self,
name: str | None = None,
dimensions: Iterable[str | Dimension] | None = None,
dimensions: DimensionGroup | None = None,
dataId: dict[str, Any] | None = None,
*,
universe: DimensionUniverse | None = None,
@@ -100,7 +100,7 @@ def __init__(
# indicate this but have to filter out the empty value
dimension_names = [n for n in name.split("+") if n]
try:
self._dimensions = universe.extract(dimension_names)
self._dimensions = universe.conform(dimension_names)
except KeyError:
# One or more of the dimensions is not known to the
# universe. This could be a typo or it could be that
@@ -122,15 +122,7 @@ def __init__(
self._name = name

elif dimensions is not None:
if not isinstance(dimensions, DimensionGraph):
if universe is None:
raise ValueError(
f"Cannot construct LookupKey for dimensions={dimensions} without universe."
)
else:
self._dimensions = universe.extract(dimensions)
else:
self._dimensions = dimensions
self._dimensions = dimensions
else:
# mypy cannot work this out on its own
raise ValueError("Name was None but dimensions is also None")
@@ -181,8 +173,8 @@ def name(self) -> str | None:
return self._name

@property
def dimensions(self) -> DimensionGraph | None:
"""Dimensions associated with lookup (`DimensionGraph`)."""
def dimensions(self) -> DimensionGroup | None:
"""Dimensions associated with lookup (`DimensionGroup`)."""
return self._dimensions

@property
@@ -203,7 +195,7 @@ def __hash__(self) -> int:
def clone(
self,
name: str | None = None,
dimensions: DimensionGraph | None = None,
dimensions: DimensionGroup | None = None,
dataId: dict[str, Any] | None = None,
) -> LookupKey:
"""Clone the object, overriding some options.
@@ -216,7 +208,7 @@ def clone(
name : `str`, optional
Primary index string for lookup. Will override ``dimensions``
if ``dimensions`` are set.
dimensions : `DimensionGraph`, optional
dimensions : `DimensionGroup`, optional
Dimensions that are relevant for lookup. Will override ``name``
if ``name`` is already set.
dataId : `dict`, optional
4 changes: 2 additions & 2 deletions python/lsst/daf/butler/_dataset_ref.py
@@ -173,7 +173,7 @@ def makeDatasetId(
else:
raise ValueError(f"Unexpected ID generation mode: {idGenerationMode}")

for name, value in sorted(dataId.byName().items()):
for name, value in sorted(dataId.required.items()):
items.append((name, str(value)))
data = ",".join(f"{key}={value}" for key, value in items)
return uuid.uuid5(self.NS_UUID, data)
@@ -328,7 +328,7 @@ def __init__(
):
self.datasetType = datasetType
if conform:
self.dataId = DataCoordinate.standardize(dataId, graph=datasetType.dimensions)
self.dataId = DataCoordinate.standardize(dataId, dimensions=datasetType.dimensions)
else:
self.dataId = dataId
self.run = run