Merge pull request #903 from lsst/tickets/DM-34340
DM-34340: implement RFC-834 deprecations
TallJimbo authored Nov 22, 2023
2 parents f22234c + 4d005f7 commit 8ca025c
Showing 75 changed files with 2,837 additions and 1,597 deletions.
19 changes: 19 additions & 0 deletions doc/changes/DM-34340.api.md
@@ -0,0 +1,19 @@
Deprecate most public APIs that use Dimension or DimensionElement objects.

This implements [RFC-834](https://jira.lsstcorp.org/browse/RFC-834), deprecating the `DimensionGraph` class (in favor of the new, similar `DimensionGroup`) and a large number of `DataCoordinate` methods and attributes, including its `collections.abc.Mapping` interface.

This includes:

- use `DataCoordinate.dimensions` instead of `DataCoordinate.graph` (likewise for arguments to `DataCoordinate.standardize`);
- use `dict(DataCoordinate.required)` as a drop-in replacement for `DataCoordinate.byName()`, but consider whether you want `DataCoordinate.required` (a `Mapping` view rather than a `dict`) or `DataCoordinate.mapping` (a `Mapping` with all *available* key-value pairs, not just the required ones);
- also use `DataCoordinate.mapping` or `DataCoordinate.required` instead of treating `DataCoordinate` itself as a `Mapping`, *except* square-bracket indexing, which is still very much supported;
- use `DataCoordinate.dimensions.required.names` or `DataCoordinate.required.keys()` as a drop-in replacement for `DataCoordinate.keys().names` or `DataCoordinate.names`, but consider whether you actually want `DataCoordinate.dimensions.names` or `DataCoordinate.mapping.keys` instead.
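
As a minimal migration sketch (assuming the default dimension configuration; the data ID values are made up):

```python
from lsst.daf.butler import DataCoordinate, DimensionUniverse

universe = DimensionUniverse()  # default dimension configuration
data_id = DataCoordinate.standardize(
    {"instrument": "HSC", "detector": 42},
    dimensions=universe.conform(["instrument", "detector"]),
)

required = dict(data_id.required)          # replacement for data_id.byName()
available = dict(data_id.mapping)          # all available key-value pairs
names = data_id.dimensions.required.names  # replacement for data_id.names

instrument = data_id["instrument"]         # square-bracket indexing is unchanged
```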

`DimensionGroup` is almost identical to `DimensionGraph`, but it and its subset attributes are not directly iterable (since those iterate over `Dimension` and `DimensionElement` objects); use the `.names` attribute to iterate over names instead (just as names could be iterated over in `DimensionGraph`).
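
For example (a sketch assuming the default dimension configuration):

```python
from lsst.daf.butler import DimensionUniverse

universe = DimensionUniverse()  # default dimension configuration
group = universe.conform(["visit"])  # dependencies such as instrument are added automatically

for name in group.names:  # DimensionGroup itself is not iterable
    print(name)

print(list(group.required.names))  # subset attributes also expose .names
print(list(group.implied.names))
```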

`DimensionGraph` is still used in some `lsst.daf.butler` APIs (most prominently `DatasetType.dimensions`) that may be accessed without deprecation warnings being emitted, but iterating over that object or its subset attributes *will* yield deprecation warnings.
And `DimensionGraph` is still accepted along with `DimensionGroup` without warning in most public APIs.
When `DimensionGraph` is removed, methods and properties that return `DimensionGraph` will start returning `DimensionGroup` instead.

Rare code (mostly in downstream middleware packages) that does need access to `Dimension` or `DimensionElement` objects should obtain them directly from the `DimensionUniverse`.
For the pattern of checking whether a dimension is a skypix level, test whether its name is in `DimensionUniverse.skypix_dimensions` or `DimensionGroup.skypix` instead of obtaining a `Dimension` instance and calling `isinstance(dimension, SkyPixDimension)`.
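
For example (a sketch assuming the default dimension configuration, in which `htm7` is one of the configured skypix levels):

```python
from lsst.daf.butler import DimensionUniverse

universe = DimensionUniverse()  # default dimension configuration

# Membership tests replace isinstance(dimension, SkyPixDimension) checks.
print("htm7" in universe.skypix_dimensions)      # True with the default configuration
print("detector" in universe.skypix_dimensions)  # False: not a skypix level

group = universe.conform(["htm7", "instrument"])
print("htm7" in group.skypix)                    # per-group skypix dimension names
```
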
18 changes: 7 additions & 11 deletions doc/lsst.daf.butler/dimensions.rst
@@ -31,15 +31,11 @@ There are two kinds of relationships:
For example, the visit dimension has an implied dependency on the physical filter dimension, because a visit is observed through exactly one filter and hence each visit ID determines a filter name.
When both dimensions are associated with database tables, an implied dependency is represented by a foreign key field in the dependent table that is not part of that table's primary key.

A `DimensionGraph` is an immutable, set-like container of dimensions that is guaranteed to (recursively) include all dependencies of any dimension in the graph.
It also categorizes those dimensions into `~DimensionGraph.required` and `~DimensionGraph.implied` subsets, which have roughly the same meaning for a set of dimensions as they do for a single dimension: once the primary key values of all of the required dimensions are known, the primary key values of all implied dimensions are known as well.
`DimensionGraph` also guarantees a deterministic and topological sort order for its elements.
A `DimensionGroup` is an immutable, set-like container of dimensions that is guaranteed to (recursively) include all dependencies of any dimension in the set.
It also categorizes those dimensions into `~DimensionGroup.required` and `~DimensionGroup.implied` subsets, which have roughly the same meaning for a set of dimensions as they do for a single dimension: once the primary key values of all of the required dimensions are known, the primary key values of all implied dimensions are known as well.
`DimensionGroup` also guarantees a deterministic and topological sort order for its elements.
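
As an illustrative sketch (assuming the default dimension configuration), dependencies are filled in automatically and the result is deterministically ordered:

```python
from lsst.daf.butler import DimensionUniverse

universe = DimensionUniverse()  # default dimension configuration
group = universe.conform(["detector"])

print(list(group.names))                     # deterministic, topologically sorted names
print("instrument" in group.required.names)  # True: detector depends on instrument
```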

Because `Dimension` instances have a `~Dimension.name` attribute, we typically
use `~lsst.daf.butler.NamedValueSet` and `~lsst.daf.butler.NamedKeyDict` as containers when immutability is needed or the guarantees of `DimensionGraph`.
This allows the string names of dimensions to be used as well in most places where `Dimension` instances are expected.

The complete set of all compatible dimensions is held by a special subclass of `DimensionGraph`, `DimensionUniverse`.
The complete set of all recognized dimensions is managed by a `DimensionUniverse`.
A dimension universe is constructed from configuration, and is responsible for constructing all `Dimension` and `DimensionElement` instances; within a universe, there is exactly one `Dimension` instance that is always used to represent a particular dimension.

`DimensionUniverse` instances themselves are held in a global map keyed by the version number in the configuration used for construction, so they behave somewhat like singletons.
@@ -50,15 +46,15 @@ Data IDs
--------

The most common way butler users encounter dimensions is as the keys in a *data ID*, a dictionary that maps dimensions to their primary key values.
Different datasets with the same `DatasetType` are always identified by the same set of dimensions (i.e. the same set of data ID keys), and hence a `DatasetType` instance holds a `DimensionGraph` that contains exactly those keys.
Different datasets with the same `DatasetType` are always identified by the same set of dimensions (i.e. the same set of data ID keys), and hence a `DatasetType` instance holds a `DimensionGroup` that contains exactly those keys.

Many data IDs are simply Python dictionaries that use the string names of dimensions or actual `Dimension` instances as keys.
Most `Butler` and `Registry` APIs that accept data IDs as input accept both dictionaries and keyword arguments that are added to these dictionaries automatically.
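
For example (a sketch only; the repository path, collection, dataset type, and data ID values are assumptions):

```python
from lsst.daf.butler import Butler

butler = Butler("/path/to/repo", collections="HSC/defaults")  # assumed repository

# The same data ID passed as a plain dictionary and as keyword arguments:
raw = butler.get("raw", {"instrument": "HSC", "exposure": 903334, "detector": 10})
raw = butler.get("raw", instrument="HSC", exposure=903334, detector=10)
```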

The data IDs returned by the `Butler` or `Registry` (and most of those used internally) are usually instances of the `DataCoordinate` class.
`DataCoordinate` instances can have different states of knowledge about the dimensions they identify.
They always contain at least the key-value pairs that correspond to its `DimensionGraph`\ 's `~DimensionGraph.required` subset -- that is, the minimal set of keys needed to fully identify all other dimensions in the graph.
They can also contain key-value pairs for the `~DimensionGraph.implied` subset (a state indicated by `DataCoordinate.hasFull()` returning `True`).
They always contain at least the key-value pairs that correspond to its `DimensionGroup`\ 's `~DimensionGroup.required` subset -- that is, the minimal set of keys needed to fully identify all other dimensions in the set.
They can also contain key-value pairs for the `~DimensionGroup.implied` subset (a state indicated by `DataCoordinate.hasFull()` returning `True`).
And if `DataCoordinate.hasRecords` returns `True`, the data ID also holds all of the metadata records associated with its dimensions, both as a mapping in
the `~DataCoordinate.records` attribute and via dynamic attribute access, e.g.
``data_id.exposure.day_obs``.
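
A sketch of those states in practice (assuming a configured repository; the path and values are made up):

```python
from lsst.daf.butler import Butler

butler = Butler("/path/to/repo")  # assumed repository
data_id = butler.registry.expandDataId(instrument="HSC", exposure=903334, detector=10)

print(data_id.hasFull())           # True: implied values (e.g. physical_filter) are known
print(data_id.hasRecords())        # True: dimension records are attached
print(data_id["physical_filter"])  # implied value, available via indexing
print(data_id.exposure.day_obs)    # dynamic attribute access to a record field
```
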
4 changes: 1 addition & 3 deletions python/lsst/daf/butler/_column_categorization.py
@@ -63,9 +63,7 @@ def from_iterable(cls, iterable: Iterable[Any]) -> ColumnCategorization:

def filter_skypix(self, universe: DimensionUniverse) -> Iterator[SkyPixDimension]:
return (
dimension
for name in self.dimension_keys
if isinstance(dimension := universe[name], SkyPixDimension)
dimension for name in self.dimension_keys if (dimension := universe.skypix_dimensions.get(name))
)

def filter_governors(self, universe: DimensionUniverse) -> Iterator[GovernorDimension]:
32 changes: 12 additions & 20 deletions python/lsst/daf/butler/_config_support.py
@@ -36,11 +36,11 @@
from collections.abc import Iterable, Mapping
from typing import TYPE_CHECKING, Any

from .dimensions import DimensionGraph
from .dimensions import DimensionGroup

if TYPE_CHECKING:
from ._config import Config
from .dimensions import Dimension, DimensionUniverse
from .dimensions import DimensionUniverse

log = logging.getLogger(__name__)

@@ -58,23 +58,23 @@ class LookupKey:
name : `str`, optional
Primary index string for lookup. If this string looks like it
represents dimensions (via ``dim1+dim2+dim3`` syntax) the name
is converted to a `DimensionGraph` and stored in ``dimensions``
is converted to a `DimensionGroup` and stored in ``dimensions``
property.
dimensions : `DimensionGraph`, optional
dimensions : `DimensionGroup`, optional
Dimensions that are relevant for lookup. Should not be specified
if ``name`` is also specified.
dataId : `dict`, optional
Keys and values from a dataId that should control lookups.
universe : `DimensionUniverse`, optional
Set of all known dimensions, used to expand and validate ``name`` or
``dimensions``. Required if the key represents dimensions and a
full `DimensionGraph` is not provided.
full `DimensionGroup` is not provided.
"""

def __init__(
self,
name: str | None = None,
dimensions: Iterable[str | Dimension] | None = None,
dimensions: DimensionGroup | None = None,
dataId: dict[str, Any] | None = None,
*,
universe: DimensionUniverse | None = None,
@@ -100,7 +100,7 @@ def __init__(
# indicate this but have to filter out the empty value
dimension_names = [n for n in name.split("+") if n]
try:
self._dimensions = universe.extract(dimension_names)
self._dimensions = universe.conform(dimension_names)
except KeyError:
# One or more of the dimensions is not known to the
# universe. This could be a typo or it could be that
@@ -122,15 +122,7 @@ def __init__(
self._name = name

elif dimensions is not None:
if not isinstance(dimensions, DimensionGraph):
if universe is None:
raise ValueError(
f"Cannot construct LookupKey for dimensions={dimensions} without universe."
)
else:
self._dimensions = universe.extract(dimensions)
else:
self._dimensions = dimensions
self._dimensions = dimensions
else:
# mypy cannot work this out on its own
raise ValueError("Name was None but dimensions is also None")
@@ -181,8 +173,8 @@ def name(self) -> str | None:
return self._name

@property
def dimensions(self) -> DimensionGraph | None:
"""Dimensions associated with lookup (`DimensionGraph`)."""
def dimensions(self) -> DimensionGroup | None:
"""Dimensions associated with lookup (`DimensionGroup`)."""
return self._dimensions

@property
@@ -203,7 +195,7 @@ def __hash__(self) -> int:
def clone(
self,
name: str | None = None,
dimensions: DimensionGraph | None = None,
dimensions: DimensionGroup | None = None,
dataId: dict[str, Any] | None = None,
) -> LookupKey:
"""Clone the object, overriding some options.
@@ -216,7 +208,7 @@ def clone(
name : `str`, optional
Primary index string for lookup. Will override ``dimensions``
if ``dimensions`` are set.
dimensions : `DimensionGraph`, optional
dimensions : `DimensionGroup`, optional
Dimensions that are relevant for lookup. Will override ``name``
if ``name`` is already set.
dataId : `dict`, optional
4 changes: 2 additions & 2 deletions python/lsst/daf/butler/_dataset_ref.py
@@ -173,7 +173,7 @@ def makeDatasetId(
else:
raise ValueError(f"Unexpected ID generation mode: {idGenerationMode}")

for name, value in sorted(dataId.byName().items()):
for name, value in sorted(dataId.required.items()):
items.append((name, str(value)))
data = ",".join(f"{key}={value}" for key, value in items)
return uuid.uuid5(self.NS_UUID, data)
@@ -328,7 +328,7 @@ def __init__(
):
self.datasetType = datasetType
if conform:
self.dataId = DataCoordinate.standardize(dataId, graph=datasetType.dimensions)
self.dataId = DataCoordinate.standardize(dataId, dimensions=datasetType.dimensions)
else:
self.dataId = dataId
self.run = run