Add basic ItemCollection implementation #430

Merged
merged 14 commits into from
Jun 14, 2021
Changes from 5 commits
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,8 @@
- Links to Issues, Discussions, and documentation sites ([#409](https://github.com/stac-utils/pystac/pull/409))
- Python minimum version set to `>=3.6` ([#409](https://github.com/stac-utils/pystac/pull/409))
- Code of Conduct ([#399](https://github.com/stac-utils/pystac/pull/399))
- `ItemCollection` class for working with GeoJSON FeatureCollections containing only
STAC Items ([#430](https://github.com/stac-utils/pystac/pull/430))

### Changed

8 changes: 8 additions & 0 deletions docs/api.rst
@@ -127,6 +127,14 @@ CommonMetadata
   :members:
   :undoc-members:

ItemCollection
--------------
Represents a GeoJSON FeatureCollection in which all Features are STAC Items

.. autoclass:: pystac.ItemCollection
   :members:
   :show-inheritance:

Links
-----

60 changes: 40 additions & 20 deletions pystac/__init__.py
@@ -12,7 +12,7 @@
STACValidationError,
)

from typing import Any, Dict, Optional
from typing import Any, Dict, Optional, Union
from pystac.version import (
__version__,
get_stac_version,
@@ -34,6 +34,7 @@
)
from pystac.summaries import RangeSummary
from pystac.item import Item, Asset, CommonMetadata
from pystac.item_collection import ItemCollection

import pystac.validation

@@ -71,7 +72,7 @@
)


def read_file(href: str) -> STACObject:
def read_file(href: str) -> Union[STACObject, ItemCollection]:
Member

I'm not sure it makes sense to change these top level package read/write methods to include ItemCollection. It breaks user code that may rely on the STACObject type, forcing the type differentiation to happen by the caller. Also the last two parameters of write_file make it feel a bit shoe-horned. On the other hand, I can see people getting confused about why read_file wouldn't work on an ItemCollection if they didn't know it wasn't a core stac object.

I think my preference would be to keep these methods working with core STAC types, and force users to treat ItemCollections as a separate concept, which makes more sense with the spec as it currently stands.

Contributor Author

Yeah, I agree with all of the points you make here, and I think my preference would be to not change the top-level functions as well. If that ends up being the decision, I will add more clear documentation to the ItemCollection class indicating that it is not a STACObject and cannot be read using those top-level methods.

@scottyhq I'm curious how much of a priority it is for you to be able to read Item Collections using pystac.read_file vs. ItemCollection.from_file.

@scottyhq

Thanks for all the work on this! Yes, I'm coming at this primarily as someone new to STAC who is unfamiliar with the spec details. The key expectation I have is that after searching a STAC API for data and saving 'results.json' to work with later, there is a straightforward way to open that file and navigate it. For Python it seems like pystac_client takes care of the searching and pystac should take care of the I/O and navigation of the results. I don't think people should have to understand the concepts of ItemCollections versus Collections, Core vs Not, for this fundamental workflow.

So if pystac.read_file can't handle ItemCollections, and a separate ItemCollection.from_file() is the way forward (or pystac_client.read_file()?), I think that just needs to be clearly documented.

Also useful (and I think a non-breaking change) would then be for pystac.read_file to have error handling that can recognize that the JSON is an ItemCollection and suggest the correct method to open it, rather than the current KeyError: 'id'?

Contributor Author

@duckontheweb, Jun 14, 2021

Thanks @scottyhq, that all makes sense to me. I'll change those top-level functions back to their original signatures and make sure we have clear docs on how to work with ItemCollections. I'm pretty sure the issue with a KeyError being raised in pystac.read_file is fixed by #402, but I'll add a test to be sure.

@matthewhanson Looking back at the code example in the original issue it seems like the name of the ItemSearch.items_as_collection method might also be misleading. Maybe we should rename that to items_as_item_collection so that users don't think they are saving a STAC Collection?

Contributor Author

Removed ItemCollection handling from the top-level functions and added a test that pystac.read_file raises a STACTypeError instead of the KeyError in 404bb99.

Member

@duckontheweb ItemSearch.items_as_collection was misleading, and I've already renamed it. There are now get_pages (gets the raw JSON of the pages), get_item_collection (gets pages as item collections), get_items (iterates through all pages and items), and get_all_items (gets all items from all pages and returns them as a single ItemCollection). This matches the get_ syntax used in PySTAC.
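
The error-handling idea raised in this thread can be sketched without pystac at all. The snippet below is only an illustration, not the code merged in 404bb99: `StacTypeErrorSketch` and `check_not_item_collection` are hypothetical names, and the check assumes that a top-level `"type": "FeatureCollection"` is enough to recognize an ItemCollection for the purpose of the error message.

```python
from typing import Any, Dict


class StacTypeErrorSketch(Exception):
    """Hypothetical stand-in for pystac.STACTypeError."""


def check_not_item_collection(d: Dict[str, Any]) -> None:
    """Raise a descriptive error if the dict looks like a FeatureCollection.

    Assumption: a top-level "type" of "FeatureCollection" is enough to
    recognize an ItemCollection for the purpose of the error message.
    """
    if d.get("type") == "FeatureCollection":
        raise StacTypeErrorSketch(
            "JSON is a FeatureCollection, not a Catalog, Collection, or Item; "
            "try ItemCollection.from_file instead."
        )


# An Item-like dict passes silently; a FeatureCollection-like dict is rejected.
check_not_item_collection({"type": "Feature", "id": "item-1"})
try:
    check_not_item_collection({"type": "FeatureCollection", "features": []})
except StacTypeErrorSketch as err:
    message = str(err)
```

The point of the sketch is that the caller sees a message naming the alternative entry point rather than an unrelated `KeyError`.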

"""Reads a STAC object from a file.

This method will return either a Catalog, a Collection, or an Item based on what the
@@ -86,11 +87,16 @@ def read_file(href: str) -> STACObject:
The specific STACObject implementation class that is represented
by the JSON read from the file located at HREF.
"""
return STACObject.from_file(href)
try:
return STACObject.from_file(href)
except STACTypeError:
return ItemCollection.from_file(href)


def write_file(
obj: STACObject, include_self_link: bool = True, dest_href: Optional[str] = None
obj: Union[STACObject, ItemCollection],
include_self_link: bool = True,
dest_href: Optional[str] = None,
) -> None:
"""Writes a STACObject to a file.

@@ -106,38 +112,52 @@ def write_file(

Args:
obj : The STACObject to save.
include_self_link : If this is true, include the 'self' link with this object.
Otherwise, leave out the self link.
dest_href : Optional HREF to save the file to. If None, the object will be saved
to the object's self href.
include_self_link : If ``True``, include the ``"self"`` link with this object.
Otherwise, leave out the self link. Ignored for :class:`~ItemCollection`
instances.
dest_href : Optional HREF to save the file to. If ``None``, the object will be
saved to the object's ``"self"`` href (for :class:`~STACObject` sub-classes)
or a :exc:`~STACError` will be raised (for :class:`~ItemCollection`
instances).
"""
obj.save_object(include_self_link=include_self_link, dest_href=dest_href)
if isinstance(obj, ItemCollection):
if dest_href is None:
raise STACError("Must provide dest_href when saving an ItemCollection.")
obj.save_object(dest_href=dest_href)
else:
obj.save_object(include_self_link=include_self_link, dest_href=dest_href)


def read_dict(
d: Dict[str, Any],
href: Optional[str] = None,
root: Optional[Catalog] = None,
stac_io: Optional[StacIO] = None,
) -> STACObject:
"""Reads a STAC object from a dict representing the serialized JSON version of the
STAC object.
) -> Union[STACObject, ItemCollection]:
"""Reads a :class:`~STACObject` or :class:`~ItemCollection` from a JSON-like dict
representing a serialized STAC object.

This method will return either a Catalog, a Collection, or an Item based on what the
dict contains.
This method will return either a :class:`~Catalog`, :class:`~Collection`,
:class:`~Item`, or :class:`~ItemCollection` based on the contents of the dict.

This is a convenience method for :meth:`pystac.serialization.stac_object_from_dict`
This is a convenience method for either
:meth:`stac_io.stac_object_from_dict <stac_io.stac_object_from_dict>` or
:meth:`ItemCollection.from_dict <ItemCollection.from_dict>`.

Args:
d : The dict to parse.
href : Optional href that is the file location of the object being
parsed.
parsed. Ignored if the dict represents an :class:`~ItemCollection`.
root : Optional root of the catalog for this object.
If provided, the root's resolved object cache can be used to search for
previously resolved instances of the STAC object.
stac_io: Optional StacIO instance to use for reading. If None, the
default instance will be used.
previously resolved instances of the STAC object. Ignored if the dict
represents an :class:`~ItemCollection`.
stac_io: Optional :class:`~StacIO` instance to use for reading. If ``None``,
the default instance will be used.
"""
if stac_io is None:
stac_io = StacIO.default()
return stac_io.stac_object_from_dict(d, href, root)
try:
return stac_io.stac_object_from_dict(d, href, root)
except STACTypeError:
return ItemCollection.from_dict(d)
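
The type-differentiation concern raised in the review of `read_file` applies to this hunk as well: once a function returns a `Union`, callers have to branch on the concrete type before calling methods whose signatures differ. A minimal sketch, assuming hypothetical stand-in classes `StacObjectSketch` and `ItemCollectionSketch` (neither is part of pystac):

```python
from typing import Any, Dict, Union


class StacObjectSketch:
    """Hypothetical stand-in for pystac.STACObject."""

    def to_dict(self, include_self_link: bool = True) -> Dict[str, Any]:
        return {"type": "Feature", "self_link": include_self_link}


class ItemCollectionSketch:
    """Hypothetical stand-in for ItemCollection; to_dict takes no arguments."""

    def to_dict(self) -> Dict[str, Any]:
        return {"type": "FeatureCollection", "features": []}


def serialize(obj: Union[StacObjectSketch, ItemCollectionSketch]) -> Dict[str, Any]:
    # The caller must branch on the concrete type because the two
    # to_dict signatures differ.
    if isinstance(obj, ItemCollectionSketch):
        return obj.to_dict()
    return obj.to_dict(include_self_link=False)


item_result = serialize(StacObjectSketch())
collection_result = serialize(ItemCollectionSketch())
```

Keeping the top-level functions typed to the core STAC objects, as the thread eventually settles on, avoids pushing this `isinstance` check onto every caller.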
125 changes: 125 additions & 0 deletions pystac/item_collection.py
@@ -0,0 +1,125 @@
from copy import deepcopy
from pystac.errors import STACTypeError
from typing import Any, Dict, Iterator, List, Optional, Sized, Iterable

import pystac
from pystac.utils import make_absolute_href, is_absolute_href
from pystac.serialization.identify import identify_stac_object_type


class ItemCollection(Sized, Iterable[pystac.Item]):
    """Implementation of a GeoJSON FeatureCollection whose features are all STAC
    Items.

    All :class:`~pystac.Item` instances passed to the :class:`~ItemCollection` instance
    during instantiation are cloned and have their ``"root"`` URL cleared. Instances of
    this class are iterable and sized (see examples below).

    Any additional top-level fields in the FeatureCollection are retained in
    :attr:`~ItemCollection.extra_fields` by the :meth:`~ItemCollection.from_dict` and
    :meth:`~ItemCollection.from_file` methods and will be present in the serialized file
    from :meth:`~ItemCollection.save_object`.

    Examples:

        Loop over all items in the ItemCollection

        >>> item_collection: ItemCollection = ...
        >>> for item in item_collection:
        ...     ...

        Get the number of Items in the ItemCollection

        >>> length: int = len(item_collection)

    """

    items: List[pystac.Item]
    """The list of :class:`pystac.Item` instances contained in this
    ``ItemCollection``."""

    extra_fields: Dict[str, Any]
    """Dictionary containing additional top-level fields for the GeoJSON
    FeatureCollection."""

    def __init__(
        self, items: List[pystac.Item], extra_fields: Optional[Dict[str, Any]] = None
    ):
        self.items = [item.clone() for item in items]
        for item in self.items:
            item.clear_links("root")
        self.extra_fields = extra_fields or {}

    def __getitem__(self, idx: int) -> pystac.Item:
        return self.items[idx]

    def __iter__(self) -> Iterator[pystac.Item]:
        return iter(self.items)

    def __len__(self) -> int:
        return len(self.items)

    def to_dict(self) -> Dict[str, Any]:
        """Serializes an :class:`ItemCollection` instance to a JSON-like dictionary."""
        return {
            "type": "FeatureCollection",
            "features": [item.to_dict() for item in self.items],
            **self.extra_fields,
        }

    def clone(self) -> "ItemCollection":
        """Creates a clone of this instance. This clone is a deep copy; all
        :class:`~pystac.Item` instances are cloned and all additional top-level fields
        are deep copied."""
        return self.__class__(
            items=[item.clone() for item in self.items],
            extra_fields=deepcopy(self.extra_fields),
        )

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "ItemCollection":
        """Creates a :class:`ItemCollection` instance from a dictionary."""
        if identify_stac_object_type(d) != pystac.STACObjectType.ITEMCOLLECTION:
            raise STACTypeError("Dict is not a valid ItemCollection")

        items = [pystac.Item.from_dict(item) for item in d.get("features", [])]
        extra_fields = {k: v for k, v in d.items() if k not in ("features", "type")}

        return cls(items=items, extra_fields=extra_fields)

    @classmethod
    def from_file(
        cls, href: str, stac_io: Optional[pystac.StacIO] = None
    ) -> "ItemCollection":
        """Reads a :class:`ItemCollection` from a JSON file.

        Arguments:
            href : Path to the file.
            stac_io : A :class:`~pystac.StacIO` instance to use for file I/O
        """
        if stac_io is None:
            stac_io = pystac.StacIO.default()

        if not is_absolute_href(href):
            href = make_absolute_href(href)

        d = stac_io.read_json(href)

        return cls.from_dict(d)

    def save_object(
        self,
        dest_href: str,
        stac_io: Optional[pystac.StacIO] = None,
    ) -> None:
        """Saves this instance to the ``dest_href`` location.

        Args:
            dest_href : Location to which the file will be saved.
            stac_io: Optional :class:`~pystac.StacIO` instance to use. If not provided,
                will use the default instance.
        """
        if stac_io is None:
            stac_io = pystac.StacIO.default()

        stac_io.save_json(dest_href, self.to_dict())
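
The `extra_fields` round trip described in the class docstring can be illustrated with a dict-based stand-in. This is a rough sketch under stated assumptions, not the pystac implementation: `MiniItemCollection` is a hypothetical class that stores features as plain dicts instead of `pystac.Item` instances, skips the object-type check, and does no file I/O. The `links` entry and its URL are made up for the example.

```python
from copy import deepcopy
from typing import Any, Dict, Iterator, List, Optional


class MiniItemCollection:
    """Hypothetical, dict-based stand-in for the ItemCollection in this diff."""

    def __init__(
        self,
        items: List[Dict[str, Any]],
        extra_fields: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Copy the inputs so later mutation of the source dicts has no effect.
        self.items = deepcopy(items)
        self.extra_fields = deepcopy(extra_fields) if extra_fields else {}

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        return iter(self.items)

    def __len__(self) -> int:
        return len(self.items)

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "MiniItemCollection":
        # Everything except "type" and "features" is kept as an extra field.
        extra = {k: v for k, v in d.items() if k not in ("features", "type")}
        return cls(items=d.get("features", []), extra_fields=extra)

    def to_dict(self) -> Dict[str, Any]:
        return {
            "type": "FeatureCollection",
            "features": deepcopy(self.items),
            **self.extra_fields,
        }


source = {
    "type": "FeatureCollection",
    "features": [{"type": "Feature", "id": "a"}, {"type": "Feature", "id": "b"}],
    "links": [{"rel": "next", "href": "https://example.com/page/2"}],
}
collection = MiniItemCollection.from_dict(source)
ids = [item["id"] for item in collection]
round_tripped = collection.to_dict()
```

Here the `links` key survives the round trip because `from_dict` stores it in `extra_fields` and `to_dict` spreads those fields back into the output.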
2 changes: 1 addition & 1 deletion pystac/serialization/__init__.py
@@ -51,4 +51,4 @@ def stac_object_from_dict(
if info.object_type == pystac.STACObjectType.ITEM:
return pystac.Item.from_dict(d, href=href, root=root, migrate=False)

raise ValueError(f"Unknown STAC object type {info.object_type}")
raise pystac.STACTypeError(f"Unknown STAC object type {info.object_type}")
7 changes: 4 additions & 3 deletions tests/data-files/change_stac_version.py
@@ -29,9 +29,10 @@ def migrate(path: str) -> None:
)
)
obj = pystac.read_dict(stac_json, href=path)
migrated = obj.to_dict(include_self_link=False)
with open(path, "w") as f:
json.dump(migrated, f, indent=2)
if not isinstance(obj, pystac.ItemCollection):
migrated = obj.to_dict(include_self_link=False)
with open(path, "w") as f:
json.dump(migrated, f, indent=2)


if __name__ == "__main__":