Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Casting root Catalog to Catalog type interferes with pystac-client Client class #410

Closed
matthewhanson opened this issue Jun 5, 2021 · 7 comments
Assignees
Milestone

Comments

@matthewhanson
Copy link
Member

The pystac-client Client class inherits from Catalog. In the latest version some logic was updated so that when it is read from a file (pystac-client uses from_file to initially open a root API), it sees that it is a root catalog and casts itself to a Catalog.

See here:
https://github.com/stac-utils/pystac/blob/main/pystac/stac_object.py#L484

The .from_file is the most immediate issue. There is casting to Catalog in other cases, such as if you item.get_root() it casts it to a catalog, but this didn't work before anyway, because PySTAC resolves STAC object links with logic that doesn't know about pystac-client's Client class.

@matthewhanson matthewhanson added this to the 1.0.0-rc.1 milestone Jun 5, 2021
@matthewhanson
Copy link
Member Author

This doesn't necessarily need to be changed in PySTAC, because pystac-client could just redefine the from_file method without the casting, but wanted to raise it here as it will affect any subclass of Catalog.

I think in general it's not good practice to force cast the returned object from a factory function, as it means any subclass won't be able to use those functions.

@matthewhanson
Copy link
Member Author

matthewhanson commented Jun 5, 2021

Looks like I was premature in my conclusion.

Casting to Catalog doesn't seem to affect the current object, only the root it points to (so it's a deepcopy?).

The problem is actually several lines up where these lines:

d = STAC_IO.read_json(href)
if cls == STACObject:
    o = STAC_IO.stac_object_from_dict(d, href=href)
else:
    o = cls.from_dict(d, href=href)

were replaced with the general function

o = stac_io.read_stac_object(href)

which doesn't call the current class constructor and instead can only create STAC objects defined in PySTAC.

It should be something like

        if cls == STACObject:
            o = stac_io.read_stac_object(href)
        else:
            d = stac_io.read_json(href)
            o = cls.from_dict(d, href=href)
            o._stac_io = stac_io

@duckontheweb
Copy link
Contributor

It should be something like

        if cls == STACObject:
            o = stac_io.read_stac_object(href)
        else:
            d = stac_io.read_json(href)
            o = cls.from_dict(d, href=href)
            o._stac_io = stac_io

Since STACObject is an abstract class that we never instantiate directly, this would always bypass the if cls == STACObject block and go to else. We would have to do something like if cls in (Catalog, Collection, Item), which seems prone to problems if we add classes that PySTAC recognizes (e.g. #430) and may cause circular import issues (I haven't actually tested that).

The only other way I can see to fix this is to add something like an optional typ argument to StacIO.read_stac_object that would contain the calling cls from STACObject.from_file. That method would then pass typ to StacIO.stac_object_from_dict and then on to serialization.stac_object_from_dict, which we could change to be something like:

def stac_object_from_dict(
    d: Dict[str, Any],
    href: Optional[str] = None,
    root: Optional["Catalog"] = None,
    typ: Optional[STACObject] = None,
) -> "STACObject":
    if typ is not None:
        return typ.from_dict(d, href=href, root=root, migrate=False  # ...or should migrate=True?

    # Existing logic here...

This may also be a good chance to consolidate and simplify some of that code. serialization.stac_object_from_dict, for example, is only used in StacIO and the deprecated STAC_IO, so maybe it makes sense to move that into the stac_io module instead.

@matthewhanson
Copy link
Member Author

There is one case where STACObject is not an abstract class, and that is when classmethod functions are called directly as members of the class rather than an instantiated object. This is what happens when the pystac.read_file function is called. This is a convenience function, but one that implies the program that calls it doesn't know what it's opening other than it's a STAC entity. In this case the only type of class that is supported for opening is what is explicitly supported in pystac.serialization.stac_object_from_dict, that is: Catalog, Collection, Item. Note that this function cannot be modified by StacIO since it lies a different module. In all other cases cls will resolve to the actual STACObject which will define it's own from_dict method.

To call pystac.read_file() function is to make 6 function calls down to where the STACObject is created:
pystac.read_file() -> STACObject.read_file() -> StacIO.read_stac_object() -> StacIO.stac_object_from_dict() -> pystac.serialization.stac_object_from_dict() -> <classname>.from_dict()

Its easy enough to redefine a higher level functions, but a main takeaway is that creating a StacIO subclass is rather complicated, and it's not really clear what functions should be overridden in which cases since there are so many nested functions....the above doesn't even include the super() function calls(). I don't even know if the right thing has been done in the StacApiIO implementation for pystac-client.

Everything works, and I don't want to derail the great momentum here, but it may be stepping back and looking at the IO approach to see if some functionality can be consolidated.

@duckontheweb
Copy link
Contributor

Yes, thanks, I was thinking more a out how that would play out in the various STACObject sub-classes and forgot that we were calling it as a class method directly on STACObject in some cases.

I completely agree with the broader point here. It does seem like we need to re-examine the IO approach and make sure we're supporting reasonable inheritance patterns.

lossyrob added a commit that referenced this issue Jun 17, 2021
@duckontheweb
Copy link
Contributor

@matthewhanson Does #451 close this, or are there additional changes that are needed?

@duckontheweb
Copy link
Contributor

Closed via #451

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants