Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Equality of pystac objects? #380

Open
TomAugspurger opened this issue May 31, 2021 · 5 comments
Open

Equality of pystac objects? #380

TomAugspurger opened this issue May 31, 2021 · 5 comments
Labels
discussion An issue to capture a discussion enhancement
Milestone

Comments

@TomAugspurger
Copy link
Collaborator

TomAugspurger commented May 31, 2021

In writing tests for #379, I stumbled over Asset objects not defining __eq__.

In [1]: import pystac

In [2]: pystac.Asset(href='a') == pystac.Asset(href='a')
Out[2]: False

Looking briefly, I don't see it for any of the core items (a couple extensions define __eq__). The behavior of Python is to fall back to object identity, (a is b).

Should pystac objects define __eq__ (and __ne__)? An implementation like the following probably makes sense:

def __eq__(self, other):
    return type(self) == type(other) and self.to_dict() == other.to_dict()

If we define __eq__ then we may need to make the objects unhashable, since the objects are mutable. https://hynek.me/articles/hashes-and-equality/ is a good read on this.

@duckontheweb duckontheweb added discussion An issue to capture a discussion enhancement labels Jun 1, 2021
@duckontheweb
Copy link
Contributor

I'm in favor of defining equality for PySTAC objects.

As I started working on an ItemCollection class for #371 I realized it would be nice to be able to define __contains__ on that class to check if an ItemCollection contains an Item. This is currently possible because I am cloning each Item during ItemCollection initialization so that we can reset the root link without mutating the original object. Since the default equality measure is item_a is item_b, as @TomAugspurger mentioned, this always comes back False.

I can tackle this as part of the work on #371. It sounds like we should raise a NotImplementedError for values of other that cannot be compared, so I'm thinking this should be the implementation:

def __eq__(self, other: object) -> bool:
    if type(self) != type(other):
        raise NotImplementedError(f"Type of other must be {type(self)}"
    return self.to_dict() == other.to_dict()

def __ne__(self, other: object) -> bool:
    if type(self) != type(other):
        raise NotImplementedError(f"Type of other must be {type(self)}"
    return self.to_dict() != other.to_dict()

That article mentions that Python 3 automatically makes objects unhashable if you define __eq__ and not __hash__, but I'll double-check that.

@TomAugspurger
Copy link
Collaborator Author

It sounds like we should raise a NotImplementedError for values of other that cannot be compared, so I'm thinking this should be the implementation:

I haven't read closely, but I want to confirm: do you want raise NotImplementedError or return NotImplemented. The (subtle) difference is that return NotImplemented allows the other object to determine if it's equal or not. I think if both return NotImplemented then Python raises a TypeError.

How to handle "foreign" objects is a really tricky subject, but I think the best practice is to only consider your own types and to return NotImplemented for all the others.

@duckontheweb
Copy link
Contributor

do you want raise NotImplementedError or return NotImplemented.

Yes, thanks for pointing that out, I misread that. We want to return NotImplemented if it's "foreign".

@duckontheweb
Copy link
Contributor

It might be worth a little more discussion on how we want to specifically define equality. I'm realizing now that the above definition won't necessarily work in an ItemCollection since we are removing any root links for Items when we initialize the ItemCollection, which makes the dicts unequal.

It seems like there could be a use-case for considering otherwise identical Items that belong to different Catalogs or Collections to be equal. In that case, if the "collection" property and the "parent", "root", and "collection" links were different, but all other properties were the same the Items would be considered equal. There might be a similar use-case for Collections that are included in multiple APIs or Catalogs.

@duckontheweb
Copy link
Contributor

cc: @gadomski

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion An issue to capture a discussion enhancement
Projects
None yet
Development

No branches or pull requests

3 participants