Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ItemCollection constructor should clone items in iterator by default #859

Closed
philvarner opened this issue Jul 27, 2022 · 7 comments · Fixed by #1016
Closed

ItemCollection constructor should clone items in iterator by default #859

philvarner opened this issue Jul 27, 2022 · 7 comments · Fixed by #1016
Milestone

Comments

@philvarner
Copy link
Collaborator

Currently, if an ItemCollection is constructed with an iterator of Item objects, those objects may be mangled and become unusable by operation in ItemCollection. This can be avoided by setting the clone parameter to True. This change would make the clone parameter default to True to have correct behavior by default, and allow a client to set clone to False if they desired the higher performance of not cloning the Items.

@jsignell
Copy link
Member

Not mutating the input seems like a good idea to me. Does this kind of change require a deprecation warning or is it ok to just change the default in a minor release?

@gadomski
Copy link
Member

Tl;dr: it's ok to do this in a MINOR (I think).

We don't have a published deprecation/semver policy (TODO, I sort of started one here but it needs more work) but I think this would be acceptable in a MINOR. We can use our benchmark suite to quantify any performance impacts; I would expect them to be relatively minimal, especially relative to io. Anyone relying on the mutating behavior should be explicitly setting clone=False, so this wouldn't break them.

@jsignell
Copy link
Member

Ok! I am taking a look at this now.

@jsignell
Copy link
Member

jsignell commented Feb 28, 2023

Hmm ok so I think we will want to change how we handle equivalence. I changed the default and am getting failures on this test:

    def test_add_item_collections(self) -> None:
        item_1 = pystac.Item.from_file(self.SIMPLE_ITEM)
        item_2 = pystac.Item.from_file(self.EXTENDED_ITEM)
        item_3 = pystac.Item.from_file(self.CORE_ITEM)

        item_collection_1 = pystac.ItemCollection(items=[item_1, item_2])
        item_collection_2 = pystac.ItemCollection(items=[item_2, item_3])

        combined = item_collection_1 + item_collection_2

        self.assertEqual(len(combined), 3)

Because item_2 in item_collection_1 != item_2 in item_collection2. This makes total sense since they are clones of each other, but I'm thinking we might want to change __eq__ to check if the serialize to the same output. This is what we do for many of the other classes.

@gadomski
Copy link
Member

@gadomski
Copy link
Member

Setting equality aside for a second, I'm not sure I agree that ItemCollection should behave like a set and dedup its contained objects. I don't disagree -- I'm just not sure.

@jsignell
Copy link
Member

Setting equality aside for a second, I'm not sure I agree that ItemCollection should behave like a set and dedup its contained objects. I don't disagree -- I'm just not sure.

Oh good point! I think I would prefer that it not dedup, but the worst case is if it did different things based on the clone_items kwarg.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants