
Bring code coverage back up to 93-94% #331

Closed · lossyrob opened this issue May 4, 2021 · 19 comments

@lossyrob (Member) commented May 4, 2021

The changes in #309 will cause the code coverage to dip to 88-89%. This issue is to ensure we bring coverage back to the 93-94% range for the PySTAC 1.0 release.

@duckontheweb (Contributor) commented

#324 and #325 will probably go a long way towards this given the low coverage of both of those modules. We may want to wait to tackle this issue until those are complete and we have a better sense of the gap.

@duckontheweb (Contributor) commented

We should also update the coverage badge link in the main README to point to the main branch instead of develop as part of this issue.

@cholmes added this to the 1.0.0 milestone May 24, 2021
@l0b0 mentioned this issue May 27, 2021
@l0b0 (Contributor) commented Jun 3, 2021

I'd be interested in helping with this. I recently brought the coverage in another project from 97% to 100%, and it was surprising how many issues that uncovered (PR 1, 2, 3, 4).

@duckontheweb (Contributor) commented

We would definitely appreciate the help on this!

One thing we may want to consider is whether we want to include testing of methods like __repr__ and __str__ in the coverage stats. I tend to feel like they don't add much, so I would be fine with excluding them, but I don't really feel strongly about it.

@l0b0 (Contributor) commented Jun 4, 2021

> One thing we may want to consider is whether we want to include testing of methods like __repr__ and __str__ in the coverage stats. I tend to feel like they don't add much, so I would be fine with excluding them, but I don't really feel strongly about it.

I feel a bit like something's wrong if none of the tests end up exercising these - usually that means they aren't needed and should be removed. But you can make a call on that once we get closer to 100%.

In a separate project we've ended up with the following excludes (roughly how they translate into configuration is sketched after the list):

  • if TYPE_CHECKING: since no tests run during type checking
  • if __name__ == "__main__": since it can be assumed that this is run on the command line, and that the contents of this branch are tiny (typically just main())
  • pragma: no cover to exclude a few pieces of test code which don't always run
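
Roughly, those excludes translate into coverage configuration like this (a sketch only; PySTAC may keep its coverage settings in setup.cfg or pyproject.toml instead, and the exact entries would need adapting):

    # Sketch of a .coveragerc for the excludes above.
    # Note: overriding exclude_lines replaces coverage.py's default list,
    # so "pragma: no cover" has to be repeated here explicitly.
    [report]
    exclude_lines =
        pragma: no cover
        if TYPE_CHECKING:
        if __name__ == "__main__":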

@duckontheweb (Contributor) commented

> I feel a bit like something's wrong if none of the tests end up exercising these - usually that means they aren't needed and should be removed. But you can make a call on that once we get closer to 100%.

That's a good point. If they're important enough to add we should probably test them 😄 . We can see how it shakes out as we get closer.

@duckontheweb (Contributor) commented

@l0b0 If you're okay with it, I'll assign this issue to you.

@l0b0 (Contributor) commented Jun 19, 2021

I've gotten started on this work, and as usual when doing something like this, questions about the rationale behind any unusual code come up. I noticed that a bunch of the classes have custom __repr__ implementations, such as ItemSarExtension:

    def __repr__(self) -> str:
        return "<ItemSarExtension Item id={}>".format(self.item.id)

Based on the implementation above, repr(ItemSarExtension(first_item)) == repr(ItemSarExtension(second_item)) if and only if first_item.id == second_item.id, even if the other properties on first_item and second_item are different. As far as I can tell, there are two different schools of thought about which requirements the output of __repr__ should fulfil. One is that the output should be unique based on the properties of the object. That is, repr(ItemSarExtension(first_item)) == repr(ItemSarExtension(second_item)) should imply that repr(first_item) == repr(second_item) and repr(first_item.id) == repr(second_item.id) and so on for each of the properties of first_item and second_item, recursively. The second is that the output should contain enough information to recreate the object manually, basically making it a human readable representation of the object and all its properties, recursively. This implementation fulfils neither of the two, which could hide bugs when trying to work out for example why two seemingly identical objects behave differently.
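
For illustration, a property-based version might look something like the following sketch (hypothetical; the choice of properties here is made up and would need to match whatever definition of uniqueness we settle on):

    # Hypothetical sketch only, not an actual implementation; the properties
    # included are illustrative.
    def __repr__(self) -> str:
        return "<ItemSarExtension Item id={!r} properties={!r}>".format(
            self.item.id, self.item.properties
        )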

In addition, I could only find a single place where repr() is used (in RequiredPropertyMissing), so I wonder if we could get rid of most or all of these.

I'll just skip adding coverage for __repr__ methods for now.

@l0b0 (Contributor) commented Jun 19, 2021

Another question: Why is the orjson package optional? Presumably anyone using pystac will have to install some packages which are required, so why not make orjson mandatory and simplify the implementation, or revert to using the core json module?

Ping @duckontheweb?

@l0b0 mentioned this issue Jun 19, 2021
@duckontheweb (Contributor) commented

> Another question: Why is the orjson package optional? Presumably anyone using pystac will have to install some packages which are required, so why not make orjson mandatory and simplify the implementation, or revert to using the core json module?

orjson gives us a pretty significant performance boost for JSON serialization/deserialization over the other JSON libraries out there, and given the amount of JSON work involved in manipulating STAC catalogs it seems worth it to continue to include it as a dependency.

However, when no wheel is available (e.g. for orjson==3.5.3 on Python 3.10) building a wheel requires using the maturin build tool on the Rust nightly channel, which seems like too big a requirement for a Python project. In order to continue supporting users who may be installing in environments without an orjson wheel available, I think we want to keep this as an extra.
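
For context, the conditional in question is basically the usual fallback-import pattern, roughly like this sketch (not PySTAC's exact code):

    # Sketch of the general optional-dependency pattern (not copied from PySTAC):
    import json
    from typing import Any

    try:
        import orjson
    except ImportError:
        orjson = None  # no orjson wheel installed; fall back to the stdlib

    def json_loads(text: str) -> Any:
        if orjson is not None:
            return orjson.loads(text)
        return json.loads(text)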

@duckontheweb (Contributor) commented

Re: whether we should remove custom __repr__ implementations...

@lossyrob and others who have been involved in the project for longer may be able to provide more context on this, but I think the intention was to provide a simple representation of STAC objects that could be used to eyeball the items returned by something like Catalog.get_items without overwhelming the user with too much information. I agree that the current implementations would not be very useful in debugging problems with seemingly equal objects.

If we do want to take one of the 2 approaches detailed in the original issue, I would lean towards making it unique based on the STAC object properties. Providing enough information to completely reconstruct the object might be overwhelming given how complex STAC objects can be. We may also want to align our definition of "uniqueness" here with the outcome of the discussion in #380 on defining equality of STAC objects.

@l0b0 (Contributor) commented Jun 24, 2021

> However, when no wheel is available (e.g. for orjson==3.5.3 on Python 3.10) building a wheel requires using the maturin build tool on the Rust nightly channel, which seems like too big a requirement for a Python project. In order to continue supporting users who may be installing in environments without an orjson wheel available, I think we want to keep this as an extra.

I wouldn't want to introduce Rust nightly into the pipeline, but Python 3.10 is not stable yet. Isn't it reasonable to expect orjson to be available for 3.10 shortly after that? That is, could we make orjson required and not test on 3.10 until it's available there?

@l0b0 (Contributor) commented Jun 24, 2021

> @lossyrob and others who have been involved in the project for longer may be able to provide more context on this, but I think the intention was to provide a simple representation of STAC objects that could be used to eyeball the items returned by something like Catalog.get_items without overwhelming the user with too much information.

I thought that might be the case, but I don't know how useful that is in practice. I've not worked on any other projects where overriding __repr__ was at all common, but then again I might just not have the right debugging smarts. I'd expect that if users want to inspect the response from a query, they'd check whichever property gives them reasonable confidence about their findings, whether it's count(Catalog.get_items()), for item in Catalog.get_items(): print(item.id), or something else.

> If we do want to take one of the 2 approaches detailed in the original issue, I would lean towards making it unique based on the STAC object properties. Providing enough information to completely reconstruct the object might be overwhelming given how complex STAC objects can be. We may also want to align our definition of "uniqueness" here with the outcome of the discussion in #380 on defining equality of STAC objects.

Agreed.

@duckontheweb (Contributor) commented

> However, when no wheel is available (e.g. for orjson==3.5.3 on Python 3.10) building a wheel requires using the maturin build tool on the Rust nightly channel, which seems like too big a requirement for a Python project. In order to continue supporting users who may be installing in environments without an orjson wheel available, I think we want to keep this as an extra.

> I wouldn't want to introduce Rust nightly into the pipeline, but Python 3.10 is not stable yet. Isn't it reasonable to expect orjson to be available for 3.10 shortly after that? That is, could we make orjson required and not test on 3.10 until it's available there?

Yeah, I agree that support for Python 3.10 should be experimental until that version has stabilized. I was mostly using that as a concrete example, but there might be other cases in which a wheel is either not available or not appropriate. I have had experiences working with rasterio where I needed to build from a source distribution and I know it can sometimes be a pain, especially for newcomers. In that case, I think the issues are mostly around linking to system C libraries. I'm not sure if that would be a problem here though. orjson builds using cdylib linkage which should create all of the necessary *.so, *.dylib, etc. files within the compiled package.

I think we should open a separate issue for this to give it some more targeted attention and allow other users/contributors to weigh in on whether they have examples of needing to build from a source distribution. If not, I would be fine with making it a regular dependency.

@lossyrob (Member, Author) commented

What if someone doesn't want orjson, and instead uses ujson?

PySTAC has explicitly tried to keep any non-essential dependencies out of the library. The ones that have snuck through are python-dateutil (which many argue should be part of stdlib), and recently typing_extensions for pre-3.8 Python versions. I think this is a good value to have for a project like PySTAC that aims to be a core type library. Downstream library or application developers don't need to fret over the install footprint, dependency conflicts, or installing dependencies they don't want - it takes away that need for decision making when considering using the library. I think holding that as a project value has benefited the project in the past and will in the future. To break that for orjson would open the door for other dependencies, and I'd rather hold the line and work with things being optional.

That said, taking advantage of libraries that are there for things like performance speedups is also beneficial, and I don't think it costs us much complexity. JSON reading is so core to PySTAC that I don't imagine there would be many things like this. But reverting to just using stdlib json in systems that are using orjson would slow things down considerably, and the speedup is worth the complexity of the conditional. One might ask, why not implement a ujson code path as well then? I think if someone had that specific use case and made a PR to add similar functionality with ujson, I'd likely be in favor. Perhaps we'd want to consider refactoring a bit to make things cleaner with two implementations, but if it doesn't add an explicit dependency, doesn't add too much complexity, and is valuable to our userbase, that would be a good change.
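
To sketch the idea only (the helper here is made up, not a concrete proposal), a ujson path could sit next to orjson without either becoming a hard dependency:

    # Purely illustrative; backend preference order and the helper name are made up.
    import json
    from typing import Any, Callable

    def _select_json_loads() -> Callable[[str], Any]:
        try:
            import orjson
            return orjson.loads  # fastest backend when its wheel is installed
        except ImportError:
            pass
        try:
            import ujson
            return ujson.loads
        except ImportError:
            return json.loads  # stdlib fallback, always available

    loads = _select_json_loads()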

@l0b0 (Contributor) commented Jun 24, 2021

> I was mostly using that as a concrete example, but there might be other cases in which a wheel is either not available or not appropriate.

The availability issue at least should already be solved by the pipelines running on all supported platforms and Python versions. But as you both point out it might not be appropriate to require a less widely available and more complex library.

We should probably make sure the pipeline runs all tests both with and without orjson though (maybe install requirements-test.txt except orjson first, then install the whole dependency list?). #475
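
In the pipeline that could look roughly like this (a sketch only; the file names and exact steps are assumptions):

    # Two-pass test run; requirements file names are assumptions.
    grep -v orjson requirements-test.txt > requirements-no-orjson.txt
    pip install -r requirements-no-orjson.txt
    python -m pytest    # first pass: exercises the stdlib json fallback
    pip install -r requirements-test.txt
    python -m pytest    # second pass: exercises the orjson code path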

@duckontheweb (Contributor) commented

@l0b0 Whenever you feel like the test coverage has stabilized in the 93-94% range, we can close this and open more specific issues to address other gaps in coverage. Thanks so much for your work on this! It can be really thankless behind-the-scenes work to focus on test coverage and typing issues, but it's really important in creating a stable, reliable code base. We really appreciate it!

@l0b0 (Contributor) commented Jul 9, 2021

Thank you as well! The constructive feedback has been great, especially the way you've all let me know gently when I've misunderstood some intent.

Sounds good to me, by the way. I'm happy to jump to a different issue to bring coverage up to 100%. I've already filed a few issues discovered so far, and there are probably going to be some more interesting discussions before we get there.

By the way, Toitū Te Whenua LINZ is really keen to simplify metadata handling by moving to STAC, and pystac is probably going to be instrumental for that work.

Disclaimer: All that jazz about not representing my employer, etc :) Have a good weekend!

@duckontheweb (Contributor) commented

Awesome! I'll close this out, thanks again!
