
Bring code coverage back up to 93-94% #331

Closed · lossyrob opened this issue May 4, 2021 · 19 comments

@lossyrob (Member) commented May 4, 2021

The changes in #309 will cause the code coverage to dip to 88-89%. This issue is to ensure we bring coverage back to the 93-94% range for the PySTAC 1.0 release.

@duckontheweb (Contributor) commented

#324 and #325 will probably go a long way towards this given the low coverage of both of those modules. We may want to wait to tackle this issue until those are complete and we have a better sense of the gap.

@duckontheweb (Contributor) commented

We should also update the coverage badge link in the main README to point to the main branch instead of develop as part of this issue.

@cholmes added this to the 1.0.0 milestone May 24, 2021
@l0b0 mentioned this issue May 27, 2021
@l0b0 (Contributor) commented Jun 3, 2021

I'd be interested in helping with this. I recently brought the coverage in another project from 97% to 100%, and it was surprising how many issues that uncovered (PR 1, 2, 3, 4).

@duckontheweb (Contributor) commented

We would definitely appreciate the help on this!

One thing we may want to consider is whether we want to include testing of methods like __repr__ and __str__ in the coverage stats. I tend to feel like they don't add much, so I would be fine with excluding them, but I don't really feel strongly about it.

@l0b0 (Contributor) commented Jun 4, 2021

> One thing we may want to consider is whether we want to include testing of methods like __repr__ and __str__ in the coverage stats. I tend to feel like they don't add much, so I would be fine with excluding them, but I don't really feel strongly about it.

I feel a bit like something's wrong if none of the tests end up exercising these - usually that means they aren't needed and should be removed. But you can make a call on that once we get closer to 100%.

In a separate project we've ended up with the following excludes (roughly how they translate into configuration is sketched after the list):

  • if TYPE_CHECKING: since no tests run during type checking
  • if __name__ == "__main__": since it can be assumed that this is run on the command line, and that the contents of this branch are tiny (typically just main())
  • pragma: no cover to exclude a few pieces of test code which don't always run
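
Roughly, those excludes translate into coverage configuration like this (a sketch only; PySTAC may keep its coverage settings in setup.cfg or pyproject.toml instead, and the exact entries would need adapting):

    # Sketch of a .coveragerc for the excludes above.
    # Note: overriding exclude_lines replaces coverage.py's default list,
    # so "pragma: no cover" has to be repeated here explicitly.
    [report]
    exclude_lines =
        pragma: no cover
        if TYPE_CHECKING:
        if __name__ == "__main__":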

@duckontheweb (Contributor) commented

> I feel a bit like something's wrong if none of the tests end up exercising these - usually that means they aren't needed and should be removed. But you can make a call on that once we get closer to 100%.

That's a good point. If they're important enough to add we should probably test them 😄 . We can see how it shakes out as we get closer.

@duckontheweb (Contributor) commented

@l0b0 If you're okay with it, I'll assign this issue to you.

@l0b0 (Contributor) commented Jun 19, 2021

I've gotten started on this work, and as usual when doing something like this, questions about the rationale behind any unusual code come up. I noticed that a bunch of the classes have custom __repr__ implementations, such as ItemSarExtension:

    def __repr__(self) -> str:
        return "<ItemSarExtension Item id={}>".format(self.item.id)

Based on the implementation above, repr(ItemSarExtension(first_item)) == repr(ItemSarExtension(second_item)) if and only if first_item.id == second_item.id, even if the other properties on first_item and second_item are different. As far as I can tell, there are two different schools of thought about which requirements the output of __repr__ should fulfil. One is that the output should be unique based on the properties of the object. That is, repr(ItemSarExtension(first_item)) == repr(ItemSarExtension(second_item)) should imply that repr(first_item) == repr(second_item) and repr(first_item.id) == repr(second_item.id) and so on for each of the properties of first_item and second_item, recursively. The second is that the output should contain enough information to recreate the object manually, basically making it a human readable representation of the object and all its properties, recursively. This implementation fulfils neither of the two, which could hide bugs when trying to work out for example why two seemingly identical objects behave differently.
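
For illustration, a property-based version might look something like the following sketch (hypothetical; the choice of properties here is made up and would need to match whatever definition of uniqueness we settle on):

    # Hypothetical sketch only, not an actual implementation; the properties
    # included are illustrative.
    def __repr__(self) -> str:
        return "<ItemSarExtension Item id={!r} properties={!r}>".format(
            self.item.id, self.item.properties
        )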

In addition, I could only find a single place where repr() is used (in RequiredPropertyMissing), so I wonder if we could get rid of most or all of these.

I'll just skip adding coverage for __repr__ methods for now.

@l0b0 (Contributor) commented Jun 19, 2021

Another question: Why is the orjson package optional? Presumably anyone using pystac will have to install some packages which are required, so why not make orjson mandatory and simplify the implementation, or revert to using the core json module?

Ping @duckontheweb?

@l0b0 mentioned this issue Jun 19, 2021
@duckontheweb (Contributor) commented

> Another question: Why is the orjson package optional? Presumably anyone using pystac will have to install some packages which are required, so why not make orjson mandatory and simplify the implementation, or revert to using the core json module?

orjson gives us a pretty significant performance boost for JSON serialization/deserialization over the other JSON libraries out there, and given the amount of JSON work involved in manipulating STAC catalogs it seems worth it to continue to include it as a dependency.

However, when no wheel is available (e.g. for orjson==3.5.3 on Python 3.10) building a wheel requires using the maturin build tool on the Rust nightly channel, which seems like too big a requirement for a Python project. In order to continue supporting users who may be installing in environments without an orjson wheel available, I think we want to keep this as an extra.
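
For context, the conditional in question is basically the usual fallback-import pattern, roughly like this sketch (not PySTAC's exact code):

    # Sketch of the general optional-dependency pattern (not copied from PySTAC):
    import json
    from typing import Any

    try:
        import orjson
    except ImportError:
        orjson = None  # no orjson wheel installed; fall back to the stdlib

    def json_loads(text: str) -> Any:
        if orjson is not None:
            return orjson.loads(text)
        return json.loads(text)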

@duckontheweb (Contributor) commented

Re: whether we should remove custom __repr__ implementations...

@lossyrob and others who have been involved in the project for longer may be able to provide more context on this, but I think the intention was to provide a simple representation of STAC objects that could be used to eyeball the items returned by something like Catalog.get_items without overwhelming the user with too much information. I agree that the current implementations would not be very useful in debugging problems with seemingly equal objects.

If we do want to take one of the 2 approaches detailed in the original issue, I would lean towards making it unique based on the STAC object properties. Providing enough information to completely reconstruct the object might be overwhelming given how complex STAC objects can be. We may also want to align our definition of "uniqueness" here with the outcome of the discussion in #380 on defining equality of STAC objects.

@l0b0 (Contributor) commented Jun 24, 2021

> However, when no wheel is available (e.g. for orjson==3.5.3 on Python 3.10) building a wheel requires using the maturin build tool on the Rust nightly channel, which seems like too big a requirement for a Python project. In order to continue supporting users who may be installing in environments without an orjson wheel available, I think we want to keep this as an extra.

I wouldn't want to introduce Rust nightly into the pipeline, but Python 3.10 is not stable yet. Isn't it reasonable to expect orjson to be available for 3.10 shortly after that? That is, could we make orjson required and not test on 3.10 until it's available there?

@l0b0 (Contributor) commented Jun 24, 2021

> @lossyrob and others who have been involved in the project for longer may be able to provide more context on this, but I think the intention was to provide a simple representation of STAC objects that could be used to eyeball the items returned by something like Catalog.get_items without overwhelming the user with too much information.

I thought that might be the case, but I don't know how useful that is in practice. I've not worked on any other projects where overriding __repr__ was at all common, but then again I might just not have the right debugging smarts. I'd expect that if users want to inspect the response from a query, they'd check whichever property gives them reasonable confidence about their findings, whether it's count(Catalog.get_items()), for item in Catalog.get_items(): print(item.id), or something else.

> If we do want to take one of the 2 approaches detailed in the original issue, I would lean towards making it unique based on the STAC object properties. Providing enough information to completely reconstruct the object might be overwhelming given how complex STAC objects can be. We may also want to align our definition of "uniqueness" here with the outcome of the discussion in #380 on defining equality of STAC objects.

Agreed.

@duckontheweb (Contributor) commented

> However, when no wheel is available (e.g. for orjson==3.5.3 on Python 3.10) building a wheel requires using the maturin build tool on the Rust nightly channel, which seems like too big a requirement for a Python project. In order to continue supporting users who may be installing in environments without an orjson wheel available, I think we want to keep this as an extra.

> I wouldn't want to introduce Rust nightly into the pipeline, but Python 3.10 is not stable yet. Isn't it reasonable to expect orjson to be available for 3.10 shortly after that? That is, could we make orjson required and not test on 3.10 until it's available there?

Yeah, I agree that support for Python 3.10 should be experimental until that version has stabilized. I was mostly using that as a concrete example, but there might be other cases in which a wheel is either not available or not appropriate. I have had experiences working with rasterio where I needed to build from a source distribution and I know it can sometimes be a pain, especially for newcomers. In that case, I think the issues are mostly around linking to system C libraries. I'm not sure if that would be a problem here though. orjson builds using cdylib linkage which should create all of the necessary *.so, *.dylib, etc. files within the compiled package.

I think we should open a separate issue for this to give it some more targeted attention and allow other users/contributors to weigh in on whether they have examples of needing to build from a source distribution. If not, I would be fine with making it a regular dependency.

@lossyrob (Member, Author) commented

What if someone doesn't want orjson, and instead uses ujson?

PySTAC has explicitly tried to keep any non-essential dependencies out of the library. The ones that have snuck through are python-dateutil (which many argue should be part of stdlib), and recently typing_extensions for pre-3.8 Python versions. I think this is a good value to have for a project like PySTAC that aims to be a core type library. Downstream library or application developers don't need to fret over the install footprint, dependency conflicts, or installing dependencies they don't want - it takes away that need for decision making when considering using the library. I think holding that as a project value has benefited the project in the past and will in the future. To break that for orjson would open the door for other dependencies, and I'd rather hold the line and work with things being optional.

That said, taking advantage of libraries that are there for things like performance speedups is also beneficial, and I don't think it costs us much complexity. JSON reading is so core to PySTAC that I don't imagine there would be many things like this. But reverting to just using stdlib json in systems that are using orjson would slow things down considerably, and the speedup is worth the complexity of the conditional. One might ask, why not implement a ujson code path as well then? I think if someone had that specific use case and made a PR to add similar functionality with ujson, I'd likely be in favor. Perhaps we'd want to consider refactoring a bit to make things cleaner with two implementations, but if it doesn't add an explicit dependency, doesn't add too much complexity, and is valuable to our userbase, that would be a good change.
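
To sketch the idea only (the helper here is made up, not a concrete proposal), a ujson path could sit next to orjson without either becoming a hard dependency:

    # Purely illustrative; backend preference order and the helper name are made up.
    import json
    from typing import Any, Callable

    def _select_json_loads() -> Callable[[str], Any]:
        try:
            import orjson
            return orjson.loads  # fastest backend when its wheel is installed
        except ImportError:
            pass
        try:
            import ujson
            return ujson.loads
        except ImportError:
            return json.loads  # stdlib fallback, always available

    loads = _select_json_loads()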

@l0b0 (Contributor) commented Jun 24, 2021

> I was mostly using that as a concrete example, but there might be other cases in which a wheel is either not available or not appropriate.

The availability issue at least should already be solved by the pipelines running on all supported platforms and Python versions. But as you both point out it might not be appropriate to require a less widely available and more complex library.

We should probably make sure the pipeline runs all tests both with and without orjson though (maybe install requirements-test.txt except orjson first, then install the whole dependency list?). #475
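
In the pipeline that could look roughly like this (a sketch only; the file names and exact steps are assumptions):

    # Two-pass test run; requirements file names are assumptions.
    grep -v orjson requirements-test.txt > requirements-no-orjson.txt
    pip install -r requirements-no-orjson.txt
    python -m pytest    # first pass: exercises the stdlib json fallback
    pip install -r requirements-test.txt
    python -m pytest    # second pass: exercises the orjson code path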

@duckontheweb (Contributor) commented

@l0b0 Whenever you feel like the test coverage has stabilized in the 93-94% range, we can close this and open more specific issues to address other gaps in coverage. Thanks so much for your work on this! It can be really thankless behind-the-scenes work to focus on test coverage and typing issues, but it's really important in creating a stable, reliable code base. We really appreciate it!

@l0b0 (Contributor) commented Jul 9, 2021

Thank you as well! The constructive feedback has been great, especially the way you've all let me know gently when I've misunderstood some intent.

Sounds good to me, by the way. I'm happy to jump to a different issue to bring coverage up to 100%. I've already filed a few issues discovered so far, and there are probably going to be some more interesting discussions before we get there.

By the way, Toitū Te Whenua LINZ is really keen to simplify metadata handling by moving to STAC, and pystac is probably going to be instrumental for that work.

Disclaimer: All that jazz about not representing my employer, etc :) Have a good weekend!

@duckontheweb (Contributor) commented

Awesome! I'll close this out, thanks again!
