Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix normalizing hrefs for RELATIVE_UNPUBLISHED catalogs #300

Closed
wants to merge 2 commits into from
Closed

Fix normalizing hrefs for RELATIVE_UNPUBLISHED catalogs #300

wants to merge 2 commits into from

Conversation

gadomski
Copy link
Member

@gadomski gadomski commented Apr 5, 2021

Related Issue(s): None

Description: Per https://pystac.readthedocs.io/en/latest/api.html#pystac.CatalogType.RELATIVE_PUBLISHED,
RELATIVE_PUBLISHED catalogs should only have one absolute link, the
self link on the root catalog. The commit here is a failing test showing that items have absolute hierarchical paths after normalizing hrefs on a RELATIVE_UNPUBLISHED catalog. Before fixing in this PR, I want to make sure this is unexpected behavior and it's not just me mis-understanding what's going on.

PR Checklist:

  • Code is formatted (run scripts/format)
  • Tests pass (run scripts/test)
  • This PR maintains or improves overall codebase code coverage.
  • Changes are added to the CHANGELOG. See the docs for information about adding to the changelog.

Per
https://pystac.readthedocs.io/en/latest/api.html#pystac.CatalogType.RELATIVE_PUBLISHED,
`RELATIVE_PUBLISHED` catalogs should only have on absolute link, the
`self` link on the root catalog. This failing test shows that items
have absolute links.
@matthewhanson
Copy link
Member

Here's the test case with annotations

def test_normalize_hrefs_dont_make_items_absolute(self):
    # this catalog is SELF_CONTAINED, as it contains relative links and has no root self
    catalog = TestCases.test_case_1()

    # setting catalog_type doesn't do anything immediately, but will affect return value of any Link.get_href() call
    catalog.catalog_type = CatalogType.RELATIVE_PUBLISHED

    # this will resolve the entire catalog and set self href on all objects, but does not iterate through links
    catalog.normalize_hrefs('./relativepath')

    abspath = os.path.abspath('./relativepath')
    self.assertTrue(catalog.get_self_href().startswith(abspath))
    for (_, _, items) in catalog.walk():
        for item in items:
            for link in item.links:
                # the link.get_href() call will return relative or absolute based on catalog_type of the root
                self.assertFalse(os.path.isabs(link.get_href()), link.get_href())

Now, on the last check, all links are resolved, so the Link.get_href() will use the target get_self_href value (which is always absolute) then convert it to relative based on root.catalog_type
https://github.com/stac-utils/pystac/blob/main/pystac/link.py#L80

Is it failing the first link it hits?
What happens if you save the catalog, do the saved items have relative links?

When working with relative published catalogs, sub-catalogs and
sub-collections should (IMO) inherit the "relative published" from their
root. This adds a test to show that they don't, right now. Also improves
the message for the normalizing hrefs test.
@gadomski
Copy link
Member Author

gadomski commented Apr 6, 2021

Is it failing the first link it hits?

Nope, the self link. Here's the reporting from the (improved) test output:

======================================================================
FAIL: test_normalize_hrefs_in_relative_published (test_catalog.CatalogTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/pystac/pystac/tests/test_catalog.py", line 299, in test_normalize_hrefs_in_relative_published
    f"link #{i} (rel='{link.rel}'): {link.get_href()}")
AssertionError: True is not false : link #2 (rel='self'): /home/runner/work/pystac/pystac/relativepath/country-1/area-1-1/area-1-1-imagery/area-1-1-imagery.json

What happens if you save the catalog, do the saved items have relative links?

I added a test to check this, and at first it seemed fine. But when I added an intermediate collection w/o an explicit catalog type, then all links under that collection were not relative. It seems like you should be able to set catalog type at the root without having to specify it on all sub-catalogs and sub-collections?

@gadomski
Copy link
Member Author

gadomski commented Apr 6, 2021

For test_normalize_hrefs_in_relative_published, looks like the self link isn't normalized because it's not in rel_links here. From my understanding, items shouldn't have self links in relative published catalogs, but where should that self link be removed? normalize_hrefs?

For test_inherit_catalog_type, I see that my order of operations is backwards here -- the add_item must come after add_child to ensure that the item's root is set to the root catalog and not the collection. If I cooked up a utility method to "fix" the tree to ensure that there was only one root reference for all items (and similar fixes) would that be useful and if it would be, where would you like that to go? Catalog.normalize_tree? Or does that exist and I'm just missing it?

@gadomski
Copy link
Member Author

gadomski commented May 4, 2021

Closing this as #309 is changing enough that it's not worth updating this (pretty minor) branch.

@gadomski gadomski closed this May 4, 2021
@gadomski gadomski deleted the fix/relative-hrefs branch May 4, 2021 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants