New feature: StacIO that retries network requests #958

Closed
gadomski opened this issue Jan 20, 2023 · 0 comments · Fixed by #986

Summary

Add a RetryStacIO (probably inheriting from DefaultStacIO) that retries network requests in a configurable way. It'll probably be modeled on urllib3's Retry, though I am not suggesting we add urllib3 as a dependency. For example:

from pystac import Item
from pystac.stac_io import RetryStacIO
url = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/aster-l1t/items/AST_L1T_00310012006175412_20150516104359"
item = Item.from_file(url, RetryStacIO())  # This will be retried with default settings
item = Item.from_file(url, RetryStacIO(total=3, backoff_factor=2))   # Retry can be configured

# If you want to enable retries for all PySTAC operations
from pystac import StacIO
StacIO.set_default(RetryStacIO)
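
To make the retry behavior concrete, here is a minimal, stdlib-only sketch of the kind of loop a RetryStacIO might wrap around its reads. The helper name call_with_retries, its defaults, and the exact backoff formula are assumptions for illustration, not an existing PySTAC or urllib3 API; the total and backoff_factor parameters just mirror the names used in the example above.

```python
import time
from typing import Callable, Tuple, Type, TypeVar
from urllib.error import URLError

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    total: int = 3,
    backoff_factor: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (URLError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying up to `total` times on the given exceptions.

    Hypothetical helper: waits backoff_factor * 2**n seconds before
    retry number n (counting from zero), an exponential backoff.
    """
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            if attempt >= total:
                # Out of retries: surface the last error to the caller.
                raise
            sleep(backoff_factor * (2 ** attempt))
            attempt += 1
```

Under these assumptions, a RetryStacIO could subclass DefaultStacIO and override read_text_from_href to pass the parent implementation through call_with_retries, so every network read gets the same retry policy.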

Motivation

When doing a large number of operations with PySTAC, sometimes in parallel, it's possible to overwhelm servers or otherwise get transient errors. My specific example involved Item.to_dict -- I forgot to specify transform_hrefs=False (clapback to #546 (comment)), and did this ~1 million times:

d = item.to_dict(include_self_href=False)

Under the hood, this does a network request to resolve the root, and those requests would sometimes error out. I solved my specific problem by disabling href transforms:

d = item.to_dict(include_self_href=False, transform_hrefs=False)

But a retry-enabled PySTAC would be useful in any large-scale/batch processing context.

Alternatives

We could add urllib3 as a dependency and use its Retry explicitly. I don't think that's a terrible thing -- urllib3 is widely used and popular. requests is built on it, so adding urllib3 could represent a gradual slide towards just biting the bullet and adding requests too.
