STAC API dataset #403

Open
calebrob6 opened this issue Feb 16, 2022 · 4 comments · May be fixed by #412
Labels
datasets Geospatial or benchmark datasets

Comments

@calebrob6
Member

calebrob6 commented Feb 16, 2022

SpatioTemporal Asset Catalogs (STACs) are a way to organize geospatial datasets. STAC APIs let users query huge STAC Catalogs by location, time range, and other metadata.

For example, the Microsoft Planetary Computer runs a STAC API that lets users search over catalogs containing all of Sentinel-2 imagery, all Landsat 8, etc. The following code uses the pystac_client library to query the Planetary Computer STAC API for Sentinel-2 scenes that intersect an area of interest, fall within a date range, and have less than 10% cloud cover. It returns metadata, and links to GeoTIFFs, for the matching scenes:

from pystac_client import Client

area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [-148.56536865234375, 60.80072385643073],
            [-147.44338989257812, 60.80072385643073],
            [-147.44338989257812, 61.18363894915102],
            [-148.56536865234375, 61.18363894915102],
            [-148.56536865234375, 60.80072385643073],
        ]
    ],
}
time_of_interest = "2019-06-01/2019-08-01"

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=area_of_interest,
    datetime=time_of_interest,
    query={"eo:cloud_cover": {"lt": 10}},
)

items = list(search.items())  # get_items() is deprecated in recent pystac_client releases
print(f"Returned {len(items)} Items")

We'd like to build a STACAPIDataset object that essentially wraps catalog.search(...), creates a RasterDataset from the returned items, and otherwise behaves as a normal PyTorch dataset (signing assets as needed, etc.). A signature like STACAPIDataset(api_endpoint, root="data/", max_cache_size=None, **query_parameters_to_pystac_client) would be a good starting point here, with the extra keyword arguments passed straight through to catalog.search(...).
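
To make the proposed signature concrete, here is a rough sketch of the search-wrapping part only. It is not a working TorchGeo dataset: building a RasterDataset from the items, asset signing, and caching are all left out. The `_search` hook, the `items` property, and the attribute names are assumptions made for illustration, and `api_endpoint` is placed first so that the signature is valid Python.

```python
from pathlib import Path
from typing import Any, List, Optional


class STACAPIDataset:
    """Illustrative sketch: lazily runs a STAC API search and indexes the results."""

    def __init__(
        self,
        api_endpoint: str,
        root: str = "data/",
        max_cache_size: Optional[int] = None,
        **query_parameters: Any,
    ) -> None:
        self.api_endpoint = api_endpoint
        self.root = Path(root)
        self.max_cache_size = max_cache_size  # stored only; not enforced in this sketch
        self.query_parameters = query_parameters  # forwarded to catalog.search(...)
        self._items: Optional[List[Any]] = None

    def _search(self) -> List[Any]:
        # Hypothetical hook. The import lives here so pystac_client is only
        # needed once a search actually runs.
        from pystac_client import Client

        catalog = Client.open(self.api_endpoint)
        search = catalog.search(**self.query_parameters)
        # items() in recent pystac_client releases; older ones use get_items().
        return list(search.items())

    @property
    def items(self) -> List[Any]:
        # Run the search once, on first access, and reuse the result.
        if self._items is None:
            self._items = self._search()
        return self._items

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> Any:
        # A real dataset would load raster data here; this returns the STAC Item.
        return self.items[index]
```

With something like this, the query above would become STACAPIDataset("https://planetarycomputer.microsoft.com/api/stac/v1", collections=["sentinel-2-l2a"], intersects=area_of_interest, datetime=time_of_interest), and no network request happens until the items are first accessed.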

As an implementation detail, it may be a good idea to cache accessed data in a local directory so that repeated reads of the same asset do not hit the network again.
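
One possible shape for that cache, as a sketch using only the standard library. The function names (`cache_path`, `fetch_cached`, `enforce_cache_limit`) are made up for illustration, and two assumptions are baked in: `max_cache_size` is a size in bytes, and signed asset URLs differ only in their query string, so the cache key ignores it.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def cache_path(href, cache_dir):
    """Map an asset URL to a stable local path, ignoring the query string."""
    parts = urlsplit(href)
    # Assumption: signing tokens live in the query string, so hash only the
    # host and path to keep the key stable across re-signing.
    key = hashlib.sha256(f"{parts.netloc}{parts.path}".encode()).hexdigest()[:16]
    name = Path(parts.path).name or "asset"
    return Path(cache_dir) / f"{key}_{name}"


def fetch_cached(href, cache_dir, download=urlretrieve):
    """Return a local copy of href, downloading it only on a cache miss."""
    path = cache_path(href, cache_dir)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # Download to a temporary name first so an interrupted transfer
        # never leaves a truncated file under the final name.
        tmp = path.parent / (path.name + ".part")
        download(href, tmp)
        tmp.replace(path)
    return path


def enforce_cache_limit(cache_dir, max_cache_size):
    """Delete the oldest files until the cache holds at most max_cache_size bytes."""
    if max_cache_size is None:
        return  # no limit requested
    files = sorted(
        (p for p in Path(cache_dir).iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    total = sum(p.stat().st_size for p in files)
    for p in files:
        if total <= max_cache_size:
            break
        total -= p.stat().st_size
        p.unlink()
```

Evicting by modification time is only the simplest policy to write down; a real implementation might track access time instead, or skip eviction and let users clear the directory themselves.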

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Feb 16, 2022
@nilsleh
Collaborator

nilsleh commented Feb 16, 2022

I would be really interested in taking on this task!

@calebrob6
Member Author

All yours :) (I had you in mind writing this actually, it is a bit more interesting than the other dataset stuff!) -- feel free to message me if you want to discuss details

@adamjstewart -- this would involve taking on some dependencies (pystac_client, planetary-computer, maybe stackstac)

@adamjstewart
Collaborator

We can make those deps optional if we need to.
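
A common way to do that is to import the package only when the dataset is used and raise a helpful error if it is missing. A minimal sketch, where the helper name `require_optional` is hypothetical and not an existing TorchGeo function:

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, pip_name: str) -> ModuleType:
    """Import an optional dependency, or explain how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with an actionable message, keeping the original cause.
        raise ImportError(
            f"{module_name} is required for this dataset; "
            f"install it with `pip install {pip_name}`"
        ) from err
```

The dataset would then call, for example, require_optional("pystac_client", "pystac-client") inside its constructor rather than importing at module level.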

@nilsleh nilsleh linked a pull request Feb 17, 2022 that will close this issue
@metazool

Nice potential feature, is there still intention to work on it?
