hello world
gjoseph92 committed Mar 11, 2021
0 parents commit a99ced2
Showing 37 changed files with 6,879 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .flake8
@@ -0,0 +1,5 @@
[flake8]
max-line-length = 120
exclude =
    .pyi
    typings
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
__pycache__
*.pyc

dask-worker-space
.ipynb_checkpoints
.vscode
/dist

*.DS_Store
13 changes: 13 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,13 @@
version: 2

sphinx:
  configuration: docs/conf.py

# https://github.com/readthedocs/readthedocs.org/issues/4912#issuecomment-664002569
python:
  version: 3.8
  install:
    - method: pip
      path: .
      extra_requirements:
        - docs
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) [year] [fullname]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
95 changes: 95 additions & 0 deletions README.md
@@ -0,0 +1,95 @@
# StackSTAC

[![Documentation Status](https://readthedocs.org/projects/stackstac/badge/?version=latest)](https://stackstac.readthedocs.io/en/latest/?badge=latest)

Turn a list of [STAC](http://stacspec.org) items into a 4D [xarray](http://xarray.pydata.org/en/stable/) DataArray (dims: `time, band, y, x`), including reprojection to a common grid. The array is a lazy [Dask array](https://docs.dask.org/en/latest/array.html), so loading and processing the data in parallel—locally or [on a cluster](https://coiled.io/)—is just a `compute()` call away.

For more information and examples, please [see the documentation](https://stackstac.readthedocs.io).

```python
import stackstac
import satsearch

stac_items = satsearch.Search(
    url="https://earth-search.aws.element84.com/v0",
    intersects=dict(type="Point", coordinates=[-105.78, 35.79]),
    collections=["sentinel-s2-l2a-cogs"],
    datetime="2020-04-01/2020-05-01"
).items()

stack = stackstac.stack(stac_items)
print(stack)
```
```
<xarray.DataArray 'stackstac-f350f6bfc3213d7eee2e6cb159246d88' (time: 13, band: 17, y: 10980, x: 10980)>
dask.array<fetch_raster_window, shape=(13, 17, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/23)
* time (time) datetime64[ns] 2020-04-01T18:04:04 ......
id (time) <U24 'S2B_13SDV_20200401_0_L2A' ... 'S...
* band (band) <U8 'overview' 'visual' ... 'WVP' 'SCL'
* x (x) float64 4e+05 4e+05 ... 5.097e+05 5.098e+05
* y (y) float64 4e+06 4e+06 ... 3.89e+06 3.89e+06
eo:cloud_cover (time) float64 29.24 1.16 27.26 ... 87.33 5.41
... ...
data_coverage (time) object 33.85 100 33.9 ... 32.84 100 34.29
platform (time) <U11 'sentinel-2b' ... 'sentinel-2b'
sentinel:sequence <U1 '0'
proj:epsg int64 32613
sentinel:data_coverage (time) float64 33.85 100.0 33.9 ... 100.0 34.29
title (band) object None ... 'Scene Classification ...
Attributes:
spec: RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
crs: epsg:32613
transform: | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
resolution: 10.0
```

Once in xarray form, many operations become easy. For example, we can compute a low-cloud weekly mean-NDVI timeseries:

```python
lowcloud = stack[stack["eo:cloud_cover"] < 40]
nir, red = lowcloud.sel(band="B08"), lowcloud.sel(band="B04")
ndvi = (nir - red) / (nir + red)
weekly_ndvi = ndvi.resample(time="1W").mean(dim=("time", "x", "y")).rename("NDVI")
# Call `weekly_ndvi.compute()` to process ~25GiB of raster data in parallel. Might want a dask cluster for that!
```
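
To make the arithmetic concrete, here is a minimal, dependency-free sketch of the NDVI formula applied to plain Python numbers. The function name `ndvi_pixel` and the sample reflectance values are assumptions made up for illustration; they are not part of `stackstac` or of the dataset above.

```python
import math


def ndvi_pixel(nir: float, red: float) -> float:
    """Normalized difference of one NIR/red pair; NaN when the sum is zero."""
    total = nir + red
    if total == 0:
        # Avoid ZeroDivisionError for a pixel with no signal in either band.
        return math.nan
    return (nir - red) / total


# Hypothetical reflectance values for three pixels, as (NIR, red) pairs.
pixels = [(0.5, 0.1), (0.3, 0.3), (0.0, 0.0)]
values = [ndvi_pixel(nir, red) for nir, red in pixels]
print(values)
```

The xarray expression above does the same thing elementwise across the whole array, so the per-pixel logic never has to be written by hand.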

## Installation

```
pip install stackstac
```

## Things `stackstac` does for you:

* Figure out the geospatial parameters from the STAC metadata (if possible): a coordinate reference system, resolution, and bounding box.
* Transfer the STAC metadata into [xarray coordinates](http://xarray.pydata.org/en/stable/data-structures.html#coordinates) for easy indexing, filtering, and provenance of metadata.
* Efficiently generate a Dask graph for loading the data in parallel.
* Mediate between Dask's parallelism and GDAL's aversion to it, allowing for fast, multi-threaded reads when possible, and at least preventing segfaults when not.
* Mask nodata and rescale by dataset-level scales/offsets.
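
As a rough illustration of the last point, masking nodata and rescaling boils down to something like the sketch below. This is not `stackstac`'s implementation: the function name `mask_and_rescale`, its parameters, and the order of operations (mask first, then `value * scale + offset`) are assumptions chosen for the example.

```python
import math
from typing import List, Optional


def mask_and_rescale(
    raw: List[float],
    nodata: Optional[float] = None,
    scale: float = 1.0,
    offset: float = 0.0,
) -> List[float]:
    """Replace nodata values with NaN, then apply `value * scale + offset`."""
    out = []
    for value in raw:
        if nodata is not None and value == nodata:
            out.append(math.nan)
        else:
            out.append(value * scale + offset)
    return out


# Hypothetical digital numbers, with 0 as nodata and a 1/10000 scale factor.
result = mask_and_rescale([0, 5000, 10000], nodata=0, scale=0.0001)
print(result)
```

Representing missing pixels as NaN requires a floating-point result, which is consistent with the `float64` dtype in the example output above.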

## Limitations:

* **Raster data only!** We are currently ignoring other types of assets you might find in a STAC (XML/JSON metadata, point clouds, video, etc.).
* **Single-band raster data only!** Each band has to be a separate STAC asset—a separate `red`, `green`, and `blue` asset on each Item is great, but a single `RGB` asset containing a 3-band GeoTIFF is not supported yet.
* [COG](https://www.cogeo.org)s work best. "Normal" GeoTIFFs that aren't internally tiled, or don't have overviews, will see much worse performance. Sidecar files (like `.msk` files) are ignored for performance. JPEG2000 will probably work, but will probably be slow unless you buy Kakadu. [Formats make a big difference](https://medium.com/@_VincentS_/do-you-really-want-people-using-your-data-ec94cd94dc3f).
* BYOBlocksize. STAC doesn't offer any metadata about the internal tiling scheme of the data. Knowing it can make IO more efficient, but actually reading the data to figure it out is slow. So it's on you to set this parameter. (But if you don't, things should be fine for any reasonable COG.)
* Doesn't make geospatial data any easier to work with in xarray. Common operations (picking bands, clipping to bounds, etc.) are tedious to type out. Real geospatial operations (shapestats on a GeoDataFrame, reprojection, etc.) aren't supported at all. [rioxarray](https://corteva.github.io/rioxarray/stable/readme.html) might help with some of these, but it has limited support for Dask, so be careful you don't kick off a huge computation accidentally.
* I haven't even written tests yet! Don't use this in production. Or do, I guess. Up to you.
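
To see why the blocksize matters, the sketch below counts how many internal tiles a read window overlaps along one axis. The helper `tiles_touched` and the 512-pixel tile size are hypothetical and exist only for this illustration; they are not part of `stackstac`.

```python
def tiles_touched(start: int, size: int, blocksize: int) -> int:
    """Count internal tiles along one axis overlapped by [start, start + size)."""
    first = start // blocksize
    last = (start + size - 1) // blocksize
    return last - first + 1


# Hypothetical file with 512-pixel internal tiles.
# Chunk edges land on tile edges:
aligned = tiles_touched(start=1024, size=1024, blocksize=512)
# Chunk edges cut through tiles:
misaligned = tiles_touched(start=1000, size=1000, blocksize=512)
print(aligned, misaligned)
```

In two dimensions these counts multiply: the misaligned 1000-pixel chunk overlaps 3 × 3 = 9 tiles, while the larger aligned 1024-pixel chunk overlaps only 2 × 2 = 4.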

## Roadmap:

Short-term:

- Write tests and add CI (including typechecking)
- Support multi-band assets
- Easier access to `s3://`-style URIs (right now, you'll need to pass in `gdal_env=stackstac.DEFAULT_GDAL_ENV.updated(always=dict(session=rio.session.AWSSession(...)))`)
- Utility to guess blocksize (open a few assets)
- Support [item assets](https://github.com/radiantearth/stac-spec/tree/master/extensions/item-assets) to provide more useful metadata with collections that use it (like S2 on AWS)
- Rewrite dask graph generation once the [Blockwise IO API](https://github.com/dask/dask/pull/7281) settles

Long term (if anyone uses this thing):
- Support other readers ([aiocogeo](https://github.com/geospatial-jeff/aiocogeo)?) that may perform better than GDAL for specific formats
- Interactive mapping with [xarray_leaflet](https://github.com/davidbrochart/xarray_leaflet), made performant with some Dask graph-rewriting tricks to do the initial IO at coarser resolution for lower zoom levels (otherwise zooming out could process terabytes of data)
- Improve ergonomics of xarray for raster data (in collaboration with [rioxarray](https://corteva.github.io/rioxarray/stable/readme.html))
- Implement core geospatial routines (warp, vectorize, vector stats, [GeoPandas](https://geopandas.org)/[spatialpandas](https://github.com/holoviz/spatialpandas) interop) in Dask
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,2 @@
_build/*
api/internal/*.rst
23 changes: 23 additions & 0 deletions docs/Makefile
@@ -0,0 +1,23 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

livehtml:
	sphinx-autobuild "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
17 changes: 17 additions & 0 deletions docs/api/internal.rst
@@ -0,0 +1,17 @@
Internal API reference
----------------------

You shouldn't need to look here unless you're curious, but while this library is still so not-production-ready, you might need to understand some of these internal types for now.

.. currentmodule:: stackstac

.. autosummary::
   :toctree: internal

   rio_env.LayeredEnv
   raster_spec.RasterSpec
   reader_protocol.Reader
   rio_reader.AutoParallelRioReader
   rio_reader.ThreadLocalRioDataset
   rio_reader.SingleThreadedRioDataset
   reader_protocol.FakeReader
6 changes: 6 additions & 0 deletions docs/api/main.rst
@@ -0,0 +1,6 @@
API reference
-------------

There's only one function:

.. autofunction:: stackstac.stack
347 changes: 347 additions & 0 deletions docs/basic.ipynb

Large diffs are not rendered by default.
