Skip to content

Commit

Permalink
Add modis sampling to docs (#335)
Browse files Browse the repository at this point in the history
  • Loading branch information
yellowcap authored Dec 6, 2024
1 parent 7fe5e68 commit 8001245
Showing 1 changed file with 62 additions and 0 deletions.
62 changes: 62 additions & 0 deletions docs/release-notes/data_sampling.md
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,67 @@ and a maximum of 2000 scenes for each catalog that was included.
We selected the latest imagery for each of the available regions
of new zealand. The list of catalogs is in the linz processor file.

### MODIS sampling strategy

For MODIS we used the [Surface Reflectance 8-Day (500m)](https://planetarycomputer.microsoft.com/dataset/modis-09A1-061)
product. The data is distributed in SIN grid tiles. We included all SIN grid
tiles that do not have any nodata inside. The selected SIN grid tiles are then
transform to EPSG:3857 for all tiles. This results in some variation between the
nominal resolution, although the original resolution from the SIN projection is
500 meters. For input to the model, we assumed the 500m resolution as a fixed
resolution size for all tiles.

Algorithm to determine which tiles do not have nodata is shown in the code block
below. This resulted in 233 SIN grid tiles to be selected. For each of these
we sampled the first STAC search result for each month in each year from 2018
until 2023. This therefore resulted in 72 (`6 years * 12 months`) separate scenes
for each of the 233 SIN grid tiles.

Script for selection of SIN grid tiles included in the sampling:

```python
from multiprocessing import Pool
import rasterio
import planetary_computer as pc
import pystac_client
import numpy as np

SIN_GRID_TILES = []
for i in SIN_VERTICAL_RANGE:
for j in SIN_HORIZONTAL_RANGE:
SIN_GRID_TILES.append((i, j))

def evaluate_nodata(i, j):
catalog = pystac_client.Client.open(STAC_API, modifier=pc.sign_inplace)
items = catalog.search(
collections=[COLLECTION],
query={
"modis:vertical-tile": {
"eq": i,
},
"modis:horizontal-tile": {
"eq": j,
},
},
max_items=1,
)
item = list(items.item_collection())[0]

with rasterio.open(item.assets["sur_refl_b01"].href) as src:
data = src.read()

nodata = np.sum(data == -28672)

if nodata == 0:
print(i, j)
return i, j

if __name__ == '__main__':
with Pool(16) as p:
indexes = p.starmap(evaluate_nodata, SIN_GRID_TILES)
print("done")
print(indexes)
```

## Data preparation

Expand All @@ -136,6 +197,7 @@ Using stacchip, we created a dataset with a size of 33.8 TB of imagery, with abo
| Landsat-c2l1 | 5827333 |
| Landsat-c2l2-sr | 5790651 |
| Sentinel-1-rtc | 16133394 |
| MODIS | 1350864 |

# Older versions

Expand Down

0 comments on commit 8001245

Please sign in to comment.