-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't use pixetl for geojson creation in resample script #610
Changes from 19 commits
5236433
44bed61
821dc28
6ee4704
d5a28d9
f93b536
b4ddb5b
a4f5124
a487993
d4126c3
961d266
96dca40
0ffabfc
40ed0c1
36631ca
f662f0e
90b12c4
b6ffcf4
0f920e3
6862049
b70d59d
ab99a97
bbd833a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
import json | ||
from concurrent.futures import ProcessPoolExecutor, as_completed | ||
from typing import Any, Dict, List, Tuple | ||
|
||
from geojson import Feature, FeatureCollection | ||
from pyproj import CRS, Transformer | ||
from shapely.geometry import Polygon | ||
from shapely.ops import unary_union | ||
|
||
from errors import GDALError | ||
from gdal_utils import run_gdal_subcommand | ||
|
||
|
||
def to_4326(crs: CRS, x: float, y: float) -> Tuple[float, float]: | ||
transformer = Transformer.from_crs( | ||
crs, CRS.from_epsg(4326), always_xy=True | ||
) | ||
return transformer.transform(x, y) | ||
|
||
|
||
def extract_metadata_from_gdalinfo(gdalinfo_json: Dict[str, Any]) -> Dict[str, Any]: | ||
"""Extract necessary metadata from the gdalinfo JSON output.""" | ||
corner_coordinates = gdalinfo_json["cornerCoordinates"] | ||
|
||
crs: CRS = CRS.from_string(gdalinfo_json["coordinateSystem"]["wkt"]) | ||
metadata = { | ||
# NOTE: pixetl seems to always write features in tiles.geojson in | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Specifically, pixetl is using the metadata as computed by rasterio.open(). So, are you saying that gdalinfo is returning cornerCoordinates in the original projection, but rasterio.open() is giving the extent always in WGS 84. If so, maybe mention that you are emulating the behavior from rasterio.open() here. I do see that the tiles.geojson at There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I admit I'm a little bit fuzzy on the terminology, but I think technically EPSG:3857 and EPSG:4326 are both WGS 84? In any case, what I really mean is that the coordinates in the feature are always in degrees, even for WM tiles. I *think* they should be in meters for WM tiles. So that's what I'm trying to duplicate (even though it feels wrong).
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll re-write the comment to clarify that it's the units, not the actual projection. |
||
# epsg:4326 coordinates (even when the tiles themselves are | ||
# epsg:3857). Reproduce that behavior for compatibility. If that | ||
# ever changes, remove the call to to_4326 here. | ||
"extent": [ | ||
*to_4326(crs, *corner_coordinates["lowerLeft"]), | ||
*to_4326(crs, *corner_coordinates["upperRight"]), | ||
], | ||
"name": gdalinfo_json["description"], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like the tiles.geojson that we create with pixetl have a whole bunch more properties in them (looks like most that are printed by gdalinfo), but I'm assuming you've verified that we only need the 'extent' and 'name' properties - since those are the only ones you are generating beside the Polygon coordinates? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Huh. I actually had those other attributes in there, but took them out because (I swear) I looked at a tiles.geojson that didn't have them and I was trying to reproduce it. I must have been mistaken, I guess? Maybe I was accidentally looking at a custom geojson I had lying around? I don't know, but you're right, I see all that stuff in recent tiles.geojsons. I'll put it back. |
||
} | ||
|
||
return metadata | ||
|
||
|
||
def process_file(file_path: str) -> Dict[str, Any]: | ||
"""Run gdalinfo and extract metadata for a single file.""" | ||
print(f"Running gdalinfo on {file_path}") | ||
try: | ||
stdout,stderr = run_gdal_subcommand( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any reason you are running 'gdalinfo` here rather than doing the same thing that pixetl does, which is getting the info from a rasterio.open() of the file? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My mind went immediately to gdalinfo because (though it may do it differently in other places) pixetl gets metadata with gdalinfo here: https://github.com/wri/gfw_pixetl/blob/master/gfw_pixetl/utils/gdal.py#L170 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, thanks for that pointer to the pixetl code that is getting the meta-data. I don't see any reprojection/resizing (to EPSG:4326 or meters) in that code you are pointing to, so does that mean you don't really need the to_4326 stuff? See https://github.com/wri/gfw_pixetl/blob/cfb7c2fa1c0b6f366982e38839baa1cabc425938/gfw_pixetl/utils/gdal.py#L192 (But maybe I am missing a conversion somewhere else.) |
||
["gdalinfo", "-json", file_path], | ||
) | ||
except GDALError as e: | ||
raise RuntimeError(f"Failed to run gdalinfo on {file_path}: {e}") | ||
|
||
gdalinfo_json: Dict = json.loads(stdout) | ||
return extract_metadata_from_gdalinfo(gdalinfo_json) | ||
|
||
|
||
def generate_geojsons( | ||
geotiffs: List[str], | ||
max_workers: int = None | ||
) -> Tuple[FeatureCollection, FeatureCollection]: | ||
"""Generate tiles.geojson and extent.geojson files.""" | ||
features = [] | ||
polygons = [] | ||
|
||
with ProcessPoolExecutor(max_workers=max_workers) as executor: | ||
future_to_file = {executor.submit(process_file, file): file for file in geotiffs} | ||
for future in as_completed(future_to_file): | ||
file = future_to_file[future] | ||
try: | ||
metadata = future.result() | ||
extent = metadata["extent"] | ||
# Create a Polygon from the extent | ||
polygon_coords = [ | ||
[extent[0], extent[1]], | ||
[extent[0], extent[3]], | ||
[extent[2], extent[3]], | ||
[extent[2], extent[1]], | ||
[extent[0], extent[1]], | ||
] | ||
polygon = Polygon(polygon_coords) | ||
|
||
# Add to GeoJSON features | ||
feature = Feature(geometry=polygon.__geo_interface__, properties=metadata) | ||
features.append(feature) | ||
|
||
# Collect for union | ||
polygons.append(polygon) | ||
except Exception as e: | ||
raise RuntimeError(f"Error processing file {file}: {e}") | ||
|
||
tiles_fc = FeatureCollection(features) | ||
|
||
union_geometry = unary_union(polygons) | ||
extent_fc = FeatureCollection([ | ||
Feature(geometry=union_geometry.__geo_interface__, properties={}) | ||
]) | ||
|
||
return tiles_fc, extent_fc |
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that in existing tiles.geojson, the json is just written out with only spaces and no newlines (i.e. not prettifying with indents, etc.). Do we also want to do that (not indent), or do you think it is worth it for debugging to leave it in pretty format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good point. I like being able to look at it for debugging, but I'll remove the indent for now in the interests of changing as little as possible.