Add codes to extract catchment geometry to polygons and ESMF regridding #418
base: master
Conversation
…ectangles to polygons
…ion based on mask. Add regrid_polygon_mesh_1netcdf.py that write all forcing to a single netcdf file.
Not done going through the whole thing, but here are the notes on the embedded catchment IDs I mentioned. Also consider some of the other suggestions that might have an impact on performance.
f.write("index,lon,lat\n") | ||
coord_list = all_coords[0] | ||
for s in coord_list: | ||
f.write('%d ' %k) |
While this is an internal-only file format, CSVs should probably be comma-separated and not space-separated.
Agree. I originally wrote the output as a text file. Will make the change.
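A hedged sketch of that fix, using the standard csv module so the delimiter and quoting are handled consistently (the write_vertices helper and sample coordinates are hypothetical, not from the PR):

```python
import csv
import io

def write_vertices(f, coords):
    # Hypothetical helper: write the header and one comma-separated
    # row per vertex, instead of space-delimited f.write() calls.
    writer = csv.writer(f)
    writer.writerow(["index", "lon", "lat"])
    for k, (lon, lat) in enumerate(coords):
        writer.writerow([k, lon, lat])

buf = io.StringIO()  # stands in for the real output file
write_vertices(buf, [(-71.5, 42.1), (-71.4, 42.2)])
```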
#n = 14632

def get_catchment_geometry(n, cat_file, output_dir):
    cat_df_full = gpd.read_file(cat_file)
I don't understand the purpose of this script, generally--that is, what is the purpose of converting the GeoJSON to CSV? In theory, perhaps this allows the next script to keep its memory footprint low, by not having the whole hydrofabric in memory? But this script loads the whole hydrofabric into memory 50x at the same time (via this line).
Big picture, why not read the hydrofabric once in the main script, pass individual geometry objects to the workers in that script, and skip a large amount of file I/O that happens with rewriting and reading in this format?
Are you suggesting to roll the functionality of this script into the regrid_polygon_mesh.py?
Probably, at least eventually. But I may not understand the reason for the current design... was there a reason for translating them all into text/CSV files first?
It wasn't for any particular design reason, if I remember correctly. It was mainly to make sure that the code extracted the geometry correctly from the geojson file for each catchment.
with open(filename) as f:
    next(f)
    for x in f:
        node_id.append(int(x.split(' ')[0]))
Consider storing the result of x.split(' ') here instead of calling it 7 times.
Yeah, good catch!
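A minimal sketch of that change (the sample lines stand in for the real file contents): split each line once and index the parts.

```python
# Split once per line and reuse the parts, instead of calling
# x.split(' ') once per field.
lines = ["0 -71.5 42.1", "1 -71.4 42.2"]  # stands in for the open file
node_id, lons, lats = [], [], []
for x in lines:
    parts = x.split(' ')            # one split per line
    node_id.append(int(parts[0]))
    lons.append(float(parts[1]))
    lats.append(float(parts[2]))
```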
# two listed catchments that contain a zero angle and clockwise ordering that cause meshing errors
#FIXME This can be removed for a hydrofabric without errors, or can be used as a validation check
#for a new hydrofabric
if cat_id == 'cat-39990' or cat_id == 'cat-39965':
Checking for specific IDs is not something that should make it into release code. Is the error that occurs something that can be caught as an exception? Recommend perhaps adding a parameter to this function like cleanup_geometry=False, putting this block under if cleanup_geometry:, and having the caller catch an exception and try again with this parameter set to True...if it fails again with that set, then fail completely.
At a minimum, this code as is will reverse the directions of these two catchments once we get a "fixed" hydrofabric where the vertices aren't in the wrong order...so that will probably blow up later.
Will try something as suggested. Maybe we should keep this code in place just in case similar problems occur for other hydrologic domains?
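The try-again pattern suggested above could look roughly like this sketch, where build_mesh and its ValueError are hypothetical stand-ins for the real meshing call and its failure mode:

```python
def build_mesh(coords, cleanup_geometry=False):
    # Hypothetical stand-in for the real mesh construction.
    if cleanup_geometry:
        coords = list(reversed(coords))   # e.g. fix vertex ordering
    if coords != sorted(coords):          # toy stand-in for a meshing check
        raise ValueError("bad vertex ordering")
    return coords

def mesh_with_fallback(coords):
    try:
        return build_mesh(coords)
    except ValueError:
        # Retry once with cleanup enabled; a second failure propagates,
        # so the run fails completely as suggested.
        return build_mesh(coords, cleanup_geometry=True)
```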
node_id.append(int(x.split(' ')[0]))
lons.append(float(x.split(' ')[1]))
lats.append(float(x.split(' ')[2]))
lons_lats.append(float(x.split(' ')[1]))
Probably do not generate this array here, especially given that you have to modify it in the cleanup routine. Instead, interleave lats and lons later using a method like this: https://stackoverflow.com/a/5347492 (since this has to eventually be a Numpy array anyway...there are other methods if it were going to stay a List).
I assume you are talking about lons_lats. Yeah, this can be done using interleaving somewhere after the code block correcting the vertex-ordering error; lons_lats can then be removed from that block of code.
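A sketch of the interleaving approach from the linked answer (the sample coordinates are made up), performed once after any vertex-order fixes:

```python
import numpy as np

# Interleave lons and lats into one flat array via strided assignment,
# instead of maintaining a parallel lons_lats list during parsing.
lons = [-71.5, -71.4, -71.3]
lats = [42.1, 42.2, 42.3]
lons_lats = np.empty(2 * len(lons))
lons_lats[0::2] = lons   # even slots get longitudes
lons_lats[1::2] = lats   # odd slots get latitudes
```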
#calculate local grid
lons_min_grid = (lons_min - lons_first)/lons_delta
#to fully encompass the polygon, we need to do the round down operation at lower boundary
A trick here: Python allows floor ("round down") division with //.
Thanks.
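For clarity on why // is the right tool here: it floors toward negative infinity (the desired round-down at the lower boundary), whereas casting with int() truncates toward zero, which differs for negative grid offsets. A small illustration:

```python
# // floors toward negative infinity; int() truncates toward zero.
# They agree for positive offsets but differ for negative ones.
vals = [-5.5, 5.5]
floored = [v // 1 for v in vals]
truncated = [int(v) for v in vals]
```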
ds = nc4.Dataset(aorcfile)

lons_first = ds['longitude'][0]
I don't know if the nc4 library is caching these four values, but it is possible that doing this once and storing lons_first, lats_first, lons_delta, and lats_delta in globals (since they never change) might provide a meaningful speed boost, if it's actually going to read the four values from the file every time. Worth trying.
Actually, given that latitude and longitude are read n times below in read_sub_netcdf, it probably makes sense to store the whole of these two arrays in a global, accessing the copy instead of going to the file repeatedly.
The only complication is that we have to save them as dictionaries as they are cat-id dependent. Probably worth trying in some way.
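One way to sketch that caching idea, keyed by whatever varies (here a file path; get_coords and fake_reader are hypothetical stand-ins for the real netCDF access):

```python
# Module-level cache: each file's coordinate arrays are read once
# and reused on every subsequent call.
_coord_cache = {}

def get_coords(path, reader):
    """reader(path) stands in for opening the NetCDF file and
    returning its (longitude, latitude) arrays."""
    if path not in _coord_cache:
        _coord_cache[path] = reader(path)
    return _coord_cache[path]

calls = []
def fake_reader(path):
    calls.append(path)               # record each actual file read
    return ([0.0, 0.1], [40.0, 40.1])

a = get_coords("aorc.nc", fake_reader)
b = get_coords("aorc.nc", fake_reader)  # served from the cache
```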
var_value_list[i].append(var_value)
i += 1

lons_first = ds['longitude'][0]
See earlier comment on storing this value in a global instead of retrieving it n times.
Agree.
setattr(lat_out, 'units', 'degrees_north')
setattr(lon_out, 'units', 'degrees_east')
#setattr(landmask_out, 'description', '1=in polygon 0=outside polygon')
setattr(APCP_surface_out, 'units', 'kg/m^2')
The units here should be copied from the source file rather than hard-coded. This should probably be done once at the beginning of the script and stored in a global dict.
You are right. This was done by looking at the source file; it should be made automatic.
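A hedged sketch of copying attributes from the source variable instead of hard-coding them; FakeVar below mimics the ncattrs/getncattr/setncattr interface that netCDF4 variables expose, so the helper can be exercised without a real file:

```python
def copy_var_attrs(src_var, dst_var):
    # Copy every attribute (units, long_name, ...) from source to output.
    for name in src_var.ncattrs():
        dst_var.setncattr(name, src_var.getncattr(name))

class FakeVar:
    """Stand-in with the same attribute interface as a netCDF4 variable."""
    def __init__(self, attrs=None):
        self._attrs = dict(attrs or {})
    def ncattrs(self):
        return list(self._attrs)
    def getncattr(self, name):
        return self._attrs[name]
    def setncattr(self, name, value):
        self._attrs[name] = value

src = FakeVar({"units": "kg/m^2", "long_name": "APCP_surface"})
dst = FakeVar()
copy_var_attrs(src, dst)
```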
I have responded to your other comments on the PR page.
…esh.py and copy aorc netcdf attributes from original
Delete unneeded file: utilities/esmpy/gen_hyfab_polygons.py
@stcui007 Can you re-submit this as a PR to https://github.com/NOAA-OWP/ngen-forcing please--I'd like to move these efforts over there while they're in a higher state of flux/earlier state of development. Just put it in a directory with some appropriate name; things will probably move around in there as development and experimentation continue. Include your latest work (e.g. with "Conserve" method)... review in that repo will be much more cursory for a while.
Yeah, sure.
Incorporated the Python code for extracting geometry coordinates from catchment_data.geojson into the main code, and removed the standalone script.
Python code to generate a mesh from each polygon, perform ESMF regridding from rectangles to polygons, generate weights, calculate hydrologic properties for each catchment for an arbitrary number of time steps, and output the results to files in netcdf format.
Python code to write forcing data for all time steps to a single netcdf file.
Added the capability of calculating average forcing based on a mask created from the catchment polygon boundary.
Additions
utilities/esmpy/gen_hyfab_polygons.py
utilities/esmpy/regrid_polygon_mesh.py
utilities/esmpy/regrid_polygon_mesh_1netcdf.py
Removals
Changes
Testing
Tested on Sugar Creek
Tested on HUC01
Screenshots
Notes
Todos
Checklist
Testing checklist (automated report can be put here)
Target Environment support