Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add codes to extract catchment geometry to polygons and ESMF regridding #418

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

stcui007
Copy link
Contributor

@stcui007 stcui007 commented May 25, 2022

Incorporated python code for extracting geometry coordinates from catchment_data.geojson into the main code, removed the relevant stand alone code.
Python code to generate mesh from polygon, perform ESMF regridding from rectangle to polygon, generate weights, calculate hydrologic properties for each catchment for arbitrary number of time steps and output to files in netcdf format.
Python code to write forcing data for all time steps to a single netcdf file.
Add capability of calculating average forcing based on the mask created using the catchment polygon boundary.

Additions

utilities/esmpy/gen_hyfab_polygons.py
utilities/esmpy/regrid_polygon_mesh.py
utilities/esmpy/regrid_polygon_mesh_1netcdf.py

Removals

Changes

Testing

Tested on Sugar Creek
Tested on HUC01

Screenshots

Notes

Todos

Checklist

  • [x ] PR has an informative and human-readable title
  • [x ] Changes are limited to a single goal (no scope creep)
  • [ x] Code can be automatically merged (no conflicts)
  • [x ] Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • [x ] New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist (automated report can be put here)

Target Environment support

  • [x ] Linux

@stcui007 stcui007 requested review from donaldwj and mattw-nws May 25, 2022 21:48
stcui007 added 2 commits June 22, 2022 18:24
…ion based on mask.

Add regrid_polygon_mesh_1netcdf.py that write all forcing to a single netcdf file.
Copy link
Contributor

@mattw-nws mattw-nws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done going through the whole thing, but here's the notes on the embedded catchment IDs I mentioned. Also consider some of the other suggestions that might have an impact on performance.

f.write("index,lon,lat\n")
coord_list = all_coords[0]
for s in coord_list:
f.write('%d ' %k)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is an internal-only file format, CSVs should probably be comma-separated and not space-separated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I originally wrote the output as a text file. Will make the change.

#n = 14632

def get_catchment_geometry(n, cat_file, output_dir):
cat_df_full = gpd.read_file(cat_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the purpose of this script, generally--that is, what is the purpose of converting the GeoJSON to CSV? In theory, perhaps this allows the next script to keep its memory footprint low, by not having the whole hydrofabric in memory? But this script loads the whole hydrofabric into memory 50x at the same time (via this line).

Big picture, why not read the hydrofabric once in the main script, pass individual geometry objects to the workers in that script, and skip a large amount of file I/O that happens with rewriting and reading in this format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting to roll the functionality of this script into the regrid_polygon_mesh.py?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, at least eventually. But I may not understand the reason for the current design... was there a reason for translating them all into text/CSV files first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't for any particular design reason if I remember correctly. It was
mainly to make sure that the code extracted the geometry correctly from
geojson file for each catchment.

with open(filename) as f:
next(f)
for x in f:
node_id.append(int(x.split(' ')[0]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider storing the result of x.split(' ') here instead of calling it 7 times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good catch!

# two listed catchments that contains zero angle and clockwise ordering that cause meshing error
#FIXME This can be removed for hydrofabric without error or can be used as a validation check
#for a new hydrofabric
if cat_id == 'cat-39990' or cat_id == 'cat-39965':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for specific IDs is not something that should make it into release code. Is the error that occurs something that can be caught as an exception? Recommend perhaps adding a parameter to this function like cleanup_geometry=False, put this block under if cleanup_geometry: and the caller can catch an exception and try again with this parameter set to True...if it fails again with that set, then fail completely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a minimum, this code as is will reverse the directions of these two catchments once we get a "fixed" hydrofabric where the vertices aren't in the wrong order...so that will probably blow up later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try something as suggested. Maybe we will keep this code in there just in case if similar problems occur for other hydrologic domain?

node_id.append(int(x.split(' ')[0]))
lons.append(float(x.split(' ')[1]))
lats.append(float(x.split(' ')[2]))
lons_lats.append(float(x.split(' ')[1]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably do not generate this array here, especially given that you have to modify it in the cleanup routine. Instead, interleave lats and lons later using a method like this: https://stackoverflow.com/a/5347492 (since this has to eventually be a Numpy array anyway...there are other methods if it were going to stay a List).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you are talking about lons_lats. Yeah, this can be done using interleaving somewhere after the code block correcting the vertices ordering error. lons_lats can be removed from that block of code.

utilities/esmpy/regrid_polygon_mesh.py Show resolved Hide resolved

#calculate local grid
lons_min_grid = (lons_min - lons_first)/lons_delta
#to fully encompass the polygon, we need to do the round down operation at lower boundary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a trick here, Python allows integer division with //

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.


ds = nc4.Dataset(aorcfile)

lons_first = ds['longitude'][0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if the nc4 library is caching these four values, but it is possible that doing this once and storing lons_first, lats_first lons_delta and lats_delta in globals (since they never change) might provide a meaningful speed boost, if it's actually going to read the four values from the file every time. Worth trying.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, given that latitude and longitude are read n times below in read_sub_netcdf, it probably makes sense to store the whole of these two arrays in a global, accessing the copy instead of going to the file repeatedly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only complication is that we have to save them as dictionaries as they are cat-id dependent. Probably worth trying in some way.

var_value_list[i].append(var_value)
i += 1

lons_first = ds['longitude'][0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See earlier comment on storing this value in a global instead of retrieving it n times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

setattr(lat_out, 'units', 'degrees_north')
setattr(lon_out, 'units', 'degrees_east')
#setattr(landmask_out, 'description', '1=in polygon 0=outside polygon')
setattr(APCP_surface_out, 'units', 'kg/m^2')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The units here should be copied from the source file rather than hard-coded. This should probably be done once at the beginning of the script and stored in a global dict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. This is done by looking at the source file. Should be made automatic

@stcui007
Copy link
Contributor Author

stcui007 commented Jun 23, 2022 via email

@mattw-nws
Copy link
Contributor

@stcui007 Can you re-submit this as a PR to https://github.com/NOAA-OWP/ngen-forcing please--I'd like to move these efforts over there while there in a higher state of flux/earlier state of development. Just put it in a directory with some appropriate name, things will probably move around in there as development and experimentation continue. Include your latest work (e.g. with "Conserve" method)... review in that repo will be much more cursory for a while.

@stcui007
Copy link
Contributor Author

stcui007 commented Aug 24, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants