Add codes to extract catchment geometry to polygons and ESMF regridding #418
base: master
Conversation
…ectangles to polygons
…ion based on mask. Add regrid_polygon_mesh_1netcdf.py that write all forcing to a single netcdf file.
Not done going through the whole thing, but here are the notes on the embedded catchment IDs I mentioned. Also consider some of the other suggestions that might have an impact on performance.
f.write("index,lon,lat\n") | ||
coord_list = all_coords[0] | ||
for s in coord_list: | ||
f.write('%d ' %k) |
While this is an internal-only file format, CSVs should probably be comma-separated and not space-separated.
Agree. I originally wrote the output as a text file. Will make the change.
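A hedged sketch of that fix, using the standard csv module so the delimiter and quoting are handled consistently (the write_vertices helper and sample coordinates are hypothetical, not from the PR):

```python
import csv
import io

def write_vertices(f, coords):
    # Hypothetical helper: write the header and one comma-separated
    # row per vertex, instead of space-delimited f.write() calls.
    writer = csv.writer(f)
    writer.writerow(["index", "lon", "lat"])
    for k, (lon, lat) in enumerate(coords):
        writer.writerow([k, lon, lat])

buf = io.StringIO()  # stands in for the real output file
write_vertices(buf, [(-71.5, 42.1), (-71.4, 42.2)])
```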
#n = 14632

def get_catchment_geometry(n, cat_file, output_dir):
    cat_df_full = gpd.read_file(cat_file)
I don't understand the purpose of this script, generally--that is, what is the purpose of converting the GeoJSON to CSV? In theory, perhaps this allows the next script to keep its memory footprint low, by not having the whole hydrofabric in memory? But this script loads the whole hydrofabric into memory 50x at the same time (via this line).
Big picture, why not read the hydrofabric once in the main script, pass individual geometry objects to the workers in that script, and skip a large amount of file I/O that happens with rewriting and reading in this format?
Are you suggesting to roll the functionality of this script into the regrid_polygon_mesh.py?
Probably, at least eventually. But I may not understand the reason for the current design... was there a reason for translating them all into text/CSV files first?
It wasn't for any particular design reason, if I remember correctly. It was mainly to make sure that the code extracted the geometry correctly from the geojson file for each catchment.
with open(filename) as f:
    next(f)
    for x in f:
        node_id.append(int(x.split(' ')[0]))
Consider storing the result of x.split(' ') here instead of calling it 7 times.
Yeah, good catch!
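A minimal sketch of that change (the sample lines stand in for the real file contents): split each line once and index the parts.

```python
# Split once per line and reuse the parts, instead of calling
# x.split(' ') once per field.
lines = ["0 -71.5 42.1", "1 -71.4 42.2"]  # stands in for the open file
node_id, lons, lats = [], [], []
for x in lines:
    parts = x.split(' ')            # one split per line
    node_id.append(int(parts[0]))
    lons.append(float(parts[1]))
    lats.append(float(parts[2]))
```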
# two listed catchments that contain a zero angle and clockwise ordering that cause meshing errors
#FIXME This can be removed for a hydrofabric without errors, or can be used as a validation check
#for a new hydrofabric
if cat_id == 'cat-39990' or cat_id == 'cat-39965':
Checking for specific IDs is not something that should make it into release code. Is the error that occurs something that can be caught as an exception? Recommend perhaps adding a parameter to this function like cleanup_geometry=False, putting this block under if cleanup_geometry:, and having the caller catch an exception and try again with this parameter set to True...if it fails again with that set, then fail completely.
At a minimum, this code as is will reverse the directions of these two catchments once we get a "fixed" hydrofabric where the vertices aren't in the wrong order...so that will probably blow up later.
Will try something as suggested. Maybe we should keep this code in place just in case similar problems occur for other hydrologic domains?
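The try-again pattern suggested above could look roughly like this sketch, where build_mesh and its ValueError are hypothetical stand-ins for the real meshing call and its failure mode:

```python
def build_mesh(coords, cleanup_geometry=False):
    # Hypothetical stand-in for the real mesh construction.
    if cleanup_geometry:
        coords = list(reversed(coords))   # e.g. fix vertex ordering
    if coords != sorted(coords):          # toy stand-in for a meshing check
        raise ValueError("bad vertex ordering")
    return coords

def mesh_with_fallback(coords):
    try:
        return build_mesh(coords)
    except ValueError:
        # Retry once with cleanup enabled; a second failure propagates,
        # so the run fails completely as suggested.
        return build_mesh(coords, cleanup_geometry=True)
```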
node_id.append(int(x.split(' ')[0]))
lons.append(float(x.split(' ')[1]))
lats.append(float(x.split(' ')[2]))
lons_lats.append(float(x.split(' ')[1]))
Probably do not generate this array here, especially given that you have to modify it in the cleanup routine. Instead, interleave lats and lons later using a method like this: https://stackoverflow.com/a/5347492 (since this has to eventually be a Numpy array anyway...there are other methods if it were going to stay a List).
I assume you are talking about lons_lats. Yeah, this can be done using interleaving somewhere after the code block correcting the vertex-ordering error; lons_lats can then be removed from that block of code.
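A sketch of the interleaving approach from the linked answer (the sample coordinates are made up), performed once after any vertex-order fixes:

```python
import numpy as np

# Interleave lons and lats into one flat array via strided assignment,
# instead of maintaining a parallel lons_lats list during parsing.
lons = [-71.5, -71.4, -71.3]
lats = [42.1, 42.2, 42.3]
lons_lats = np.empty(2 * len(lons))
lons_lats[0::2] = lons   # even slots get longitudes
lons_lats[1::2] = lats   # odd slots get latitudes
```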
#calculate local grid
lons_min_grid = (lons_min - lons_first)/lons_delta
#to fully encompass the polygon, we need to do the round down operation at lower boundary
A trick here: Python allows floor ("round down") division with //.
Thanks.
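For clarity on why // is the right tool here: it floors toward negative infinity (the desired round-down at the lower boundary), whereas casting with int() truncates toward zero, which differs for negative grid offsets. A small illustration:

```python
# // floors toward negative infinity; int() truncates toward zero.
# They agree for positive offsets but differ for negative ones.
vals = [-5.5, 5.5]
floored = [v // 1 for v in vals]
truncated = [int(v) for v in vals]
```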
ds = nc4.Dataset(aorcfile)

lons_first = ds['longitude'][0]
I don't know if the nc4 library is caching these four values, but it is possible that doing this once and storing lons_first, lats_first, lons_delta, and lats_delta in globals (since they never change) might provide a meaningful speed boost, if it's actually going to read the four values from the file every time. Worth trying.
Actually, given that latitude and longitude are read n times below in read_sub_netcdf, it probably makes sense to store the whole of these two arrays in a global, accessing the copy instead of going to the file repeatedly.
The only complication is that we have to save them as dictionaries as they are cat-id dependent. Probably worth trying in some way.
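One way to sketch that caching idea, keyed by whatever varies (here a file path; get_coords and fake_reader are hypothetical stand-ins for the real netCDF access):

```python
# Module-level cache: each file's coordinate arrays are read once
# and reused on every subsequent call.
_coord_cache = {}

def get_coords(path, reader):
    """reader(path) stands in for opening the NetCDF file and
    returning its (longitude, latitude) arrays."""
    if path not in _coord_cache:
        _coord_cache[path] = reader(path)
    return _coord_cache[path]

calls = []
def fake_reader(path):
    calls.append(path)               # record each actual file read
    return ([0.0, 0.1], [40.0, 40.1])

a = get_coords("aorc.nc", fake_reader)
b = get_coords("aorc.nc", fake_reader)  # served from the cache
```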
var_value_list[i].append(var_value)
i += 1

lons_first = ds['longitude'][0]
See earlier comment on storing this value in a global instead of retrieving it n times.
Agree.
setattr(lat_out, 'units', 'degrees_north')
setattr(lon_out, 'units', 'degrees_east')
#setattr(landmask_out, 'description', '1=in polygon 0=outside polygon')
setattr(APCP_surface_out, 'units', 'kg/m^2')
The units here should be copied from the source file rather than hard-coded. This should probably be done once at the beginning of the script and stored in a global dict.
You are right. This was done by looking at the source file; it should be made automatic.
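A hedged sketch of copying attributes from the source variable instead of hard-coding them; FakeVar below mimics the ncattrs/getncattr/setncattr interface that netCDF4 variables expose, so the helper can be exercised without a real file:

```python
def copy_var_attrs(src_var, dst_var):
    # Copy every attribute (units, long_name, ...) from source to output.
    for name in src_var.ncattrs():
        dst_var.setncattr(name, src_var.getncattr(name))

class FakeVar:
    """Stand-in with the same attribute interface as a netCDF4 variable."""
    def __init__(self, attrs=None):
        self._attrs = dict(attrs or {})
    def ncattrs(self):
        return list(self._attrs)
    def getncattr(self, name):
        return self._attrs[name]
    def setncattr(self, name, value):
        self._attrs[name] = value

src = FakeVar({"units": "kg/m^2", "long_name": "APCP_surface"})
dst = FakeVar()
copy_var_attrs(src, dst)
```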
I have responded to your other comments on the PR page.
…esh.py and copy aorc netcdf attributes from original
Delete unneeded file: utilities/esmpy/gen_hyfab_polygons.py
@stcui007 Can you re-submit this as a PR to https://github.com/NOAA-OWP/ngen-forcing please--I'd like to move these efforts over there while they're in a higher state of flux/earlier state of development. Just put it in a directory with some appropriate name; things will probably move around in there as development and experimentation continue. Include your latest work (e.g. with "Conserve" method)... review in that repo will be much more cursory for a while.
Yeah, sure.
Incorporated the Python code for extracting geometry coordinates from catchment_data.geojson into the main code, and removed the standalone script.
Python code to generate a mesh from each polygon, perform ESMF regridding from rectangles to polygons, generate weights, calculate hydrologic properties for each catchment for an arbitrary number of time steps, and output the results to files in netcdf format.
Python code to write forcing data for all time steps to a single netcdf file.
Added the capability of calculating average forcing based on a mask created from the catchment polygon boundary.
Additions
utilities/esmpy/gen_hyfab_polygons.py
utilities/esmpy/regrid_polygon_mesh.py
utilities/esmpy/regrid_polygon_mesh_1netcdf.py
Removals
Changes
Testing
Tested on Sugar Creek
Tested on HUC01
Screenshots
Notes
Todos
Checklist
Testing checklist (automated report can be put here)
Target Environment support