Cached FIM - Part 1b - Removal of Redshift & Other Optimizations / Fixes #620

Merged
merged 33 commits into from
Jan 25, 2024

TylerSchrag-NOAA
Contributor

@TylerSchrag-NOAA TylerSchrag-NOAA commented Jan 16, 2024

This PR is another iterative step toward stabilizing the new cached FIM implementation. It includes several minor optimizations and fixes, along with a couple of major pivots in the infrastructure / workflow:

Major Changes:

  • Switch from HydroID, HUC8 & Branch indexing of HAND data to a new unique 'HAND_ID' integer - Switching to a single unique integer ID speeds up the database operations significantly and simplifies the scripting and join logic. We may need to update this moving forward to match a coordinated effort on the FIM Dev team.

  • Removal of Redshift Data Warehouse, implementing in RDS instead - I initially chose Redshift for this feature after testing a prototype in early 2023 that was too large for RDS to run. That prototype was based on the initial [bad] plan to preprocess all ~440 million HAND hydrotable geometries in advance (all steps of the synthetic rating curves), which would have required infrastructure able to query that entire dataset efficiently enough for VPP pipelines. After several weeks of testing our new lazy loading approach (a much better idea, suggested by Corey), it has become apparent that most pipelines use only a very small portion of the full hydrotables, and with the HAND_ID optimization, RDS is likely up to the task of handling these cache workflows.
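
As a rough illustration of the two changes above, here is a minimal sketch of a lazy-loading cache keyed on a single integer ID. It is not the code in this PR: the `fim_cache` table, the `get_cached_fim` and `compute_fim_geometry` names, and the use of SQLite in place of RDS are all assumptions made so the example runs on its own, and a real cache would presumably also key on flow or stage.

```python
import sqlite3


def compute_fim_geometry(hand_id: int) -> str:
    """Hypothetical stand-in for the expensive hydrotable / geometry step."""
    return f"geometry-for-{hand_id}"


def get_cached_fim(conn: sqlite3.Connection, hand_ids):
    """Return ({hand_id: geometry}, misses), computing only the cache misses."""
    wanted = sorted(set(hand_ids))
    if not wanted:
        return {}, []

    # One integer key keeps the lookup to a single IN (...) on the primary key,
    # instead of matching on a (HydroID, HUC8, Branch) triple.
    placeholders = ",".join("?" * len(wanted))
    rows = conn.execute(
        f"SELECT hand_id, geom FROM fim_cache WHERE hand_id IN ({placeholders})",
        wanted,
    ).fetchall()
    cached = dict(rows)

    # Lazy loading: only IDs that a pipeline actually asked for and that are
    # not cached yet get processed and stored.
    misses = [h for h in wanted if h not in cached]
    new_rows = [(h, compute_fim_geometry(h)) for h in misses]
    conn.executemany(
        "INSERT INTO fim_cache (hand_id, geom) VALUES (?, ?)", new_rows
    )
    conn.commit()

    cached.update(new_rows)
    return cached, misses


# In-memory database standing in for the real cache table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fim_cache (hand_id INTEGER PRIMARY KEY, geom TEXT NOT NULL)"
)

_, first_misses = get_cached_fim(conn, [101, 102, 103])
_, second_misses = get_cached_fim(conn, [102, 103, 104])
print(first_misses)   # [101, 102, 103]
print(second_misses)  # [104]
```

The second call only has to process ID 104, which is the behavior that lets a small cache serve pipelines that touch a small portion of the hydrotables.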

While I wish I had thought of these considerations earlier and saved the work of fully trying out Redshift, I think this is ultimately a great pivot that dramatically simplifies and stabilizes this Cached FIM enhancement. The previous PRs in this series can serve as a reference for the team should they decide to use Redshift in the future. It is worth noting, though, that I still hadn't completely sorted out some issues Redshift was having with some of the more complex FIM geometries that weren't included in my initial testing last year, which may end up being a full-on deal breaker for Redshift (Aurora may be worth a try first if/when scaling the RDS instance is no longer a good option).

Other Noteworthy Edits:

  • Public FIM Clipping Optimization - I changed the code to use a lookup on the derived.channels table to determine which reaches are in the public subset, instead of doing a spatial join in every pipeline (this was what caused MRF to fail during the heavy mid-January weather). This is much more performant, but it doesn't clip the FIM extent shapes to the exact border of the public domain, an enhancement that Corey added after it was specifically requested. I'll try to find a more efficient way to do that clipping, but this will at least keep those pipelines from failing under load.
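
The lookup approach described above can be sketched roughly as follows. This is an illustration only: the `public_fim_domain` flag, the `fim_extents` table, and the sample rows are hypothetical, and SQLite (with a plain `channels` table rather than `derived.channels`) stands in for the real database so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-in for derived.channels with a precomputed flag.
    CREATE TABLE channels (
        feature_id INTEGER PRIMARY KEY,
        public_fim_domain INTEGER NOT NULL
    );
    -- Hypothetical FIM output table; geom is a placeholder string.
    CREATE TABLE fim_extents (
        hand_id INTEGER PRIMARY KEY,
        feature_id INTEGER NOT NULL,
        geom TEXT
    );
    INSERT INTO channels VALUES (1, 1), (2, 0), (3, 1);
    INSERT INTO fim_extents VALUES
        (10, 1, 'poly-a'), (11, 2, 'poly-b'),
        (12, 3, 'poly-c'), (13, 3, 'poly-d');
    """
)


def public_fim(conn: sqlite3.Connection):
    """Select public FIM rows with an indexed key join, not a spatial join."""
    return conn.execute(
        """
        SELECT f.hand_id, f.feature_id, f.geom
        FROM fim_extents AS f
        JOIN channels AS c ON c.feature_id = f.feature_id
        WHERE c.public_fim_domain = 1
        ORDER BY f.hand_id
        """
    ).fetchall()


public_rows = public_fim(conn)
print([row[0] for row in public_rows])  # [10, 12, 13]
```

Each polygon is kept or dropped whole based on its reach, so nothing is clipped to the public-domain border, which is the tradeoff noted above.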

Deployment / DB Dump Considerations:

  • I've added several version 2.1.5 DB dump files to the deployed folder in hydrovis-ti-deployment-us-east-1 (we could move them outside of this folder, if we want to test those during deployment, of course).
  • These will need to be added to the UAT S3 folders before deployment there.
  • I purposely avoided any changes to the ArcGIS mapx files... so hopefully the SD creation script on the EC2 won't cause issues for this. Hopefully.

Update:
I've added a bunch of misc. fixes and clean-up to this branch while it has been awaiting deployment, and I think I have resolved all issues with the regular operational pipelines (at least those that I know of). I still need to fully implement the AEP and CatFIM pipelines, but these should not hold up deployment to UAT, since they are only run as one-off products anyway (they will need to be updated next for the NWM 3.0 Recurrence Flow Updates and/or the next FIM Version update). I'm planning to wrap that up in early February.

@TylerSchrag-NOAA TylerSchrag-NOAA added the enhancement New feature or request label Jan 16, 2024
@TylerSchrag-NOAA TylerSchrag-NOAA added this to the V2.1.5 milestone Jan 16, 2024
@TylerSchrag-NOAA TylerSchrag-NOAA self-assigned this Jan 16, 2024
@nickchadwick-noaa nickchadwick-noaa merged commit 1350ac7 into ti Jan 25, 2024
1 check passed
@nickchadwick-noaa nickchadwick-noaa deleted the cached_fim_part1b branch January 25, 2024 15:33
@nickchadwick-noaa nickchadwick-noaa modified the milestones: V2.1.5, V2.1.6 Feb 23, 2024
Labels
enhancement New feature or request

2 participants