generated from NOAA-OWP/owp-open-source-project-template
-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cached FIM - Part 1b - Removal of Redshift & Other Optimizations / Fixes #620
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…uring coastal runs
… huc8, branch combinations (added as serial type to derived.fim4_crosswalk table)
…normal ingest schema with all the new fim tables.
…cess. - will add addtional dummy row logic for hi
…need a custom exception as well.
nickchadwick-noaa
approved these changes
Jan 25, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR marks another iterative step towards stabilizing the new cached FIM implementation. It includes several minor optimizations and fixes, with a couple major pivots to the infrastructure / workflow:
Major Changes:
Switch from HydroID, HUC8 & Branch indexing of HAND data to a new unique 'HAND_ID' integer. - We may need to update this moving forward to match a coordinated effort on the FIM Dev team, but switching to a single unique integer ID speeds up the database operations significantly, as well as simplifying the scripting and join logic.
Removal of Redshift Data Warehouse, Implementing in RDS now instead - I initially chose Redshift for this feature after testing a prototype in early 2023 that was too large for RDS to run. That prototype was based on the initial [bad] plan that we would write a process that would preprocess all ~440 million HAND hydrotable geometries in advance (all steps of the synthetic rating curves), and would need the infrastructure to query that entire dataset efficiently enough for VPP pipelines. After testing our new lazy loading approach for several weeks (a much better idea, suggested by Corey), it has become apparent that most pipelines utilize a very small portion of the full hydrotables, and with the HAND_ID optimization, RDS is likely up to the task of handling these cache workflows just fine.
While I wish I would have thought of these considerations earlier on and saved the work of trying out Redshift fully to begin with, I think this is ultimately a great pivot that dramatically simplifies and stabilizes this Cached FIM enhancement. The previous PRs of this series can serve as a reference for the team should they decide to utilize Redshift in the future... but it is worth noting that I still hadn't completely sorted out some issues that Redshift was having with some of the more complex FIM geometries that weren't included in my initial testing last year - that may end up being a full on deal breaker with Redshift (Aurora may be worth a try first if/when scaling the RDS instance isn't a good option any longer).
Other Noteworthy Edits:
Deployment / DB Dump Considerations:
Update:
I've added a bunch of misc. fixes and clean-up to this branch while it's been waiting deployment, and think I have resolved all issues with the regular operational pipelines (at least that I know of). I still need to fully implement AEP and CatFIM pipelines, but these should not hold up deployment to UAT, since they are only run as one-off products anyways (will need to be updated next for NWM 3.0 Recurrence Flow Updates and/or next FIM Version update). I'm planning to wrap that up in early February.