Create a test notebook comparing alternate methods for querying the NWM data #7

Open
jameshalgren opened this issue Aug 29, 2022 · 8 comments

@jameshalgren
Contributor

Pulling further discussion over from here:
CIROH-UA/awi-ciroh-image#16

@necrisci
Contributor

Queries to test might include the time-space combinations below. These cover only queries for individual forecasts or collections of individual forecast time series. A similar tree could be built for the analysis, the forcing, and lead-time-sliced forecast subsets.

| Time \ Space | S (1 point) | M (10 - 150 pts) | L (150 - 1000 pts) | XL (1000 - 10K pts) | XXL (10K - 100K pts) | XXXL (100K - 2.7M pts) |
|---|---|---|---|---|---|---|
| S (1 hr - 24 hrs) | | | | | | |
| M (2 d - 10 d) | | | | | | |
| L (10 d - 2 mo) | | | | | | |
| XL (2 mo +) | | | | | | |
| XXL (40 yr +) | | | | | | |
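
A minimal sketch of how this grid could be enumerated so that every time-space cell gets a test case. The size classes are copied from the table; the names `SPACE_CLASSES`, `TIME_CLASSES`, and `query_grid` are made up for illustration and are not from any existing notebook.

```python
from itertools import product

# Space classes from the table: label -> (min points, max points).
SPACE_CLASSES = {
    "S": (1, 1),
    "M": (10, 150),
    "L": (150, 1_000),
    "XL": (1_000, 10_000),
    "XXL": (10_000, 100_000),
    "XXXL": (100_000, 2_700_000),
}

# Time classes from the table, kept as the original text because the
# upper bounds of "2 mo +" and "40 yr +" are open-ended.
TIME_CLASSES = {
    "S": "1 hr - 24 hrs",
    "M": "2 d - 10 d",
    "L": "10 d - 2 mo",
    "XL": "2 mo +",
    "XXL": "40 yr +",
}


def query_grid():
    """Return every (time class, space class) pair, time-major."""
    return list(product(TIME_CLASSES, SPACE_CLASSES))


grid = query_grid()
print(len(grid))  # 5 time classes x 6 space classes = 30
```

Each pair could then be mapped to a concrete time window and a sampled set of points before timing the query.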

@jameshalgren
Contributor Author

Additional variables:

  • Number of threads
  • Chunking method
  • Forecast vs. analysis
  • Gridded vs. scalar (some of our methods only work for scalar data, or at least presume it)
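
One possible way to carry these variables through a test run, sketched with a small dataclass. The class name, the field names, and the chunking labels (`"by_time"`, `"by_feature"`) are placeholders, not settings from any existing script. The `scalar_only` flag is there for the methods that only handle scalar data.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSettings:
    threads: int
    chunking: str  # placeholder label for a chunking method
    kind: str      # "forecast" or "analysis"
    layout: str    # "gridded" or "scalar"


def expand_settings(thread_counts, chunkings, scalar_only=False):
    """Build every combination of the additional variables.

    Set scalar_only=True for a query method that cannot read gridded data.
    """
    layouts = ["scalar"] if scalar_only else ["gridded", "scalar"]
    kinds = ["forecast", "analysis"]
    return [
        RunSettings(n, c, k, layout)
        for n, c, k, layout in itertools.product(
            thread_counts, chunkings, kinds, layouts
        )
    ]


all_runs = expand_settings([1, 4], ["by_time", "by_feature"])
scalar_runs = expand_settings([1, 4], ["by_time", "by_feature"], scalar_only=True)
print(len(all_runs), len(scalar_runs))  # 16 8
```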

@arpita0911patel moved this to Todo in Data Access Sep 30, 2022
@arpita0911patel moved this from Todo to In Progress in Data Access Sep 30, 2022
@jameshalgren
Contributor Author

Here we go... Please drop notes in here...
@ZacharyWills @mgdenno
@hellkite500 @karnesh

@jameshalgren
Contributor Author

ping @RohanSunkarapalli

@jameshalgren
Contributor Author

jameshalgren commented Dec 29, 2022

For AMS January 11th
Tests to perform:

For a given NWM stream segment:

  • [Critical] Build the dataframe for the plot from the interactive map, with all traces, for any arbitrary date in the record.
  • [Optional] Assemble an arbitrary 14-day sequence from the retrospective simulation.
  • [Bonus] Query the forecasts and assemble the 14-day series of same-lead-time forecasts from short range (f006) and medium range (f048).

Note: the 14-day length is essentially arbitrary, but it should be a good starting point for measurement. Right now we can do that with the Kerchunk methods, but it takes an awfully long time.
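
For the [Bonus] item, a rough sketch of which forecast cycles would need to be read to build a same-lead-time series. It only computes (issue time, valid time) pairs and touches no data. The cycle spacing is passed in as a parameter; the hourly short-range and 6-hourly medium-range values in the usage lines are assumptions to check against the actual archive, and the start date is arbitrary.

```python
from datetime import datetime, timedelta


def same_lead_time_cycles(start, days, lead_hours, cycle_hours):
    """Return (issue_time, valid_time) pairs for one fixed lead time.

    Each valid time in [start, start + days) is matched to the forecast
    issued lead_hours earlier. Assumes start - lead_hours falls on a
    forecast cycle.
    """
    lead = timedelta(hours=lead_hours)
    step = timedelta(hours=cycle_hours)
    end = start + timedelta(days=days)
    pairs = []
    valid = start
    while valid < end:
        pairs.append((valid - lead, valid))
        valid += step
    return pairs


start = datetime(2022, 12, 1)  # arbitrary example date
# Assumed cadences: hourly short-range cycles, 6-hourly medium-range cycles.
short_range = same_lead_time_cycles(start, days=14, lead_hours=6, cycle_hours=1)
medium_range = same_lead_time_cycles(start, days=14, lead_hours=48, cycle_hours=6)
print(len(short_range), len(medium_range))  # 336 56
```

The number of pairs is roughly the number of files (or Kerchunk references) each series has to open, which may help explain why the short-range case is slow.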

Dataset versions against which to perform tests:
(Bold datasets are currently publicly available)

  • NC Reformatted via Kerchunk
  • NC NWM Native via Kerchunk
  • NC Reformatted and subdivided by HUC via Kerchunk
  • NC NWM Native and subdivided by HUC via Kerchunk
  • Parquet 1:1 with NWM Native
  • Parquet 1:1 with Reformatted
  • Parquet 1:1 with NWM Native subdivided by HUC
  • Parquet 1:1 with Reformatted subdivided by HUC

Bonus

  • BigQuery
  • [stretch goal] External database

Overall Todo:

  • Determine test content for forecast datasets
  • Determine test content for retrospective datasets
  • Script creation of missing datasets (much of this has been preliminarily templated by @karnesh and @arpita0911patel)
  • Develop simple test protocol
  • Run and record results
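
As a starting point for the "simple test protocol" and "run and record results" items, a minimal timing harness using only the standard library. The function names and the CSV columns are suggestions, and the workload below is a stand-in; a real run would pass a callable that performs one of the dataset queries listed above.

```python
import csv
import io
import statistics
import time

FIELDS = ["label", "repeats", "best_s", "median_s"]


def time_query(label, query, repeats=3):
    """Call query() several times and summarize wall-clock durations."""
    durations = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        query()
        durations.append(time.perf_counter() - t0)
    return {
        "label": label,
        "repeats": repeats,
        "best_s": min(durations),
        "median_s": statistics.median(durations),
    }


def write_results(rows, handle):
    """Write the timing rows as CSV to any open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Stand-in workload; replace the lambda with a real query callable.
rows = [time_query("placeholder", lambda: sum(range(100_000)))]
buffer = io.StringIO()
write_results(rows, buffer)
print(buffer.getvalue())
```

Repeating each query matters here because the first call often includes connection setup and cold caches, so the best and median times can differ noticeably.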

@jameshalgren
Contributor Author

See here. E.g.,
image

@RohanSunkarapalli
Contributor

RohanSunkarapalli commented Dec 31, 2022 via email

@jameshalgren
Contributor Author
