How to present static files in the catalogs? #686
Replies: 4 comments
-
@aradhakrishnanGFDL The catalog should have
-
@wrongkindofdoctor Using fx sounds great! The GFDL PP datasets are organized in the PP components directory, whose names unfortunately also encode the frequency; I will ignore that in this context. Let's see if this is sufficient to describe both CMIP and GFDL statics.
CSV headers:
activity_id,institution_id,source_id,experiment_id,frequency,realm,table_id,member_id,grid_label,variable_id,time_range,chunk_freq,platform,dimensions,cell_methods,standard_name,path
BUT: there are multiple variables in one static dataset.
Option 1: intake-esm multiple-variables support. Unlike CMIP, there are MULTIPLE static variables in a single static dataset (below).
Pros: built-in feature of intake-esm. Even if there are 10 static variables in a GFDL static dataset, the returned xarray object only has what we searched for. Efficient.
Cons: Although intake-esm has a loose convention of passing a list of variable_id(s) for a single asset, we cannot leverage it, because then we cannot provide the matching standard name for each static variable, which MDTF needs. The catalog builder needs to "open" datasets to query for all variable names, which slows the runtime. It also does not look interoperable with other single-variable datasets in the same catalog.
Q: Is this interoperable with the JSON aggregation we currently have? To be explored. Notebook example in the works.
Option 2: Catalog each variable in the static dataset with the same path, like any other variable.
Option 3: For the GFDL runs: what if variable_id is empty, or descriptive, to denote that it's a static dataset?
Pros: the catalog builder remains lightweight, but there are helper utilities to explore multiple variables in a single dataset. No redundant data arrays in the xarray object.
dev,,fre,CM4.5v01_om5b06_piC_noBLING,mon,ocean_monthly,fx,,,static,,,gfdl.ncrc5-intel23-prod-openmp,,ts,'',/archive/John.Krasting/fre/FMS2024.02_OM5_20240724/CM4.5v01_om5b06_piC_noBLING/gfdl.ncrc5-intel23-prod-openmp/pp/ocean_monthly/ocean_monthly.static.nc
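To make the column alignment of the Option 3 example row easier to check, here is a minimal sketch that pairs the row with the proposed CSV headers using only Python's standard `csv` module. The names `HEADER`, `ROW`, and `entry` are illustrative and not part of the catalog builder.

```python
import csv
import io

# Proposed CSV headers, copied from the comment above.
HEADER = ("activity_id,institution_id,source_id,experiment_id,frequency,realm,"
          "table_id,member_id,grid_label,variable_id,time_range,chunk_freq,"
          "platform,dimensions,cell_methods,standard_name,path")

# Option 3 example row, copied from the comment above.
ROW = ("dev,,fre,CM4.5v01_om5b06_piC_noBLING,mon,ocean_monthly,fx,,,static,,,"
       "gfdl.ncrc5-intel23-prod-openmp,,ts,'',"
       "/archive/John.Krasting/fre/FMS2024.02_OM5_20240724/"
       "CM4.5v01_om5b06_piC_noBLING/gfdl.ncrc5-intel23-prod-openmp/pp/"
       "ocean_monthly/ocean_monthly.static.nc")

# DictReader uses the first line as field names and maps each row onto them.
reader = csv.DictReader(io.StringIO(HEADER + "\n" + ROW + "\n"))
entry = next(reader)

# Print only the populated columns, one per line.
for column, value in entry.items():
    if value:
        print(f"{column:15s} {value}")
```

Read this way, the row carries table_id = fx and variable_id = static, while member_id, grid_label, time_range, and chunk_freq are left empty.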
-
Option 3 was agreed upon, based on input received at the GFDL analysis subgroup meeting today, in terms of how the catalog looks and how it can be used.
variable_id = "fixed"
So, following option 3, the static dataset for each realm will be presented under a key like this in the dictionary returned by intake-esm:
fre.CM4.5v01_om5b06_piC_noBLING.fx.fx.ocean_monthly
and so on. If we still want variable_id = "ocean_monthly.fixed" to be more descriptive, we can do that. This was an idea from @apreucil.
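As a rough illustration of how such a key could come about, the sketch below joins selected column values with a dot. The thread does not say which columns form the key, so `KEY_COLUMNS` is an assumption chosen only to reproduce the key quoted above. The helper `dataset_key` and the sample `entry` (with frequency set to "fx") are hypothetical as well.

```python
# Assumption: the key is built from these columns, in this order.
# The discussion does not list them; they are chosen to match the quoted key.
KEY_COLUMNS = ["source_id", "experiment_id", "frequency", "table_id", "realm"]


def dataset_key(entry, columns=KEY_COLUMNS, sep="."):
    """Join the selected column values of one catalog entry into a key."""
    return sep.join(entry[column] for column in columns)


# Hypothetical catalog entry for a static ocean dataset under option 3.
entry = {
    "source_id": "fre",
    "experiment_id": "CM4.5v01_om5b06_piC_noBLING",
    "frequency": "fx",
    "table_id": "fx",
    "realm": "ocean_monthly",
    "variable_id": "fixed",
}

print(dataset_key(entry))
```

Because variable_id is not among the assumed key columns, all static variables of one realm would share a single key in this sketch, which is consistent with one static dataset per realm.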
-
Closing the discussion: option 3 will be implemented, and I hope this is compatible with the MDTF. If not, please open an issue @wrongkindofdoctor
-
The catalog builder developed by GFDL is looking into cataloging the static files (e.g. ocean.static.nc). I imagine one of the following ways to do this:
1) For every static file, we could get all the variable names in the dataset.
Then the CSV catalog will have one entry per variable_id, each pointing to the same ocean.static.nc path.
e.g.
exp, model, ..., xt_ocean,..., ocean.static.nc
exp, model, ..., yt_ocean,..., ocean.static.nc
..
[1] requires opening the netCDF file (one per dataset). (This will fit nicely into the CMIP vocabulary, so we can use "fx" as the MIP table and provide distinct variable names. Note that clubbing all the static variables in one netCDF file is not the CMIP way, yet there needs to be some support to handle it in the catalog builder without compromising performance.)
[2] can handle a lightweight catalog builder.
[3] could be something else
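Approach [1] can be sketched as follows, assuming the variable names have already been read from the static file. The helper `expand_static_entry`, the reduced column set, and the hard-coded variable list are all hypothetical; a real builder would obtain the names by opening ocean.static.nc, which is the cost noted above.

```python
import csv
import io


def expand_static_entry(base_entry, variable_names):
    """Return one catalog row per variable, all pointing to the same path."""
    rows = []
    for name in variable_names:
        row = dict(base_entry)       # copy so the base entry is not modified
        row["table_id"] = "fx"       # mark the row as a fixed (static) field
        row["variable_id"] = name
        rows.append(row)
    return rows


# Hypothetical base entry with a reduced set of columns.
base = {
    "experiment_id": "exp",
    "source_id": "model",
    "table_id": "",
    "variable_id": "",
    "path": "ocean.static.nc",
}

# Assumption: these names would come from opening the netCDF file.
variables = ["xt_ocean", "yt_ocean"]

rows = expand_static_entry(base, variables)

# Write the rows as CSV text to show the resulting catalog entries.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(base), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The number of rows grows with the number of static variables, but every row repeats the same path, so only the variable_id column distinguishes them.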
What is MDTF's requirement for static files presented in the catalog?
Thoughts? Ideas?