-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow e3sm_to_cmip to use ilamb parameter names #672
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang This is ready for review. Unit tests pass. The primary file to review is zppy/e3sm_to_cmip.py
.
@xylar @tomvothecoder I think it would be good to get a second ok from at least one of you. This change allows users to set ts_atm_grid
and ts_land_grid
rather than just setting ts_grid
differently in each e3sm_to_cmip
subtask. It is a similar change for ts_subsection
.
Benefits of "guessing" parameters:
- Allows users much more flexibility. They have multiple options to get to their desired result.
- Can be more intuitive. For example, the
ilamb
task requires bothts_atm_grid
andts_land_grid
whereas thee3sm_to_cmip
task handled this by setting a single parameterts_grid
differently in itsatm
subtask and itsland
subtask. @chengzhuzhang pointed out this is not very intuitive to users.
Drawbacks:
- This sort of algorithmic guesswork adds a decent amount of code, in particular in tests. Unit tests expand trying to keep up with the number of parameter combinations. This can increase tech debt.
- Pollutes the parameter space. I.e., there are now more parameters for users to be aware of and for developers to keep track of.
raise ParameterNotProvidedError(parameter) | ||
|
||
|
||
def check_and_define_parameters(c: Dict[str, Any], sub: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reasons for the different function are:
ts_grid
is used in the bash file whereasts_subsection
is used in the python file.- We will only try to guess
ts_subsection
if the guess parameter is on. The reason for this is somewhat historical -- when I did the parameter refactoring, I created parameters to turn on/off guessing for file paths or zppy cfg sections.
class ParameterGuessType(Enum):
PATH_GUESS = 1
SECTION_GUESS = 2
grid
was always used in the bash files using exactly what the user set it as, or else it was computed from the mapping file. This PR appears to be the first time we're allowing a grid
parameter to be set by an alternative set of parameters.
def set_grid(c: Dict[str, Any]) -> None:
# Grid name (if not explicitly defined)
# 'native' if no remapping
# or extracted from mapping filename
if c["grid"] == "":
if c["mapping_file"] == "":
c["grid"] = "native"
elif c["mapping_file"] == "glb":
c["grid"] = "glb"
else:
tmp = os.path.basename(c["mapping_file"])
# FIXME: W605 invalid escape sequence '\.'
tmp = re.sub("\.[^.]*\.nc$", "", tmp) # noqa: W605
tmp = tmp.split("_")
if tmp[0] == "map":
c["grid"] = f"{tmp[-2]}_{tmp[-1]}"
else:
raise ValueError(
f"Cannot extract target grid name from mapping file {c['mapping_file']}"
)
# If grid is defined, just use that
@forsyth2, I don't feel well enough versed on the analysis affected by these changes to have an opinion. I'd rather let @tomvothecoder and @chengzhuzhang provide feedback. |
I was thinking it only take a few lines of code change to also support |
I don't have a strong opinion for this specific case in I usually lean towards being explicit as possible with API design and configurations because too much flexibility can introduce significant overhead for both the developer and the user (even if they think it is helpful to be flexible). However, if inferring/guessing can be implemented relatively simply with default fallback behavior(s) (if it fails) and it makes the lives of users easier, then this feature might be reasonable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My quick code review.
# Use for e3sm_to_cmip and/or ilamb tasks. | ||
# Name of the grid used by the relevant `[ts]` atm subtask | ||
ts_atm_grid = string(default="180x360_aave") | ||
# Use for e3sm_to_cmip and/or ilamb tasks. | ||
# Name of the `[ts]` atm subtask to depend on | ||
ts_atm_subsection = string(default="") | ||
# Use for e3sm_to_cmip task (but NOT the ilamb task) -- you can either set this, or | ||
# both ts_atm_grid and ts_land_grid | ||
# Name of the grid used by the relevant `[ts]` task | ||
ts_grid = string(default="180x360_aave") | ||
# Use for e3sm_to_cmip and/or ilamb tasks. | ||
# Name of the grid used by the relevant `[ts]` land subtask | ||
ts_land_grid = string(default="180x360_aave") | ||
# Use for e3sm_to_cmip and/or ilamb tasks. | ||
# Name of the `[ts]` land subtask to depend on | ||
ts_land_subsection = string(default="") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand, you're moving these parameters up a level in the configuration hierarchy so that they can be reused across each e3sm_to_cmip
subtask, rather than configuring ts_grid
differently for each subtask. The intention is to eliminate redundant code/configuration.
Is this correct? If so, it sounds reasonable to me.
Also I would call it inferring instead of guessing if this is actually implemented. I was curious and asked ChatGPT:
|
@chengzhuzhang This feature has been part of As you noted, users prefer flexibility, so these guessing parameters are defaulted to True:
There are two "types" of guesses I defined -- file path guesses (e.g., we don't have a file path, but it's probably What this PR does is expand guessing functionality to
@chengzhuzhang believes users generally prefer more flexibility. I should also note, as mentioned earlier in this comment, we already do a very large amount of guesswork. This PR is just adding more. It's a little too late considering we're in the RC testing period, but perhaps sending out a survey to
This is my personal feeling. It's easier for users to run into issues from a missing parameter if they're setting different parameters to achieve the same result in different That said, I do think flexibility can reduce the learning curve -- e.g., users can get going faster with whatever parameters they have or what's intuitive to them. (What's intuitive to one person might not be to another!)
@tomvothecoder As mentioned above, the feature has existed since #628 -- this PR is just extending it even more. I guess the design decision at this point, is do we want to be extending it more?
This is close to correct. The primary purpose of this shift is 4 of the variables can now be set at the top level and apply to both
#628 already did the initial implementation using "guess". There's still time to change the parameter names to use "infer" instead of "guess" before the v3.0.0 release though. I can update the parameter names. |
Summary
Objectives:
e3sm_to_cmip
to use parameter names consistent with ILAMB. I.e., in addition tots_grid
andts_subsection
, allow users to setts_{component}_grid
andts_{component}_subsection
as well.Issue resolution:
Select one: This pull request is...
Small Change