Allow e3sm_to_cmip to use ilamb parameter names #672

forsyth2 · 2025-02-04T02:39:32Z

Summary

Objectives:

Allow e3sm_to_cmip to use parameter names consistent with ILAMB. I.e., in addition to ts_grid and ts_subsection, allow users to set ts_{component}_grid and ts_{component}_subsection as well.

Issue resolution:

Addresses Parameter setting for new `e3sm_to_cmip` task #667 (reply in thread)

Select one: This pull request is...

a bug fix: increment the patch version
a small improvement: increment the minor version
a new feature: increment the minor version
an incompatible (non-backwards compatible) API change: increment the major version

Small Change

To merge, I will use "Squash and merge". That is, this change should be a single commit.
Logic: I have visually inspected the entire pull request myself.
Pre-commit checks: All the pre-commit checks have passed.

forsyth2

@chengzhuzhang This is ready for review. Unit tests pass. The primary file to review is zppy/e3sm_to_cmip.py.

@xylar @tomvothecoder I think it would be good to get a second ok from at least one of you. This change allows users to set ts_atm_grid and ts_land_grid rather than just setting ts_grid differently in each e3sm_to_cmip subtask. It is a similar change for ts_subsection.
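
As a rough illustration of the behavior described above, the sketch below resolves the grid for one e3sm_to_cmip subtask from either the single ts_grid parameter or the component-specific ts_atm_grid / ts_land_grid. The helper name resolve_ts_grid, the precedence order (component-specific first), and the use of an empty string to mean "unset" are assumptions made for this example; this is not the code in this PR.

```python
from typing import Any, Dict


def resolve_ts_grid(c: Dict[str, Any], component: str) -> str:
    """Return the grid name for a subtask (hypothetical helper).

    Assumes an unset parameter is stored as "" and that the
    component-specific name takes precedence over ts_grid.
    """
    specific = f"ts_{component}_grid"  # e.g. ts_atm_grid or ts_land_grid
    if c.get(specific, "") != "":
        return c[specific]
    if c.get("ts_grid", "") != "":
        return c["ts_grid"]
    raise ValueError(f"Set either ts_grid or {specific}")


# Both styles of configuration lead to the same result.
ilamb_style = {"ts_atm_grid": "180x360_aave", "ts_land_grid": "180x360_aave"}
subtask_style = {"ts_grid": "180x360_aave"}
atm_from_ilamb_style = resolve_ts_grid(ilamb_style, "atm")
atm_from_subtask_style = resolve_ts_grid(subtask_style, "atm")
```

If neither form is set, the sketch raises an error rather than silently picking a grid, which keeps a misconfigured cfg from going unnoticed.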

Benefits of "guessing" parameters:

Allows users much more flexibility. They have multiple options to get to their desired result.
Can be more intuitive. For example, the ilamb task requires both ts_atm_grid and ts_land_grid whereas the e3sm_to_cmip task handled this by setting a single parameter ts_grid differently in its atm subtask and its land subtask. @chengzhuzhang pointed out this is not very intuitive to users.

Drawbacks:

This sort of algorithmic guesswork adds a decent amount of code, in particular in tests. Unit tests expand trying to keep up with the number of parameter combinations. This can increase tech debt.
Pollutes the parameter space. I.e., there are now more parameters for users to be aware of and for developers to keep track of.

forsyth2 · 2025-02-04T03:06:57Z

zppy/e3sm_to_cmip.py

+            raise ParameterNotProvidedError(parameter)
+
+
+def check_and_define_parameters(c: Dict[str, Any], sub: str) -> None:


The reasons for the different function are:

ts_grid is used in the bash file whereas ts_subsection is used in the python file.

We will only try to guess ts_subsection if the guess parameter is on. The reason for this is somewhat historical -- when I did the parameter refactoring, I created parameters to turn on/off guessing for file paths or zppy cfg sections.

class ParameterGuessType(Enum):
    PATH_GUESS = 1
    SECTION_GUESS = 2

grid was always used in the bash files using exactly what the user set it as, or else it was computed from the mapping file. This PR appears to be the first time we're allowing a grid parameter to be set by an alternative set of parameters.

def set_grid(c: Dict[str, Any]) -> None:
    # Grid name (if not explicitly defined)
    #   'native' if no remapping
    #   or extracted from mapping filename
    if c["grid"] == "":
        if c["mapping_file"] == "":
            c["grid"] = "native"
        elif c["mapping_file"] == "glb":
            c["grid"] = "glb"
        else:
            tmp = os.path.basename(c["mapping_file"])
            # FIXME: W605 invalid escape sequence '\.'
            tmp = re.sub("\.[^.]*\.nc$", "", tmp)  # noqa: W605
            tmp = tmp.split("_")
            if tmp[0] == "map":
                c["grid"] = f"{tmp[-2]}_{tmp[-1]}"
            else:
                raise ValueError(
                    f"Cannot extract target grid name from mapping file {c['mapping_file']}"
                )
    # If grid is defined, just use that
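
To make the filename rule concrete, here is a small standalone sketch of the same extraction logic. The function name grid_from_mapping_file and the example path are made up for illustration. The regex is written as a raw string, which is one common way to avoid the W605 warning mentioned in the FIXME.

```python
import os
import re


def grid_from_mapping_file(mapping_file: str) -> str:
    """Mirror the grid-name rule quoted above (illustrative helper only)."""
    if mapping_file == "":
        return "native"  # no remapping
    if mapping_file == "glb":
        return "glb"
    name = os.path.basename(mapping_file)
    # Raw string: same pattern as the quoted code, without the invalid escape.
    name = re.sub(r"\.[^.]*\.nc$", "", name)
    parts = name.split("_")
    if parts[0] != "map":
        raise ValueError(
            f"Cannot extract target grid name from mapping file {mapping_file}"
        )
    # The target grid is taken from the last two underscore-separated fields.
    return f"{parts[-2]}_{parts[-1]}"


# Hypothetical path, chosen only to show the pattern map_<src>_to_<dst>.<date>.nc
example = "/path/to/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc"
grid = grid_from_mapping_file(example)
```

For the hypothetical path above, the suffix ".20200201.nc" is stripped first, and the last two fields give "180x360_aave".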

xylar · 2025-02-04T07:36:25Z

@forsyth2, I don't feel well enough versed on the analysis affected by these changes to have an opinion. I'd rather let @tomvothecoder and @chengzhuzhang provide feedback.

chengzhuzhang · 2025-02-04T18:20:32Z

I was thinking it would only take a few lines of code to also support ts_atm/lnd_grid. I don't fully understand the guessing parameter part, and I feel the complication is perhaps unnecessary and very hard to test.

tomvothecoder · 2025-02-04T18:35:18Z

Benefits of "guessing" parameters:

* Allows users much more flexibility. They have multiple options to get to their desired result.

* Can be more intuitive. For example, the `ilamb` task requires both `ts_atm_grid` and `ts_land_grid` whereas the `e3sm_to_cmip` task handled this by setting a single parameter `ts_grid` differently in its `atm` subtask and its `land` subtask. @chengzhuzhang pointed out this is not very intuitive to users.

Drawbacks:

* This sort of algorithmic guesswork adds a decent amount of code, in particular in tests. Unit tests expand trying to keep up with the number of parameter combinations. This can increase tech debt.

* Pollutes the parameter space. I.e., there are now more parameters for users to be aware of and for developers to keep track of.

I don't have a strong opinion for this specific case in zppy. I would probably ask users what their opinion is on the current design and whether adding a guessing/inference feature is helpful/confusing.

I usually lean towards being as explicit as possible with API design and configurations because too much flexibility can introduce significant overhead for both the developer and the user (even if they think it is helpful to be flexible). However, if inferring/guessing can be implemented relatively simply with default fallback behavior(s) (if it fails) and it makes the lives of users easier, then this feature might be reasonable.

tomvothecoder

My quick code review.

tomvothecoder · 2025-02-04T18:25:49Z

zppy/defaults/default.ini

+# Use for e3sm_to_cmip and/or ilamb tasks.
+# Name of the grid used by the relevant `[ts]` atm subtask
+ts_atm_grid = string(default="180x360_aave")
+# Use for e3sm_to_cmip and/or ilamb tasks.
+# Name of the `[ts]` atm subtask to depend on
+ts_atm_subsection = string(default="")
+# Use for e3sm_to_cmip task (but NOT the ilamb task) -- you can either set this, or
+# both ts_atm_grid and ts_land_grid
+# Name of the grid used by the relevant `[ts]` task
+ts_grid = string(default="180x360_aave")
+# Use for e3sm_to_cmip and/or ilamb tasks.
+# Name of the grid used by the relevant `[ts]` land subtask
+ts_land_grid = string(default="180x360_aave")
+# Use for e3sm_to_cmip and/or ilamb tasks.
+# Name of the `[ts]` land subtask to depend on
+ts_land_subsection = string(default="")


From what I understand, you're moving these parameters up a level in the configuration hierarchy so that they can be reused across each e3sm_to_cmip subtask, rather than configuring ts_grid differently for each subtask. The intention is to eliminate redundant code/configuration.

Is this correct? If so, it sounds reasonable to me.

tomvothecoder · 2025-02-04T18:38:46Z

Also I would call it inferring instead of guessing if this is actually implemented.

I was curious and asked ChatGPT:

The difference between guessing and inferring lies in the use of evidence and reasoning:

Guessing is making a statement or prediction without sufficient evidence or logical reasoning. It is often based on chance, intuition, or incomplete knowledge. For example, if someone asks you to guess a number between 1 and 100, you might pick 42 without any clues.

Inferring is drawing a conclusion based on available evidence and logical reasoning. It involves analyzing facts, patterns, or context to arrive at a probable answer. For example, if you see wet ground and dark clouds, you might infer that it recently rained.

In short, guessing is random or uncertain, while inferring is reasoned and based on observable clues.

forsyth2 · 2025-02-04T19:09:58Z

I don't fully understand the guessing parameter part

@chengzhuzhang This feature has been part of main since #628. I found it useful to include as the developer because I needed to know what the minimal set of parameters was -- i.e., what parameters does a task actually need to run? And are these passed in directly or "guessed" (or rather "inferred" as @tomvothecoder points out)? If they're guessed, do we have the parameters needed to guess? This is very similar to the variable derivation in e3sm_diags or global_time_series: ok we really need parameter x, but can we compute it or assume it based on the parameters we do have?

As you noted, users prefer flexibility, so these guessing parameters are defaulted to True:

# These two parameters enable zppy to guess path or section parameters.
# This allows users to set fewer parameters, but with the risk of zppy choosing incorrect values for them.
# Set to False for more transparency in path or section definitions.
guess_path_parameters = boolean(default=True)
guess_section_parameters = boolean(default=True)

There are two "types" of guesses I defined -- file path guesses (e.g., we don't have a file path, but it's probably xyz) and section name guesses (e.g., we don't know what subtask is a dependency for this task, but it's probably the ts subtask matching the name of this subtask)

What this PR does is expand guessing functionality to ts_subsection and ts_grid (which, as I note in the PR comments, doesn't fall into either of those two "types" of guesses).
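
A minimal sketch of how such a guarded guess could work is shown below. It assumes a hypothetical helper define_or_infer and uses local stand-ins for ParameterGuessType and ParameterNotProvidedError; zppy's actual helpers may be named and structured differently. The guess_path_parameters and guess_section_parameters flags are the ones shown above, and the subsection name is invented.

```python
from enum import Enum
from typing import Any, Dict


class ParameterGuessType(Enum):
    PATH_GUESS = 1
    SECTION_GUESS = 2


class ParameterNotProvidedError(RuntimeError):
    """Local stand-in for the zppy error of the same name."""


# Which on/off flag governs each type of guess.
FLAG_FOR_GUESS = {
    ParameterGuessType.PATH_GUESS: "guess_path_parameters",
    ParameterGuessType.SECTION_GUESS: "guess_section_parameters",
}


def define_or_infer(
    c: Dict[str, Any],
    parameter: str,
    fallback: str,
    guess_type: ParameterGuessType,
) -> None:
    """Fill in c[parameter] from fallback, but only if guessing is enabled."""
    if c.get(parameter, "") != "":
        return  # The user set it explicitly, so use that value as-is.
    if c.get(FLAG_FOR_GUESS[guess_type], True):
        c[parameter] = fallback
    else:
        raise ParameterNotProvidedError(parameter)


# Section guess: assume the dependency is the ts subtask with the same name.
c = {
    "subsection": "land_monthly",  # invented subtask name
    "ts_land_subsection": "",
    "guess_section_parameters": True,
}
define_or_infer(
    c, "ts_land_subsection", c["subsection"], ParameterGuessType.SECTION_GUESS
)
```

With guess_section_parameters set to False, the same call raises ParameterNotProvidedError instead of filling in a value, which is the "more transparency" mode the comment in default.ini describes.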

I would probably ask users what their opinion is on the current design and whether adding a guessing/inference feature is helpful/confusing.

@chengzhuzhang believes users generally prefer more flexibility. I should also note, as mentioned earlier in this comment, we already do a very large amount of guesswork. This PR is just adding more.

It's a little too late considering we're in the RC testing period, but perhaps sending out a survey to zppy users asking their opinions on features might be a good idea to consider in the future.

I usually lean towards being as explicit as possible with API design [...] (even if they think it is helpful to be flexible).

This is my personal feeling. It's easier for users to run into issues from a missing parameter if they're setting different parameters to achieve the same result in different cfg's (e.g., I'm imagining a user being confused because one cfg sets ts_grid but another sets ts_land_grid and ts_atm_grid).

That said, I do think flexibility can reduce the learning curve -- e.g., users can get going faster with whatever parameters they have or what's intuitive to them. (What's intuitive to one person might not be to another!)

if inferring/guessing can be implemented relatively simply with default fallback behavior(s) (if it fails) and it makes the lives of users easier, then this feature might be reasonable.

@tomvothecoder As mentioned above, the feature has existed since #628 -- this PR just extends it further. I guess the design decision at this point is whether we want to keep extending it.

The intention is to eliminate redundant code/configuration. Is this correct? If so, it sounds reasonable to me.

This is close to correct. The primary purpose of this shift is that four of the variables can now be set at the top level and apply to both e3sm_to_cmip and ilamb (rather than only to the subtasks of e3sm_to_cmip).
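
One way to picture this: a value set at the top level acts as a fallback for every task beneath it. The sketch below uses Python's collections.ChainMap to mimic that lookup order. The section contents are invented for illustration, and this is not how zppy itself merges configuration sections.

```python
from collections import ChainMap
from typing import Any, Dict

# Invented example values; only the parameter names come from this PR.
top_level: Dict[str, Any] = {
    "ts_atm_grid": "180x360_aave",
    "ts_land_grid": "180x360_aave",
}
e3sm_to_cmip_atm: Dict[str, Any] = {"subsection": "atm_monthly"}
ilamb: Dict[str, Any] = {}

# ChainMap searches its mappings left to right, so the most specific wins.
atm_view = ChainMap(e3sm_to_cmip_atm, top_level)
ilamb_view = ChainMap(ilamb, top_level)

# Both tasks see the same top-level grid without repeating it per subtask.
shared_atm_grid = (atm_view["ts_atm_grid"], ilamb_view["ts_atm_grid"])
```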

I would call it inferring instead of guessing if this is actually implemented.

#628 already did the initial implementation using "guess". There's still time to change the parameter names to use "infer" instead of "guess" before the v3.0.0 release though. I can update the parameter names.

Allow e3sm_to_cmip to use ilamb parameter names

1cf4fa2

This was referenced Feb 4, 2025

[Bug]: Another parameter checking error: zppy.utils.ParameterNotProvidedError: climo_land_subsection #669

Closed

[Doc]: Explain parameter inference #680

Open
