Allow e3sm_to_cmip to use ilamb parameter names #672

Open: forsyth2 wants to merge 1 commit into base: main

Conversation

forsyth2
Collaborator

@forsyth2 forsyth2 commented Feb 4, 2025

Summary

Objectives:

  • Allow e3sm_to_cmip to use parameter names consistent with ILAMB. I.e., in addition to ts_grid and ts_subsection, allow users to set ts_{component}_grid and ts_{component}_subsection as well.

Issue resolution:

Select one: This pull request is...

  • a bug fix: increment the patch version
  • a small improvement: increment the minor version
  • a new feature: increment the minor version
  • an incompatible (non-backwards compatible) API change: increment the major version

Small Change

  • To merge, I will use "Squash and merge". That is, this change should be a single commit.
  • Logic: I have visually inspected the entire pull request myself.
  • Pre-commit checks: All the pre-commit checks have passed.

Collaborator Author

@forsyth2 forsyth2 left a comment

@chengzhuzhang This is ready for review. Unit tests pass. The primary file to review is zppy/e3sm_to_cmip.py.

@xylar @tomvothecoder I think it would be good to get a second ok from at least one of you. This change allows users to set ts_atm_grid and ts_land_grid rather than just setting ts_grid differently in each e3sm_to_cmip subtask. It is a similar change for ts_subsection.
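To make the proposal concrete, here is a minimal sketch of how a component-specific grid name could be resolved with a fallback to the shared ts_grid. The helper name resolve_ts_grid, the precedence order, and the convention that an empty string means "not set" are all assumptions made for illustration; none of them is taken from zppy/e3sm_to_cmip.py.

```python
from typing import Any, Dict


def resolve_ts_grid(c: Dict[str, Any], component: str) -> str:
    """Return the grid name for one e3sm_to_cmip subtask.

    Hypothetical helper: prefers ts_{component}_grid, then falls back to
    ts_grid. An empty string is treated as "not set" (sketch assumption).
    """
    specific = c.get(f"ts_{component}_grid", "")
    if specific:
        return specific
    shared = c.get("ts_grid", "")
    if shared:
        return shared
    raise ValueError(f"Set either ts_grid or ts_{component}_grid")


# ILAMB-style configuration: one parameter per component
ilamb_style = {"ts_atm_grid": "180x360_aave", "ts_land_grid": "native"}
# Original e3sm_to_cmip style: a single ts_grid set inside each subtask
legacy_style = {"ts_grid": "180x360_aave"}

atm_grid = resolve_ts_grid(ilamb_style, "atm")
land_grid = resolve_ts_grid(ilamb_style, "land")
legacy_land_grid = resolve_ts_grid(legacy_style, "land")
```

Both configuration styles reach a usable grid name, which is the flexibility being discussed; the cost is that two different cfg files can now express the same setup in two different ways.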

Benefits of "guessing" parameters:

  • Allows users much more flexibility. They have multiple options to get to their desired result.
  • Can be more intuitive. For example, the ilamb task requires both ts_atm_grid and ts_land_grid whereas the e3sm_to_cmip task handled this by setting a single parameter ts_grid differently in its atm subtask and its land subtask. @chengzhuzhang pointed out this is not very intuitive to users.

Drawbacks:

  • This sort of algorithmic guesswork adds a decent amount of code, in particular in tests. Unit tests expand trying to keep up with the number of parameter combinations. This can increase tech debt.
  • Pollutes the parameter space. I.e., there are now more parameters for users to be aware of and for developers to keep track of.

raise ParameterNotProvidedError(parameter)


def check_and_define_parameters(c: Dict[str, Any], sub: str) -> None:
Collaborator Author

The reasons for the different function are:

  1. ts_grid is used in the bash file whereas ts_subsection is used in the python file.
  2. We will only try to guess ts_subsection if the guess parameter is on. The reason for this is somewhat historical -- when I did the parameter refactoring, I created parameters to turn on/off guessing for file paths or zppy cfg sections.
class ParameterGuessType(Enum):
    PATH_GUESS = 1
    SECTION_GUESS = 2

Until now, grid was used in the bash files exactly as the user set it; if the user left it empty, it was computed from the mapping file. This PR appears to be the first time we're allowing a grid parameter to be set by an alternative set of parameters.

def set_grid(c: Dict[str, Any]) -> None:
    # Grid name (if not explicitly defined)
    #   'native' if no remapping
    #   or extracted from mapping filename
    if c["grid"] == "":
        if c["mapping_file"] == "":
            c["grid"] = "native"
        elif c["mapping_file"] == "glb":
            c["grid"] = "glb"
        else:
            tmp = os.path.basename(c["mapping_file"])
            # Strip the trailing ".<suffix>.nc"; the raw string avoids W605
            tmp = re.sub(r"\.[^.]*\.nc$", "", tmp)
            tmp = tmp.split("_")
            if tmp[0] == "map":
                c["grid"] = f"{tmp[-2]}_{tmp[-1]}"
            else:
                raise ValueError(
                    f"Cannot extract target grid name from mapping file {c['mapping_file']}"
                )
    # If grid is defined, just use that
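As a worked example of the mapping-file branch above, the following standalone sketch applies the same steps to one file name. The function name grid_from_mapping_file and the example path are made up for illustration, and the sketch assumes the target grid name is always the last two underscore-separated tokens, as set_grid does.

```python
import os
import re


def grid_from_mapping_file(mapping_file: str) -> str:
    """Hypothetical standalone version of the filename branch of set_grid."""
    name = os.path.basename(mapping_file)
    # Drop the trailing ".<suffix>.nc" (for example a date stamp plus extension)
    name = re.sub(r"\.[^.]*\.nc$", "", name)
    parts = name.split("_")
    if parts[0] != "map":
        raise ValueError(
            f"Cannot extract target grid name from mapping file {mapping_file}"
        )
    # Assumption carried over from set_grid: the last two tokens name the grid
    return f"{parts[-2]}_{parts[-1]}"


# Illustrative path only; the directory and date stamp are invented
example = "/path/to/map_ne30pg2_to_cmip6_180x360_aave.20200201.nc"
print(grid_from_mapping_file(example))  # 180x360_aave
```

A file name that does not start with "map" raises ValueError, matching the behavior of the quoted function.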

@xylar
Contributor

xylar commented Feb 4, 2025

@forsyth2, I don't feel well versed enough in the analysis affected by these changes to have an opinion. I'd rather let @tomvothecoder and @chengzhuzhang provide feedback.

@chengzhuzhang
Collaborator

I was thinking it would only take a few lines of code to also support ts_atm_grid and ts_land_grid. I don't fully understand the guessing parameter part, and I feel the added complication is perhaps unnecessary and very hard to test.

@tomvothecoder
Collaborator

Benefits of "guessing" parameters:

* Allows users much more flexibility. They have multiple options to get to their desired result.

* Can be more intuitive. For example, the `ilamb` task requires both `ts_atm_grid` and `ts_land_grid` whereas the `e3sm_to_cmip` task handled this by setting a single parameter `ts_grid` differently in its `atm` subtask and its `land` subtask. @chengzhuzhang pointed out this is not very intuitive to users.

Drawbacks:

* This sort of algorithmic guesswork adds a decent amount of code, in particular in tests. Unit tests expand trying to keep up with the number of parameter combinations. This can increase tech debt.

* Pollutes the parameter space. I.e., there are now more parameters for users to be aware of and for developers to keep track of.

I don't have a strong opinion for this specific case in zppy. I would probably ask users what their opinion is on the current design and whether adding a guessing/inference feature is helpful/confusing.

I usually lean towards being as explicit as possible with API design and configurations because too much flexibility can introduce significant overhead for both the developer and the user (even if they think it is helpful to be flexible). However, if inferring/guessing can be implemented relatively simply with default fallback behavior(s) (if it fails) and it makes the lives of users easier, then this feature might be reasonable.

Collaborator

@tomvothecoder tomvothecoder left a comment

My quick code review.

Comment on lines +60 to +75
# Use for e3sm_to_cmip and/or ilamb tasks.
# Name of the grid used by the relevant `[ts]` atm subtask
ts_atm_grid = string(default="180x360_aave")
# Use for e3sm_to_cmip and/or ilamb tasks.
# Name of the `[ts]` atm subtask to depend on
ts_atm_subsection = string(default="")
# Use for e3sm_to_cmip task (but NOT the ilamb task) -- you can either set this, or
# both ts_atm_grid and ts_land_grid
# Name of the grid used by the relevant `[ts]` task
ts_grid = string(default="180x360_aave")
# Use for e3sm_to_cmip and/or ilamb tasks.
# Name of the grid used by the relevant `[ts]` land subtask
ts_land_grid = string(default="180x360_aave")
# Use for e3sm_to_cmip and/or ilamb tasks.
# Name of the `[ts]` land subtask to depend on
ts_land_subsection = string(default="")
Collaborator

From what I understand, you're moving these parameters up a level in the configuration hierarchy so that they can be reused across each e3sm_to_cmip subtask, rather than configuring ts_grid differently for each subtask. The intention is to eliminate redundant code/configuration.

Is this correct? If so, it sounds reasonable to me.

@tomvothecoder
Collaborator

Also I would call it inferring instead of guessing if this is actually implemented.

I was curious and asked ChatGPT:

The difference between guessing and inferring lies in the use of evidence and reasoning:

  • Guessing is making a statement or prediction without sufficient evidence or logical reasoning. It is often based on chance, intuition, or incomplete knowledge. For example, if someone asks you to guess a number between 1 and 100, you might pick 42 without any clues.

  • Inferring is drawing a conclusion based on available evidence and logical reasoning. It involves analyzing facts, patterns, or context to arrive at a probable answer. For example, if you see wet ground and dark clouds, you might infer that it recently rained.

In short, guessing is random or uncertain, while inferring is reasoned and based on observable clues.

@forsyth2
Collaborator Author

forsyth2 commented Feb 4, 2025

I don't fully understand the guessing parameter part

@chengzhuzhang This feature has been part of main since #628. I found it useful to include as the developer because I needed to know what the minimal set of parameters was -- i.e., what parameters does a task actually need to run? And are these passed in directly or "guessed" (or rather "inferred" as @tomvothecoder points out)? If they're guessed, do we have the parameters needed to guess? This is very similar to the variable derivation in e3sm_diags or global_time_series: ok we really need parameter x, but can we compute it or assume it based on the parameters we do have?

As you noted, users prefer flexibility, so these guessing parameters default to True:

# These two parameters enable zppy to guess path or section parameters.
# This allows users to set fewer parameters, but with the risk of zppy choosing incorrect values for them.
# Set to False for more transparency in path or section definitions.
guess_path_parameters = boolean(default=True)
guess_section_parameters = boolean(default=True)

There are two "types" of guesses I defined -- file path guesses (e.g., we don't have a file path, but it's probably xyz) and section name guesses (e.g., we don't know what subtask is a dependency for this task, but it's probably the ts subtask matching the name of this subtask)
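A section guess of this kind might look roughly like the sketch below. The function get_ts_subsection is hypothetical, and the exception class is only a stand-in for the zppy exception of the same name (its real definition is not shown in this thread). The one behavior taken from the discussion is that the guess is attempted only when guess_section_parameters is on, and that it defaults to True.

```python
from typing import Any, Dict


class ParameterNotProvidedError(RuntimeError):
    """Stand-in for the zppy exception of the same name (body assumed)."""


def get_ts_subsection(c: Dict[str, Any], subtask_name: str) -> str:
    """Hypothetical section guess: explicit value first, then the subtask name."""
    explicit = c.get("ts_subsection", "")
    if explicit:
        return explicit
    if c.get("guess_section_parameters", True):
        # Guess: the [ts] subtask to depend on has the same name as this subtask
        return subtask_name
    # Guessing is off and nothing was set, so fail loudly
    raise ParameterNotProvidedError("ts_subsection")


guessed = get_ts_subsection({}, "atm_monthly_180x360_aave")
explicit = get_ts_subsection({"ts_subsection": "land_monthly"}, "lnd")
```

With guess_section_parameters set to False and no ts_subsection given, the sketch raises instead of guessing, which is the more transparent mode described in the cfg comments.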

What this PR does is expand the guessing functionality to ts_subsection and ts_grid (which, as I note in the PR comments, doesn't fall into either of those two "types" of guesses).

I would probably ask users what their opinion is on the current design and whether adding a guessing/inference feature is helpful/confusing.

@chengzhuzhang believes users generally prefer more flexibility. I should also note, as mentioned earlier in this comment, we already do a very large amount of guesswork. This PR is just adding more.

It's a little too late considering we're in the RC testing period, but perhaps sending out a survey to zppy users asking their opinions on features might be a good idea to consider in the future.

I usually lean towards being as explicit as possible with API design [...] (even if they think it is helpful to be flexible).

This is my personal feeling. It's easier for users to run into issues from a missing parameter if they're setting different parameters to achieve the same result in different cfg's (e.g., I'm imagining a user being confused because one cfg sets ts_grid but another sets ts_land_grid and ts_atm_grid).

That said, I do think flexibility can reduce the learning curve -- e.g., users can get going faster with whatever parameters they have or what's intuitive to them. (What's intuitive to one person might not be to another!)

if inferring/guessing can be implemented relatively simply with default fallback behavior(s) (if it fails) and it makes the lives of users easier, then this feature might be reasonable.

@tomvothecoder As mentioned above, the feature has existed since #628; this PR just extends it further. I guess the design decision at this point is: do we want to keep extending it?

The intention is to eliminate redundant code/configuration. Is this correct? If so, it sounds reasonable to me.

This is close to correct. The primary purpose of this shift is that four of the variables can now be set at the top level and apply to both e3sm_to_cmip and ilamb (rather than only to the subtasks of e3sm_to_cmip).
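As a rough illustration of "set once at the top level, used by both tasks", the sketch below layers plain dictionaries so that more specific levels override more general ones. The helper effective_parameters, the layering rule, and the subsection values are assumptions for this example; zppy's actual inheritance logic is not shown in this thread.

```python
from typing import Any, Dict


def effective_parameters(*levels: Dict[str, Any]) -> Dict[str, Any]:
    """Merge config levels; later (more specific) levels override earlier ones."""
    merged: Dict[str, Any] = {}
    for level in levels:
        merged.update(level)
    return merged


# Hypothetical values, set once at the top level of the cfg
top_level = {"ts_atm_grid": "180x360_aave", "ts_land_grid": "180x360_aave"}

# Each task adds only what is specific to it
e3sm_to_cmip_atm = effective_parameters(top_level, {"ts_atm_subsection": "atm_monthly"})
ilamb = effective_parameters(top_level, {"ts_land_subsection": "land_monthly"})
```

Both tasks see the same ts_atm_grid and ts_land_grid without repeating them, which is the reuse described above.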

I would call it inferring instead of guessing if this is actually implemented.

#628 already did the initial implementation using "guess". There's still time to change the parameter names to use "infer" instead of "guess" before the v3.0.0 release though. I can update the parameter names.
