Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API extending xgcm vertical regridding #388

Merged
merged 75 commits into from
Jun 15, 2023

Conversation

jasonb5
Copy link
Collaborator

@jasonb5 jasonb5 commented Nov 11, 2022

Description

Adds vertical regridding via xgcm.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@jasonb5 jasonb5 changed the title Adds initial vertical regridding via xgcm Adds vertical regridding Nov 11, 2022
@jasonb5
Copy link
Collaborator Author

jasonb5 commented Nov 11, 2022

Tasks left:

  • Add documentation
  • Add tests
  • Add example notebook
  • Determine method to derive grid coord, do we have a reliable way to tell position of data point e.g. left, center, right, inner, outer, maybe via cf_xarray. If not we'll need to have the user pass this information.
  • Add method to derive target_data if formula_terms is present in vertical coordinate and variables are present in dataset otherwise rely on user to pass variable.

The current plan for handling target_data is to provide an argument that can be one of the following.

  • xr.DataArray is used directly
  • str is used to reference variable in dataset
  • None will trigger auto derive

If auto derive is selected the vertical coordinate attributes will be checked for formula_terms. This will be used to collect all variables required. These variables will be processed using these formulas to derive the target_data. If auto derive fails the user will be presented with an error alerting them that they'll need to manually provide the target_data.

@codecov-commenter
Copy link

codecov-commenter commented Nov 29, 2022

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (c70827a) 100.00% compared to head (50f01d9) 100.00%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #388   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           14        15    +1     
  Lines         1428      1517   +89     
=========================================
+ Hits          1428      1517   +89     
Impacted Files Coverage Δ
xcdat/axis.py 100.00% <100.00%> (ø)
xcdat/regridder/__init__.py 100.00% <100.00%> (ø)
xcdat/regridder/accessor.py 100.00% <100.00%> (ø)
xcdat/regridder/base.py 100.00% <100.00%> (ø)
xcdat/regridder/grid.py 100.00% <100.00%> (ø)
xcdat/regridder/regrid2.py 100.00% <100.00%> (ø)
xcdat/regridder/xesmf.py 100.00% <100.00%> (ø)
xcdat/regridder/xgcm.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Feb 1, 2023

Hey @jasonb5, I threw the notes from the past few meetings here since they are action items. Feel free to check-box or update as needed.

Action Items from Meetings (11/2022 - 1/2023)

  • Detecting target variable in the metadata
    • Example regrid depth variable -> pass target levels to transform
    • Pass name (derived and added to dataset), or pass DataArray itself of variable
      • Same process of deriving variables as our other methods
      • Seems like a common workflow, based on
      • Variable like temp on grid you don’t like, then regrid
  • Parametric - detect CF formula attribute to generate target output (add as a separate PR if we want this feature)
    • CF metadata should be present too
    • If no formulas are provided, parse internal formula via CF convention - formula_terms, maps to variables in the dataset
    • If we can’t parse the user supplied formula or find the formula_terms metadata, throw error
  • Parse info from pressure var metadata (cf_array to interpret metadata, pull out formulas) – cdat requires explicit passing of arrays (not intuitive)
  • Look into detecting coordinate positions internally
    • e.g,. outer, left, right
    • If can't be determined, error is raised and user has to pass info depending on vertical regridding method (conservative requires multiple points)
  • deal with spatially varying vertical elements
    • Look how xgcm handles missing values (use nans for outputs, not sure how they affect calculations right now)
    • depending on how missing values are handled, might need to manipulate data before passing into library
  • Find test datasets for ocean (try ​​https://github.com/pochedls/xsearch/)
    • ta/cfmon
    • dpaths = xs.findPaths('historical', 'ta', 'mon', cmipTable='cfMon')
    • thetao = dpaths = xs.findPaths('historical', 'thetao', 'mon')
    • Amon: cl, cli, clw on model levels
    • Amon: ta, hur, hurs… on pressure levels

@jasonb5
Copy link
Collaborator Author

jasonb5 commented Jun 2, 2023

@jasonb5 – I was working on opening a ticket about this issue and I realize that the bounds are not dropped on the v0.5.0 release – it is happening on this branch. Here's a MCVE (fixed from above):

import xcdat as xc
import numpy as np
# define target grid
nlat = np.arange(-88.75, 90, 2.5)
nlon = np.arange(1.25, 360, 2.5)
ngrid = xc.create_grid(lat=nlat, lon=nlon)
# define / open dataset
dspath = '/p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/historical/r3i1p1f2/Omon/thetao/gn/v20190125/thetao_Omon_CNRM-CM6-1_historical_r3i1p1f2_gn_200001-201412.nc'
# another example path (for the atmospheric grid): 
# /p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2/amip/r1i1p1f1/Amon/ta/gn/v20190218/ta_Amon_CESM2_amip_r1i1p1f1_gn_195001-201412.nc
ds = xc.open_mfdataset(dspath)
# subset for minimal example
ds = ds.isel(time=[0])
# horizontal regridding
dsg = ds.regridder.horizontal('thetao', ngrid, method='conservative_normed')
dsg.lev_bnds

I see what's going on here, when horizontal regridding was first implemented the method would call ds.bounds.add_missing_bounds(axes=["X", "Y"]), with vertical regridding it's calling ds.bounds.add_missing_bounds(axes=["X", "Y", "Z", "T"]).

Should we even be adding bounds outside of what was generated during the regridding process?

If we do want the methods to fill out the bounds should it be for all dimensions or only the dimensions involved in the process e.g. horizontal generates only "X", "Y"?

One other potential issue only associated with this branch. When I exit iPython I get this warning:

[WARNING] yaksa: 10 leaked handle pool objects

I haven't been able to reproduce this yet, I'm not sure what is using the yaksa package.

@pochedls
Copy link
Collaborator

pochedls commented Jun 2, 2023

I see what's going on here, when horizontal regridding was first implemented the method would call ds.bounds.add_missing_bounds(axes=["X", "Y"]), with vertical regridding it's calling ds.bounds.add_missing_bounds(axes=["X", "Y", "Z", "T"]).

Should we even be adding bounds outside of what was generated during the regridding process?

If we do want the methods to fill out the bounds should it be for all dimensions or only the dimensions involved in the process e.g. horizontal generates only "X", "Y"?

One other potential issue only associated with this branch. When I exit iPython I get this warning:

[WARNING] yaksa: 10 leaked handle pool objects

I haven't been able to reproduce this yet, I'm not sure what is using the yaksa package.

The original dataset includes lev_bounds. These should be copied over after horizontal regridding (since the vertical bounds don't change during horizontal regridding). These bounds are still included as an attribute after horizontal regridding (i.e., dsg.lev.attrs includes "bounds": "lev_bounds"), even though bounds themselves are not in dsg.

So – we shouldn't generate the bounds, but I do think we should carry over the original bounds (that are unaffected by horizontal regridding).

I'm not sure how much we need to worry about yaksa, I get the warning on main branch, too (even after re-creating the dev environment).

@jasonb5
Copy link
Collaborator Author

jasonb5 commented Jun 6, 2023

@pochedls This logic sounds good, I'll update accordingly.

@tomvothecoder tomvothecoder changed the title Adds vertical regridding Add extension of xgcm vertical regridding Jun 7, 2023
@tomvothecoder tomvothecoder changed the title Add extension of xgcm vertical regridding Add API extending xgcm vertical regridding Jun 8, 2023
Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jason, here is my final review. The suggestions are mostly renaming things.

docs/examples/regridding-vertical.ipynb Outdated Show resolved Hide resolved
Comment on lines +1080 to +1088
mock_regridder = mock.MagicMock()
mock_regridder.return_value.vertical.return_value = "output data"

mock_data = mock.MagicMock()

with mock.patch.dict(accessor.VERTICAL_REGRID_TOOLS, {"xgcm": mock_regridder}):
output = self.ac.vertical("ts", mock_data, tool="xgcm", target_data=None)

assert output == "output data"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of MagicMock is neat here. For those interested (probably just me), the xgcm vertical regridder object is being mocked (mock_regridder) and set as the xgcm dictionary value in accessor.VERTICAL_REGRID_TOOLS. The return value of mock_regridder is set to "output_data", which is being tested in the assertion on line 1088. This test is to check if the correct vertical regridding tool is being referenced.

The actual result of the regridder object is not being tested (hence "output_data") because it is already tested through class TestXGCMRegridder.

)

regridder = regrid_tool(self._ds, output_grid, **options)
output_ds = regridder.horizontal(data_var, self._ds)

return output_ds

def vertical(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on #501, are we also considering calling this vertical_xgcm and not having a general wrapper with tool="xgcm"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, I'll address it in the next PR.

Comment on lines +432 to +439
def create_grid(**kwargs: CoordOptionalBnds) -> xr.Dataset:
"""Creates a grid from coordinate mapping.

Parameters
----------
lat : Union[np.ndarray, xr.DataArray]
Array of latitude values.
lon : Union[np.ndarray, xr.DataArray]
Array of longitude values.
lat_bnds : Optional[Union[np.ndarray, xr.DataArray]]
Array of bounds for latitude values.
lon_bnds : Optional[Union[np.ndarray, xr.DataArray]]
Array of bounds for longitude values.
**kwargs : CoordOptionalBnds
Mapping of coordinate name and data with optional bounds. See
:py:data:`xcdat.axis.VAR_NAME_MAP` for valid coordinate names.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This update (#496) will go in a new PR after this PR is merged right?

Coord = Union[np.ndarray, xr.DataArray]

CoordOptionalBnds = Union[Coord, Tuple[Coord, Coord]]


def preserve_bounds(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this function will be used publicly, or should it be private (_preserve_bounds())?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing it won't be used publicly, I'll rename it .

Comment on lines 19 to 22
output_ds: xr.Dataset,
output_grid: xr.Dataset,
input_ds: xr.Dataset,
ignore_dims: List[CFAxisKey],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
output_ds: xr.Dataset,
output_grid: xr.Dataset,
input_ds: xr.Dataset,
ignore_dims: List[CFAxisKey],
input_ds: xr.Dataset,
output_grid: xr.Dataset,
output_ds: xr.Dataset,
ignore_dims: List[CFAxisKey],

It's more intuitive to me to have the input params input_ds and output_grid first since they contain the bounds info, which are copied/preserved to the output_ds.

I'm thinking input -> output basically. Not a big deal though, especially if this should be a private function (my previous comment).

xcdat/regridder/xgcm.py Outdated Show resolved Hide resolved
xcdat/regridder/xgcm.py Outdated Show resolved Hide resolved
xcdat/regridder/xgcm.py Outdated Show resolved Hide resolved
@jasonb5 jasonb5 merged commit be3244e into xCDAT:main Jun 15, 2023
@jasonb5 jasonb5 deleted the adds_vertical_regrid branch June 15, 2023 20:58
@tomvothecoder
Copy link
Collaborator

🎉🎊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New enhancement request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Add vertical regridding
5 participants