Add support for native ERA5 data in GRIB format #2178

Merged on Dec 6, 2024 (84 commits)


schlunma
Contributor

@schlunma schlunma commented Aug 23, 2023

Description

This PR allows ESMValCore to process native ERA5 data in GRIB format, which is for example available on Levante in the /pool/data/ERA5 directory.

Reading the data

The following settings are necessary in the user configuration file:

rootpath:
  ...
  native6:
    /pool/data/ERA5: DKRZ-ERA5-GRIB
  ...

I added an extra facets file which includes reasonable defaults for all supported variables. You can check it out here.

Thus, reading this data is as easy as

datasets:
  - {project: native6, dataset: ERA5, timerange: '2000/2001', short_name: tas, mip: Amon}
  - {project: native6, dataset: ERA5, timerange: '2000/2001', short_name: cl, mip: Amon, tres: 1H, frequency: 1hr}
  - {project: native6, dataset: ERA5, timerange: '2000/2001', short_name: ta, mip: Amon, type: fc, typeid: '12'}
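To illustrate how defaults from an extra facets file and the facets written in the recipe might combine, here is a minimal sketch. The default values and the helper name `resolve_facets` are placeholders made up for illustration; they are not taken from the actual extra facets file or from ESMValCore. The sketch only assumes that facets given in the recipe take precedence over defaults.

```python
# Hypothetical defaults standing in for one variable's entry in an
# extra facets file; the real file may use different keys and values.
HYPOTHETICAL_DEFAULTS = {"tres": "1M", "type": "an", "typeid": "00"}


def resolve_facets(recipe_entry, defaults=HYPOTHETICAL_DEFAULTS):
    """Return the defaults overlaid with the facets given in the recipe."""
    # Later keys win, so anything written in the recipe overrides a default.
    return {**defaults, **recipe_entry}


# Second and third dataset entries from the recipe snippet above.
hourly_cl = resolve_facets({
    "project": "native6", "dataset": "ERA5", "timerange": "2000/2001",
    "short_name": "cl", "mip": "Amon", "tres": "1H", "frequency": "1hr",
})
forecast_ta = resolve_facets({
    "project": "native6", "dataset": "ERA5", "timerange": "2000/2001",
    "short_name": "ta", "mip": "Amon", "type": "fc", "typeid": "12",
})

print(hourly_cl["tres"], hourly_cl["type"])      # recipe value, then default
print(forecast_ta["tres"], forecast_ta["type"])  # default, then recipe value
```

Under this assumption, the first dataset entry (tas) would use the defaults for every facet it does not mention.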

Regridding

Native ERA5 data in GRIB format is on a reduced Gaussian grid (i.e., an unstructured grid). Thus, in 99% of the use cases, it is necessary to regrid this data, especially since no cell areas are available for it (so we cannot even calculate global/regional statistics on the native grid). The CMORizer regrids the data automatically (as recommended by the ECMWF), but this can be disabled in the recipe:

datasets:
  - {project: native6, dataset: ERA5, timerange: '2000/2001', short_name: tas, mip: Amon, automatic_regrid: false}
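As a rough illustration of why a reduced Gaussian grid is treated as unstructured in the regridding section above: the number of longitude points differs between latitude rows, so the points do not fit a rectangular latitude x longitude array and are stored along a single dimension instead. The toy grid below is a sketch with invented row sizes and equally spaced toy latitudes; it is not the actual ERA5 grid, and `flatten_reduced_grid` is a hypothetical helper.

```python
# Toy "reduced" grid: each latitude row has its own number of longitude
# points, fewer towards the poles. All numbers are invented for illustration.
POINTS_PER_ROW = {60.0: 4, 20.0: 8, -20.0: 8, -60.0: 4}


def flatten_reduced_grid(points_per_row):
    """Return 1D latitude and longitude lists, one entry per grid point."""
    lats, lons = [], []
    for lat, n_lon in points_per_row.items():
        step = 360.0 / n_lon  # equal spacing within a row
        for i in range(n_lon):
            lats.append(lat)
            lons.append(i * step)
    return lats, lons


lats, lons = flatten_reduced_grid(POINTS_PER_ROW)
print(len(lats))  # 24 points along one unstructured dimension
```

Because the rows have different lengths, each point needs its own latitude/longitude pair, which is why data on such a grid is usually regridded to a regular grid before further processing.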

This PR depends on the following other PRs:


Closes #1991
Closes ESMValGroup/ESMValTool#3238

Link to documentation: https://esmvaltool--2178.org.readthedocs.build/projects/ESMValCore/en/2178/quickstart/find_data.html#supported-native-reanalysis-observational-datasets


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number of pull requests:

@schlunma schlunma added this to the v2.10.0 milestone Aug 23, 2023
@schlunma schlunma self-assigned this Aug 23, 2023
@codecov

codecov bot commented Aug 23, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.85%. Comparing base (a328578) to head (9af867a).
Report is 14 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2178      +/-   ##
==========================================
+ Coverage   94.66%   94.85%   +0.18%     
==========================================
  Files         251      251              
  Lines       14287    14371      +84     
==========================================
+ Hits        13525    13631     +106     
+ Misses        762      740      -22     


@schlunma
Contributor Author

schlunma commented Aug 25, 2023

This is ready from my side, but there are two issues that need to be resolved before I mark this ready for review:

I tested this thoroughly with the following recipe: recipe_000.yml.txt

An example run is available on Levante here: /home/b/b309141/scratch/esmvaltool_output/recipe_000_20230825_080240

Note that with the default dask scheduler, this recipe ran into a timeout after 8 hours with 67/76 tasks finished. With the following dask configuration, I could run the same recipe on the same node (regular Levante compute node with 256 GiB of memory) in 5:27 min (!!) 🚀.

cluster:
  type: distributed.LocalCluster
  n_workers: 32
  threads_per_worker: 4
  memory_limit: 8 GiB
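A quick sanity check of this configuration, as a sketch: in `distributed.LocalCluster`, `memory_limit` applies per worker, so the totals below follow from the numbers quoted in this comment. The speedup is only a lower bound, because the run with the default scheduler hit the timeout before finishing.

```python
# Values copied from the Dask configuration above.
n_workers = 32
threads_per_worker = 4
memory_limit_gib = 8  # per worker

# Node memory as stated in the comment (regular Levante compute node).
node_memory_gib = 256

total_threads = n_workers * threads_per_worker
total_memory_gib = n_workers * memory_limit_gib

print(total_threads)     # 128
print(total_memory_gib)  # 256
assert total_memory_gib <= node_memory_gib

# Lower bound on the speedup: the default scheduler hit the 8 h timeout
# without finishing, while the distributed run took 5 min 27 s.
timeout_s = 8 * 3600
distributed_s = 5 * 60 + 27
min_speedup = timeout_s / distributed_s
print(round(min_speedup))  # 88
```

With these settings the workers together may use all of the node's 256 GiB.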

@ESMValGroup/technical-lead-development-team

@schlunma schlunma modified the milestones: v2.10.0, v2.11.0 Sep 28, 2023
Contributor

@bettina-gier bettina-gier left a comment


Found a few typos and had a comment on future-proofing the automatic regridding in case we add more GRIB datasets (CAMS soon TM). Runs fine though and input looks reasonable!
Think we can merge this with 3 pull request reviews and all issues and dependencies solved?

@schlunma
Contributor Author

schlunma commented Dec 6, 2024

While testing this again, @bettina-gier and I got the following error, which was already reported here: ZeroDivisionError: integer division or modulo by zero.

Full traceback:
2024-12-06 09:38:18,767 UTC [4154330] ERROR   Program terminated abnormally, see stack trace below for more information:
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/work/bd0854/b309141/micromamba/envs/test_np1/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/b/b309141/tmp/ESMValCore/esmvalcore/_task.py", line 895, in _run_task
    output_files = task.run()
                   ^^^^^^^^^^
  File "/home/b/b309141/tmp/ESMValCore/esmvalcore/_task.py", line 290, in run
    self.output_files = self._run(input_files)
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/b/b309141/tmp/ESMValCore/esmvalcore/preprocessor/__init__.py", line 730, in _run
    product.apply(step, self.debug)
  File "/home/b/b309141/tmp/ESMValCore/esmvalcore/preprocessor/__init__.py", line 527, in apply
    self.cubes = preprocess(
                 ^^^^^^^^^^^
  File "/home/b/b309141/tmp/ESMValCore/esmvalcore/preprocessor/__init__.py", line 430, in preprocess
    _run_preproc_function(
  File "/home/b/b309141/tmp/ESMValCore/esmvalcore/preprocessor/__init__.py", line 364, in _run_preproc_function
    return function(items, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/b/b309141/tmp/ESMValCore/esmvalcore/preprocessor/_shared.py", line 237, in wrapper
    result = func(data, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/b/b309141/tmp/ESMValCore/esmvalcore/preprocessor/_time.py", line 826, in climate_statistics
    agg_kwargs = update_weights_kwargs(
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/b/b309141/tmp/ESMValCore/esmvalcore/preprocessor/_shared.py", line 162, in update_weights_kwargs
    callback(cube, **callback_kwargs)
  File "/home/b/b309141/tmp/ESMValCore/esmvalcore/preprocessor/_time.py", line 870, in _add_time_weights_coord
    get_time_weights(cube),
    ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/b/b309141/tmp/ESMValCore/esmvalcore/preprocessor/_shared.py", line 377, in get_time_weights
    time_weights = time_weights.rechunk(time_chunks)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/bd0854/b309141/micromamba/envs/test_np1/lib/python3.12/site-packages/dask/array/core.py", line 2779, in rechunk
    return rechunk(self, chunks, threshold, block_size_limit, balance, method)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/bd0854/b309141/micromamba/envs/test_np1/lib/python3.12/site-packages/dask/array/rechunk.py", line 349, in rechunk
    chunks = normalize_chunks(
             ^^^^^^^^^^^^^^^^^
  File "/work/bd0854/b309141/micromamba/envs/test_np1/lib/python3.12/site-packages/dask/array/core.py", line 3151, in normalize_chunks
    chunks = _convert_int_chunk_to_tuple(shape, chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/bd0854/b309141/micromamba/envs/test_np1/lib/python3.12/site-packages/dask/array/core.py", line 3177, in _convert_int_chunk_to_tuple
    return sum(
           ^^^^
  File "/work/bd0854/b309141/micromamba/envs/test_np1/lib/python3.12/site-packages/dask/array/core.py", line 3180, in <genexpr>
    blockdims_from_blockshape((s,), (c,))
  File "/work/bd0854/b309141/micromamba/envs/test_np1/lib/python3.12/site-packages/dask/array/core.py", line 1279, in blockdims_from_blockshape
    return tuple(
           ^^^^^^
  File "/work/bd0854/b309141/micromamba/envs/test_np1/lib/python3.12/site-packages/dask/array/core.py", line 1280, in <genexpr>
    ((bd,) * (d // bd) + ((d % bd,) if d % bd else ()) if d else (0,))
              ~~^^~~~
ZeroDivisionError: integer division or modulo by zero

The reason for this is that the intermediate rechunk between the preprocessor steps creates the following chunks after the regrid step / before the climate_statistics step:

chunks before rechunk
((36, 13), (720,), (1440,))
chunks after rechunk
((0, 36, 13), (720,), (1440,))

The only non-default preprocessors used here are regrid and climate_statistics. The leading zero then causes the error during the calculation of the time weights. The very funny thing here again is that this depends on the timerange: the error appears for 3 days, but not for 10 days (see recipe). Also, as reported by Tina, this does not appear in older Dask versions (I am using 2024.12.0 for this test).
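To make the failure mode concrete, here is a minimal sketch that copies the expression from the last traceback frame into a standalone function. The function name `split_dimension` and the helper `drop_empty_chunks` are hypothetical and not part of Dask or ESMValCore. The sketch assumes that a zero-size chunk ends up as the block size `bd` in that expression, and the block size of 36 is chosen only for illustration.

```python
def split_dimension(d, bd):
    """Split a dimension of length d into blocks of size bd.

    Standalone copy of the expression shown in the last traceback frame.
    """
    return (bd,) * (d // bd) + ((d % bd,) if d % bd else ()) if d else (0,)


# A time dimension of length 49 split into blocks of 36 gives the same
# time chunks as reported before the rechunk.
print(split_dimension(49, 36))  # (36, 13)

# A block size of 0, like the leading 0 after the rechunk, makes the
# integer division d // bd fail.
try:
    split_dimension(49, 0)
except ZeroDivisionError as exc:
    print(f"ZeroDivisionError: {exc}")


def drop_empty_chunks(chunks):
    """Hypothetical helper: remove zero-length chunks from one dimension."""
    return tuple(c for c in chunks if c > 0)


print(drop_empty_chunks((0, 36, 13)))  # (36, 13)
```

Filtering out empty chunks like this is only an illustration of what a chunk tuple without the leading zero looks like; it is not what this PR or Dask does.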

@bouweandela do you think this is related to this PR, or is it another bug in Dask? I would really like to merge this.

Recipe:
recipe_era5_grib.yml.txt

@bouweandela
Member

bouweandela commented Dec 6, 2024

do you think this is related to this PR, or is it another bug in Dask?

If this PR works fine with the older version of Dask, it's probably a Dask bug and you can just merge. Especially since we just call rechunk on a Dask array without any arguments, this looks very Dask internal. It would be really good to open issue(s) about our recent problems with Dask (I believe the size 0 chunks are the third problem you've found in the past month?) on the Dask repository to make sure that things get fixed and we can keep using recent versions of Dask.

@schlunma
Contributor Author

schlunma commented Dec 6, 2024

It would be really good to open issue(s) about our recent problems with Dask (I believe the size 0 chunks are the third problem you've found in the past month?) on the Dask repository to make sure that things get fixed and we can keep using recent versions of Dask.

The big problem here really is that it's super hard to isolate this. During the last weeks I spent days trying to come up with a minimal example of a Dask bug that does not use any external packages, and couldn't do it. I really hope to finalize this at some point.

So if anyone has the resources to look into this one here, I would be super grateful.

@schlunma schlunma merged commit 65c7b28 into main Dec 6, 2024
7 checks passed
@schlunma schlunma deleted the read_era5_grib branch December 6, 2024 11:41
@schlunma
Contributor Author

schlunma commented Jan 7, 2025

The size 0 chunk problem will be fixed in the next Dask release 🎉
