Add support for native ERA5 data in GRIB format #2178
Conversation
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2178      +/-   ##
==========================================
+ Coverage   94.66%   94.85%   +0.18%
==========================================
  Files         251      251
  Lines       14287    14371      +84
==========================================
+ Hits        13525    13631     +106
+ Misses        762      740      -22

☔ View full report in Codecov by Sentry.
This is ready from my side, but there are two issues that need to be resolved before I mark this ready for review:

I tested this thoroughly with the following recipe: recipe_000.yml.txt

An example run is available on Levante here:

Note that with the default Dask scheduler, this recipe ran into a timeout after 8 hours with 67/76 tasks finished. With the following Dask configuration, I could run the same recipe on the same node (a regular Levante compute node with 256 GiB of memory) in 5:27 min (!!) 🚀.

    cluster:
      type: distributed.LocalCluster
      n_workers: 32
      threads_per_worker: 4
      memory_limit: 8 GiB

@ESMValGroup/technical-lead-development-team
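For context, ESMValCore can pick up such a cluster configuration from a Dask configuration file. A minimal sketch, assuming the ~/.esmvaltool/dask.yml file used by recent ESMValCore releases (check the Dask configuration documentation for your version):

    # ~/.esmvaltool/dask.yml -- sketch only; the file location and supported keys
    # depend on the ESMValCore version in use
    cluster:
      type: distributed.LocalCluster
      n_workers: 32
      threads_per_worker: 4
      memory_limit: 8 GiB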
Found a few typos and had a comment about future-proofing the automatic regridding in case we add more GRIB datasets (CAMS soon™). Runs fine though and the input looks reasonable!

I think we can merge this once there are 3 pull request reviews and all issues and dependencies are resolved?
Co-authored-by: Bettina Gier <[email protected]>
While testing this again, @bettina-gier and I got the following error. Full traceback:

The reason for this is that the intermediate

The only non-default preprocessors used here are

@bouweandela do you think this is related to this PR or to another bug in Dask? I would really like to merge this. Recipe:
If this PR works fine with the older version of Dask, it's probably a Dask bug and you can just merge. Especially since we just call
The big problem here really is that it's super hard to isolate this. During the last weeks I spent days trying to come up with a minimal example of a Dask bug that does not use any external packages and couldn't do it. I really hope to finalize this at some point, so if anyone has the resources to look into this one, I would be super grateful.
The size 0 chunk problem will be fixed in the next Dask release 🎉
Description
This PR allows ESMValCore to process native ERA5 data in GRIB format, which is, for example, available on Levante in the /pool/data/ERA5 directory.

Reading the data
The following settings are necessary in the user configuration file:
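A minimal sketch of such settings, assuming the Levante path from above and an illustrative DRS name (DKRZ-ERA5-GRIB is an assumption here; the linked documentation has the authoritative values):

    # User configuration file -- sketch, values are assumptions
    rootpath:
      native6: /pool/data/ERA5
    drs:
      native6: DKRZ-ERA5-GRIB   # assumed DRS name, see the documentation linked below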
I added an extra facets file which includes reasonable defaults for all supported variables. You can check it out here.
Thus, reading this data is as easy as
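For illustration, a dataset entry in a recipe could then look roughly like this (a sketch; the extra facets file is expected to supply sensible defaults, and the timerange is a placeholder):

    datasets:
      - {project: native6, dataset: ERA5, timerange: '2000/2001'}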
Regridding
Native ERA5 data in GRIB format is on a reduced Gaussian grid (i.e., an unstructured grid). Thus, in 99% of use cases it is necessary to regrid this data, especially since no cell areas are available for it (so we cannot even calculate global/regional statistics on the native grid). This regridding is done automatically by the CMORizer (as recommended by ECMWF), but it can be disabled in the recipe:
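As a sketch of what disabling it could look like in a recipe (the facet name automatic_regrid is assumed here for illustration and is not confirmed by this PR's text; see the documentation linked below for the actual facet):

    datasets:
      # the facet name below is assumed for illustration only
      - {project: native6, dataset: ERA5, automatic_regrid: false}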
This PR depends on the following other PRs:
- eccodes-python PyPI package with new eccodes in core requirements
- SciTools/iris-grib#357

Closes #1991
Closes ESMValGroup/ESMValTool#3238
Link to documentation: https://esmvaltool--2178.org.readthedocs.build/projects/ESMValCore/en/2178/quickstart/find_data.html#supported-native-reanalysis-observational-datasets
Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number of pull requests: