Upgrade the GSI to Spack-Stack version 1.6.0 #684
Reran regression tests on Hera. All tests pass except the following:
|
@ADCollard @RussTreadon-NOAA I am seeing copious warning messages from the CRTM v2.4.0.1 (~60,000 lines of |
@DavidHuber-NOAA Sorry just saw this. I will check out your branch and run it myself. |
I ran Comparison of the amsua coefficients in the runtime Hera
Cactus
The Hera The crtm/2.4.0.1 fix directory on Cactus has five files.
whereas on Hera we have four files
The Cactus |
@RussTreadon-NOAA Thank you for the detailed investigation. I will report this to the spack-stack team. |
@DavidHuber-NOAA @RussTreadon-NOAA I also did the same test and noted the difference between cactus and hera. I also checked the coefficient files in the github and tarred versions of the CRTM fix files. These are:
and
In both cases the metop-c coefficient files are small (lacking the antenna correction information). So the WCOSS2 coefficient files appear inconsistent with the official release, but are in fact the correct ones to use. @Hang-Lei-NOAA worked with NCO to install the coefficient files on the WCOSS machines, so maybe he can explain the difference. My guess is that as this was supposed to only be an incremental change (only six files needed to be added) only these files were updated on WCOSS2 (which is definitely the safest way to manage this change). |
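The directory comparison described above can be sketched roughly as follows. This is a minimal illustration, not the procedure anyone in this thread actually ran: it assumes both fix directories are reachable from one host (for example after copying one of them over), and the function names and the file names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def list_sizes(directory):
    """Map each file name directly under `directory` to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(directory).iterdir() if p.is_file()}


def compare_fix_dirs(dir_a, dir_b):
    """Report files present on only one side and files whose sizes differ."""
    a, b = list_sizes(dir_a), list_sizes(dir_b)
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "size_differs": sorted(n for n in set(a) & set(b) if a[n] != b[n]),
    }


if __name__ == "__main__":
    # Demo with throwaway directories and hypothetical file names.
    with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
        Path(dir_a, "coeff_one.bin").write_bytes(b"x" * 10)
        Path(dir_b, "coeff_one.bin").write_bytes(b"x" * 4)
        Path(dir_a, "coeff_two.bin").write_bytes(b"y")
        print(compare_fix_dirs(dir_a, dir_b))
```

A size difference only flags a file for closer inspection; it does not say which copy is the correct one.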
can this PR support gaea C5? |
@jswhit First, we will need to resolve this CRTM fix file issue. If that can be resolved, then I'm fine with updating the Gaea module to point to the C5/F5 stack -- I think it was just installed on F5 yesterday. Would you be willing to run the regression tests there? |
just wanted to leave a note here that several versions of spack-stack have been installed in a new location to accommodate the decommissioning of c4 and the transition to the f5 filesystem. the available versions and their locations are as follows:
there is a PR in the ufs-wm to update gaea to use |
The CRTM-fix file was fixed on Orion and regression tests were run. All tests passed except RTMA, which encountered a non-fatal time exceedance failure:
When the fix files are fixed on Hera, Hercules, and Jet, I will run RTs on each machine then mark this ready for review. I will also update the Gaea modulefiles to point to those suggested by @ulmononian so they can be tested by @jswhit. |
@DavidHuber-NOAA: I cloned |
@RussTreadon-NOAA Apologies, I had pulled develop in but not pushed it back to GitHub. It has been now. Thanks. |
@jswhit I have taken a stab at updating the module file and regression test variables and parameters for Gaea-C5. I verified that the GSI builds. The CRTM-fix files have also been fixed on Gaea. Would you mind running the RTs to verify everything is OK? |
@RussTreadon-NOAA @edwardhartnett @TingLei-NOAA @climbfuji I know now what is causing Hercules to fail with netCDF errors. The errors only appear with the flag That said, I believe this still indicates an issue with the way the regional GSI is implementing its parallel I/O as this flag allows Intel MPI to distribute I/O across a parallel filesystem. If it is disabled, then native support for parallel I/O is disabled. My next question is, should I upgrade to v1.6.0 and add |
Excellent detective work @DavidHuber-NOAA! I agree that we need to re-examine and likely refactor netCDF parallel I/O. You opened GSI issue #694 for this purpose. Let's capture your findings in this issue. @TingLei-NOAA, issue #694 has been assigned to you. I also added @ShunLiu-NOAA and @hu5970 to this issue. We need to resolve this problem. Since ctests pass on Hercules from a |
@DavidHuber-NOAA That is great! Yes, "export I_MPI_EXTRA_FILESYSTEM=0" makes my previously failing test succeed now. I have two questions. First, will this make the previously failing HDF5 tests pass as well? |
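For anyone wanting to try the same setting from a driver script rather than an interactive shell, here is a minimal sketch. The variable name and value come from the comment above; everything else is an assumption, including the helper name and the child command, which is a stand-in for the real MPI launch line that this thread does not show.

```python
import os
import subprocess
import sys


def run_with_extra_filesystem_disabled(command):
    """Run `command` with I_MPI_EXTRA_FILESYSTEM=0 added to its environment."""
    env = dict(os.environ)  # copy so the parent environment is left untouched
    env["I_MPI_EXTRA_FILESYSTEM"] = "0"
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in child process that only echoes the variable; replace it with
    # the actual launcher command for a real test.
    child = [sys.executable, "-c", "import os; print(os.environ['I_MPI_EXTRA_FILESYSTEM'])"]
    print(run_with_extra_filesystem_disabled(child).stdout.strip())
```

Setting the variable per process like this keeps the change scoped to one run, which makes it easier to compare behavior with and without it.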
@TingLei-NOAA I'll respond to this in #694. |
Tests were successful on Hercules with the exception of |
@DavidHuber-NOAA , I found similar behavior on Hercules while running ctests for PR #695. PR #695 builds |
Changes look good to me. Thank you @DavidHuber-NOAA for persistently working through various problems.
@DavidHuber-NOAA, we need two peer reviews for this PR. I signed off as the Handling Reviewer. |
@TingLei-NOAA , since you have already commented on this PR can you serve as a peer reviewer? |
@DavidHuber-NOAA , since this is a |
@RussTreadon-NOAA this looks good to me from the spack-stack perspective. If it's been tested on the systems in question then those configs look good. |
Thank you @AlexanderRichert-NOAA for your comment. I added you as a reviewer so we can formally capture your input as a reviewer. Your review will help move this PR forward. |
modulefiles/gsi_hera.gnu.lua
Outdated
  local stack_gnu_ver=os.getenv("stack_gnu_ver") or "9.2.0"
  local stack_openmpi_ver=os.getenv("stack_openmpi_ver") or "4.1.5"
  local cmake_ver=os.getenv("cmake_ver") or "3.23.1"
- local prod_util_ver=os.getenv("prod_util_ver") or "1.2.2"
+ local prod_util_ver=os.getenv("prod_util_ver") or "2.1.1"
  local openblas_ver=os.getenv("openblas_ver") or "0.3.19"
I believe the openblas version should be 0.3.24. @DavidHuber-NOAA, can you confirm?
@AlexanderRichert-NOAA You are correct, it should be 0.3.24. I updated it and ran a test build. Thanks!
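The modulefile lines under review follow a simple "environment override, else default" pattern. As a rough illustration only, the same lookup can be mirrored in Python; the defaults below are copied from the snippet in this review thread, with openblas set to the corrected 0.3.24, while the dictionary and function names are hypothetical and not part of the GSI repository.

```python
import os

# Defaults taken from the gsi_hera.gnu.lua lines quoted in this review,
# with openblas updated to the value confirmed above.
DEFAULT_VERSIONS = {
    "stack_gnu_ver": "9.2.0",
    "stack_openmpi_ver": "4.1.5",
    "cmake_ver": "3.23.1",
    "prod_util_ver": "2.1.1",
    "openblas_ver": "0.3.24",
}


def resolve_versions(environ=None):
    """Return each version, preferring an environment override when one is set."""
    env = os.environ if environ is None else environ
    # env.get(name, default) mirrors Lua's `os.getenv(name) or default`:
    # only an unset variable falls back to the default. An empty string is
    # kept as-is, because in Lua an empty string is truthy.
    return {name: env.get(name, default) for name, default in DEFAULT_VERSIONS.items()}


if __name__ == "__main__":
    for name, version in resolve_versions().items():
        print(f"{name}={version}")
```

Passing a plain dictionary as `environ` makes it easy to check what a given set of overrides would resolve to without touching the real environment.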
Hi Russ,
I think I am not qualified to review this PR since I know little about spack-stack (yet).
Ting
|
@TingLei-NOAA , like you, I am not a spack-stack expert. We are both GSI, and now JEDI, developers. Can you review this PR as one who is knowledgeable about the GSI? @AlexanderRichert-NOAA is reviewing this PR as one knowledgeable about spack-stack. |
Russ,
Sure, if you think my knowledge of GSI would allow me to help with this PR as a reviewer, I will be happy to do so.
Ting
|
spack-stack paths and versions look good.
Thanks to @DavidHuber-NOAA for revealing the issues related to fv3reg GSI I/O and for demonstrating, with help from other colleagues, that they do not result from the current PR. Dave's work in this PR also helped a lot in identifying the culprit in the parallel I/O issue on Hercules by showing the impact of I_MPI_EXTRA_FILESYSTEM. As a GSI developer, albeit not an expert in spack-stack, I find this PR to be good and approve it.
Approve.
DUE DATE for merger of this PR into develop is 2/29/2024 (six weeks after PR creation).
Description
This upgrades the spack-stack version to 1.6.0, which also upgrades the following libraries:
netCDF-Fortran 4.6.0 -> 4.6.1
sp 2.3.3 -> 2.5.0
CRTM 2.4.0 -> 2.4.0.1
prod_util 1.2.2 -> 2.1.1
Depends on #672
Resolves #674
Type of change
How Has This Been Tested?
Built on Hera and Orion. Regression tests performed on Hera. Four tests currently crash (those running CRTM) due to an issue with the way HIRS is versioned in the CRTM. This will be retested once #672 is merged.
Checklist