g-w jobs fail on Hercules with NetCDF: HDF error #2489
Comments
@RussTreadon-NOAA I think we saw something similar in the UFSWM here: ufs-community/ufs-weather-model#2015 and the TL;DR was to add
Can you give that a try to see if you still see that issue?
While this seems like a different issue, workflow is not yet supported on Hercules due to a Lustre issue with
@BrianCurtis-NOAA, thank you for sharing your insight.
I replaced the first line above with
The 20211220 18Z gdasfcst and enkfgdasfcst_mem002 still aborted with
In contrast, enkfgdasfcst_mem001 successfully ran to completion. As a follow-on test, I removed
while retaining
A rerun of gdasfcst still aborted with the
The seemingly random nature of this behavior is disturbing.
Thank you @WalterKolczynski-NOAA for letting me know that g-w does not support Hercules. It is unfortunate that we cannot reliably run cycled global parallels on Hercules. Have we elevated the
The |
OK, we can keep this issue open for awareness. Hercules is not, at present, a viable option for running global parallels.
Ran C96C48_ufs_hybatmDA (JEDI ATM) on Hercules. All jobs in the 20240223/18 half cycle and the 20240224/00 full cycle ran. The full cycle runs gdas, enkfgdas, and gfs. JEDI ATM currently runs with
Ran C96C48_hybatmDA (GSI ATM) on Hercules. All jobs in the 20211220/18 half cycle ran. All gdas and enkfgdas jobs in the 20211221/00 and 06 full cycles ran. The 20211221/00 gfs fcst aborted upon
Given this, I rewound and rebooted the 20240224/00 gfsfcst from the JEDI ATM run. The reboot failed just like the GSI ATM gfsfcst. The gfsfcst failure isn't the NetCDF: HDF error reported in this issue. It is the same failure reported in issue #2551.
@RussTreadon-NOAA Is this still an issue? Forecast tests on Hercules indicate that the model is running cleanly.
We may close this issue. It seems failures were related to not removing the run directory before re-running tests.
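For anyone rerunning a failed job by hand, here is a minimal sketch of clearing a stale run directory before the retry, on the assumption that leftover files from the earlier attempt were what tripped the rerun. The helper name `reset_run_dir`, the directory name, and the file name are hypothetical and are not part of g-w; the demonstration works on a temporary directory only.

```python
import shutil
import tempfile
from pathlib import Path


def reset_run_dir(run_dir: Path) -> Path:
    """Delete run_dir (if it exists) and recreate it empty.

    Hypothetical helper for illustration; not a g-w function.
    """
    if run_dir.exists():
        shutil.rmtree(run_dir)
    run_dir.mkdir(parents=True)
    return run_dir


# Demonstration on a throwaway directory; "gdasfcst" is only a label here.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "gdasfcst"
    run_dir.mkdir()
    # Simulate a file left behind by a previous, aborted attempt.
    (run_dir / "stale_restart.nc").write_bytes(b"leftover from a previous attempt")

    reset_run_dir(run_dir)
    leftover = sorted(p.name for p in run_dir.iterdir())
    print(leftover)
```

Starting each rerun from an empty directory rules out the case where a partially written file from the aborted attempt is reopened by the next one.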
What is wrong?
ufs_model.x aborts on Hercules in gdasfcst, gfsfcst, and enkfgdasfcst_mem* with
getsigensmeanp_smooth.x aborts on Hercules in enkfgdasecen000 with
What should have happened?
ufs_model.x and getsigensmeanp_smooth.x should run to completion on Hercules
What machines are impacted?
Hercules
Steps to reproduce
NetCDF: HDF error
Additional information
Also set up C96C48_ufs_hybatmDA on Hercules. gdasfcst and enkfgdasfcst successfully ran to completion for the first half cycle. The gdasfcst, gfsfcst, and enkfgdasecen000 jobs failed on the first full cycle. No changes were made to the executables between the first half cycle and the first full cycle.
Do you have a proposed solution?
No response