Test new EPIC hpc-stack installs on R&Ds #1311
Comments
@ulmononian Where do I go to request changes to an EPIC hpc-stack install? I found that the Example on Hera:
What it needs to be (similar to
|
sorry about that, @KateFriedman-NOAA. i will work with @natalie-perlin to address this and let you know once the directories are re-structured. |
Thanks @ulmononian @natalie-perlin! For more help, here is the hpc-stack intel 2022 crtm/2.4.0 fix folder on Hera for comparison: /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-2022.1.2/crtm/2.4.0/fix/ |
@KateFriedman-NOAA the fix directory structure for the intel-2022 installs of crtm/2.4.0 on hera/orion/jet has been updated. please let us know if you have any issues! |
Thanks @ulmononian @natalie-perlin for the quick update! I will retry my jobs on Hera and Orion that use the EPIC hpc-stack installs and crtm. I will report if I have any issues. |
@ulmononian I am getting the following error using the crtm fix files after the structure change:
I'm guessing the wrong Endian versions are in place. If I compare the hpc-stack copy of the crtm fix file to the copy in the EPIC stack install, they differ:
I can ping someone from the GSI team to confirm which Endian these files should be, if needed. Thanks! |
@KateFriedman-NOAA ah, my apologies there! i moved all of the Little_Endian files (as well as .nc/.nc4 files from the netcdf and netCDF subdirs) into /fix (and removed the Big_Endian files). my understanding was that hera was little endian byte order, but i must have been wrong? |
Codes in g-w directory
|
Oh, the application converts the code to big endian... (-fconvert=big-endian) |
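For anyone who wants to check which byte order a given binary fix file actually uses, here is a minimal sketch. It assumes the file is a Fortran sequential unformatted file with 4-byte record markers (an assumption about the CRTM binary coefficient format, not something confirmed in this thread), and the function name `detect_record_endianness` is made up for illustration. The idea is that the first record's leading length marker must equal its trailing marker when read with the right byte order.

```python
import struct
from pathlib import Path


def detect_record_endianness(path):
    """Guess the byte order of a Fortran sequential unformatted file.

    Assumes 4-byte record markers (hypothetical for CRTM fix files).
    Returns "little", "big", or None if neither interpretation is consistent.
    """
    data = Path(path).read_bytes()
    if len(data) < 8:
        return None
    for name, fmt in (("little", "<i"), ("big", ">i")):
        (length,) = struct.unpack(fmt, data[:4])
        end = 4 + length
        # Reject empty/negative records and records that run past the file end.
        if length <= 0 or end + 4 > len(data):
            continue
        # A valid record is closed by a trailing marker equal to the leading one.
        (trailer,) = struct.unpack(fmt, data[end:end + 4])
        if trailer == length:
            return name
    return None
```

A file that reports "big" here would be the kind an executable built with -fconvert=big-endian expects to read.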
@RussTreadon-NOAA ahh, okay! so we need the files from the |
@RussTreadon-NOAA is this an appropriate source for the further, @KateFriedman-NOAA noted the following:
|
@ulmononian updated the fix files to be the Confirmation that these crtm files are the correct versions is needed. As @ulmononian noted, one file differs when compared to the existing non-EPIC hpc-stack set and some extra files are in the EPIC install: One diff:
Full diff showing file that differs and extra files in EPIC copy:
|
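The kind of comparison described above (one file that differs, plus extra files in one copy) can be reproduced with a short script. This is only a sketch under stated assumptions: both fix directories are flat, and the helper name `compare_fix_dirs` and its return keys are invented for illustration. It uses the standard-library `filecmp.cmpfiles` with `shallow=False` so file contents are compared byte by byte.

```python
import filecmp
from pathlib import Path


def compare_fix_dirs(reference_dir, candidate_dir):
    """Report files that differ or exist in only one of two flat directories."""
    ref = {p.name for p in Path(reference_dir).iterdir() if p.is_file()}
    cand = {p.name for p in Path(candidate_dir).iterdir() if p.is_file()}
    common = sorted(ref & cand)
    # shallow=False forces a content comparison instead of trusting os.stat().
    _match, mismatch, errors = filecmp.cmpfiles(
        reference_dir, candidate_dir, common, shallow=False
    )
    return {
        # Files that could not be compared are reported alongside real mismatches.
        "differ": sorted(mismatch + errors),
        "only_in_reference": sorted(ref - cand),
        "only_in_candidate": sorted(cand - ref),
    }
```

Calling it with the nwprod fix directory as the reference and the EPIC fix directory as the candidate would list the differing file under "differ" and the extra EPIC files under "only_in_candidate".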
@ulmononian, you ask excellent questions to which I do not have good answers. From whom did the library team obtain crtm/2.4.0? This person or group may be able to answer your questions. |
the source code comes from https://github.com/NOAA-EMC/crtm (in this case, it was: https://github.com/NOAA-EMC/crtm/tree/release/REL-2.4.0_emc). @Hang-Lei-NOAA can you comment on the file number discrepancy and diff results @KateFriedman-NOAA mentioned above? just want to confirm that ftp.ssec.wisc.edu/pub/s4/CRTM/fix_REL-2.4.0_emc.tgz is a valid source for the crtm/2.4.0 fix files and that differences between /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack/intel-2022.1.2/crtm/2.4.0/fix/ and /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2/intel-2022.1.2/crtm/2.4.0 are due to the updates to the fix files (as you mentioned in NOAA-EMC/hpc-stack#517). |
@ulmononian ftp.ssec.wisc.edu/pub/s4/CRTM/fix_REL-2.4.0_emc.tgz is a valid source for the crtm/2.4.0 fix files.
The source code at https://github.com/NOAA-EMC/crtm is a fork of the JCSDA repo. I just created a pull request to update it. Someone who manages the repo should either sync it or process the pull request.
The installation that EMC still maintains is not the one in the email. The correct, updated one is:
/scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16/intel-2022.1.2/crtm/2.4.0/fix/
|
@Hang-Lei-NOAA That's indeed the copy we use for the GFSv16 system but for GFSv17+ developmental system we currently use the hpc-stack here on Hera: A diff between
|
@KateFriedman-NOAA thanks for updating the diff logs. i get the same result. @Hang-Lei-NOAA is the crtm/2.4.0 fix tarball updated regularly? i just retrieved the tarball today to double-check. |
Let me check further with GSI to see which is the correct installation.
We have the EMC installation: /scratch2/NCEPDEV/nwprod/hpc-stack/libs/hpc-stack-gfsv16
(The other EMC installation was given up; EPIC has continued that one on their disk.)
EPIC installation: /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
|
Other than checking that the EPIC hpc-stack crtm fix files are correct...my testing using EPIC hpc-stack on Hera and Orion is going well: Cycled ATM-only testing:
Coupled testing:
Changes made for testing EPIC hpc-stack on Hera/Orion
Module use paths:
Components already using EPIC hpc-stack (no changes made):
Components left as is using intel 2018 at buildtime (but run with intel 2022):
What components were changed to build/run with EPIC hpc-stack (note: intel 2022):
Workflow modulefiles changes:
Only slight modifications were needed to build GFS-UTILS and UFS_UTILS with EPIC hpc-stack intel 2022: GFS-UTILS:
UFS_UTILS (changes in
|
@GeorgeGayno-NOAA See comment above...I'm testing the new EPIC hpc-stack installs on Hera/Orion within the full GFS/global-workflow system. I updated our current hash of UFS-UTILS to build with the EPIC hpc-stack intel 2022 modules/libraries. Not sure if you already have plans to move to use EPIC hpc-stack everywhere. I know @DavidHuber-NOAA introduced Jet support in UFS-UTILS which is using the EPIC hpc-stack install there: ufs-community/UFS_UTILS#771 |
@KateFriedman-NOAA That is very good.
Did you check out the latest GSI, which requires the newer fix files?
|
@KateFriedman-NOAA @Hang-Lei-NOAA: so i just manually (and carefully) moved all the files contained in the crtm/2.4.0/fix tarball sub-directories containing Big_Endian, netcdf, and netCDF files into the top of /fix. the resultant file number was 1705. this matches the result of the two-line script i had implemented to do the same thing, a script i had come to doubt due to the discrepancy between the nwprod stack crtm fix file number and the newly reformatted epic stack crtm fix file number, as well as the diff between as the current hera fix file structure worked for @KateFriedman-NOAA's recent test, until someone at gsi or the crtm level confirms otherwise, i will move forward with this re-structure on the other machines. |
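The re-structure described above can be sketched roughly as follows. This is not the two-line script mentioned in the comment (that script is not shown in the thread); it is a hypothetical equivalent, and the function name `flatten_fix_dir` and the assumption that the wanted files sit in sub-directories literally named Big_Endian, netcdf, or netCDF at any depth are mine. It refuses to overwrite on a name collision and returns the number of files at the top level so the count can be checked.

```python
import shutil
from pathlib import Path


def flatten_fix_dir(fix_dir, wanted=("Big_Endian", "netcdf", "netCDF")):
    """Move files found in the wanted sub-directories to the top of fix_dir.

    Returns the number of regular files at the top level afterwards.
    """
    fix_dir = Path(fix_dir)
    # Collect sources first so that moving files does not disturb the walk.
    sources = [
        p
        for p in fix_dir.rglob("*")
        if p.is_file() and p.parent != fix_dir and p.parent.name in wanted
    ]
    for src in sources:
        dest = fix_dir / src.name
        if dest.exists():
            # Two sub-directories holding the same file name need a human decision.
            raise FileExistsError(f"{dest} already exists; refusing to overwrite")
        shutil.move(str(src), str(dest))
    return sum(1 for p in fix_dir.iterdir() if p.is_file())
```

Running it on a scratch copy of the unpacked tarball and comparing the returned count against the manual result is a cheap sanity check before touching the installed stack.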
further, regarding missing file |
@ulmononian @Hang-Lei-NOAA @RussTreadon-NOAA @KateFriedman-NOAA I think the differences are OK. The epic directory seems to contain all the available coefficient files, whereas the nwprod contains just the coefficients we use or expect to use. The difference in the AMSU-A Metop-C coeffs was, I believe, due to an early version missing the antenna correction. @emilyhcliu can confirm. @Hang-Lei-NOAA recently got the latest version (new instruments added) from ftp.ssec.wisc.edu/pub/s4/CRTM/fix_REL-2.4.0_emc.tgz |
@ADCollard thank you very much for your input here. this is reassuring. @RussTreadon-NOAA to confirm: since |
@ADCollard Thanks for confirming.
|
Status summary as of May 5th 2023
Have global-workflow updates in fork branch: https://github.com/KateFriedman-NOAA/global-workflow/tree/feature/epic-stack
Have opened issues in gfs-utils, GSI-utils, and GSI-Monitor to make updates to EPIC-maintained hpc-stacks. See issues listed in main issue comment. Created fork branches for each repo; see changes to
Previous tests with EPIC hpc-stack and updated components worked for many cycles on Orion. After updating to fork branches for components and a couple module versions (updates in EPIC stack installs on Hera/Orion) the analcalc jobs are failing with an IOSTAT error reading fort.41. Other jobs working still.
Can see tests on Hera, Orion, and WCOSS2 here:
Orion:
Prior Orion test EXPDIR were cycepicstack96 and cycepicstack96b (reran analcalc in those tests and they now fail).
Hera:
Saved Hera logs in
Test on WCOSS2 using same branches works without issue:
Since test on WCOSS2-Cactus will age-off soon, have copied COMROT logs into |
ufs-community/ufs-weather-model#1621 (comment) Cheyenne intel-2022.1: /glade/work/epicufsrt/contrib/hpc-stack/intel2022.1_ncdf492/modulefiles/stack @KateFriedman-NOAA - looking into the issues you've reported! |
Appreciate that @natalie-perlin ! Unsure of the source of the error in the analcalc jobs but since they work on WCOSS2 (non EPIC-stacks) that makes me consider something in the EPIC stacks but my tests on Hera and Orion worked previously without this error. I can cat the generated fort.41 namelist file from the saved RUNDIR without error so not sure why it has issues at runtime. Let me know if I can try anything or provide more details, thanks! |
@KateFriedman-NOAA I checked your Hera log. It does not look like you are using the ufs_utils branch that points to the EPIC stack. Do you want to use that branch? |
That would be great @GeorgeGayno-NOAA ...can you point me to your branch? Thanks! |
https://github.com/GeorgeGayno-NOAA/UFS_UTILS/tree/feature/epic_stack |
ip/4.0.0 is available in EPIC-maintained stacks, in addition to v3.3.3 that is a default.
The v8.3.0b09 is the official version; 8.4.1b07 was installed as a trial. |
@KateFriedman-NOAA Could not find a location where the IOSTAT error reading fort.41 appears. Is it in another upper-level log? |
@natalie-perlin This one: Jump to bottom:
I didn't let the gdasanalcalc job run since I could see the gfsanalcalc job failed (two analcalc jobs per cycle). Rundir for this job: Thanks! |
So it cannot read the namelist file fort.41, which is certainly present in The executable |
@natalie-perlin Sorry for the delayed response, was on leave. Another workflow developer just ran into this same issue while testing a non-EPIC stack (the hpc-stack installs currently used in our |
@WalterKolczynski-NOAA will provide details on adding python environment to EPIC stacks on the R&Ds to support global-workflow and METplus jobs. |
TO-DO: the versions in the incoming version files (PR #1644) will need to be updated to intel 2022 for this work. |
Work status:
Changes made to all components, see related issues listed in main issue comment.
Work to do:
Results from most recent testing:
Orion:
Hera:
|
ESMF 8.4.1 is working on Hera. I would think it should work on Orion as well. Instead, I get errors at the link step. |
@GeorgeGayno-NOAA - Another set of stacks contain ESMF v8.4.2, installed in EPIC locations on the Tier 1 systems, along with hdf5/1.14.0, netcdf/4.9.2, pio/2.5.10, mapl/2.35.2. The stack locations are listed here: ufs-community/ufs-weather-model#1621 (comment) |
An update on the software stacks updated for ufs-community/ufs-weather-model#1772 (comment) |
Work is progressing with moving all components to spack-stack (issue #1868), which will supersede this effort. Stopping work on this and closing issue. |
Description
As described in ufs-community/ufs-weather-model#1465, there are new hpc-stack installs provided by EPIC on Hera and Orion. Test out these new stack installs on Hera/Orion within the develop branch. See the ufs-weather-model issue for paths on Hera and Orion. The incoming Jet port changes (PR #1301) are already using the EPIC hpc-stack install on that platform.
UFS_UTILS issue: ufs-community/UFS_UTILS#789
UPP issue: NOAA-EMC/UPP#660
Acceptance Criteria (Definition of Done)
The system works when loading and using these new EPIC hpc-stack installs on Hera and Orion.
External repo updates