Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release/public-v2: add NCEPLIBS umbrella modulefile and shell scripts for SRW App release #148

Conversation

climbfuji
Copy link
Collaborator

@climbfuji climbfuji commented Oct 19, 2020

The SRW App wants an easier way to set environment variables or load modules than loading individual NCEPLIBS modules etc. This was discussed at length in last week's SRW App build team meeting. It was decided to augment the NCEPLIBS with an umbrella modulefile (for systems that use modules) and shell scripts (bash and tcsh; for users on commodity systems without the module environment). This is similar to what was done for the MRW App release.

The approach taken here:

  • use the existing logic to deploy modulefiles and extend it with an umbrella module and two shell scripts
  • for simplicity, DEPLOY is by default ON and running make deploy will always overwrite existing files
  • also add CC, CXX, FC to the umbrella modulefile and the shell scripts; if this is not to be included, we can revert adding CXX to the cmake project languages - should it be removed?
  • extend lua2tcl.py so that the umbrella modulefile can also be converted
  • look for NetCDF and ESMF in the umbrella CMakeLists.txt and export environment variables NETCDF and ESMFMKFILE (required to build the ufs-weather-model standalone using build.sh or compile.sh)

@mkavulich
Copy link

@climbfuji Perhaps this is related to the weird extra file sed is creating? It's a slightly different problem but it was solved by adding two single quotes after -i https://stackoverflow.com/questions/7733922/sed-command-creates-randomly-named-files

@climbfuji
Copy link
Collaborator Author

@climbfuji Perhaps this is related to the weird extra file sed is creating? It's a slightly different problem but it was solved by adding two single quotes after -i https://stackoverflow.com/questions/7733922/sed-command-creates-randomly-named-files

Bingo! Thanks. Yes, adding '' after the -i for each of the sed commands did the trick.

@mkavulich
Copy link

Actually this may be a problem....It looks like there are differences in how sed is implemented on macos vs linux. On macos the above solution works, but when I try to do it on Cheyenne it returns an error sed: can't read : No such file or directory

There is some more information here...it seems as if there is no clean way to do this that works both on macos and linux: https://stackoverflow.com/questions/5694228/sed-in-place-flag-that-works-both-on-mac-bsd-and-linux

It may be that deleting the rogue temporary file is the best way to go about it, unless we want to make gnu-sed via Homebrew another prerequisite ;)

@climbfuji
Copy link
Collaborator Author

Actually this may be a problem....It looks like there are differences in how sed is implemented on macos vs linux. On macos the above solution works, but when I try to do it on Cheyenne it returns an error sed: can't read : No such file or directory

There is some more information here...it seems as if there is no clean way to do this that works both on macos and linux: https://stackoverflow.com/questions/5694228/sed-in-place-flag-that-works-both-on-mac-bsd-and-linux

It may be that deleting the rogue temporary file is the best way to go about it, unless we want to make gnu-sed via Homebrew another prerequisite ;)

Argh, that's what I feared when I read about the solution. Let me try to look for a cleaner way. These .lua-e files exist for every of the individual NCEPLIBS module file as well.

…s to avoid writing the NCEPLIBS .lua-e file (macOS only)
@climbfuji climbfuji force-pushed the umbrella_modulefile_and_shellscripts branch from 66375c6 to 5cd71f9 Compare October 19, 2020 20:34
@climbfuji
Copy link
Collaborator Author

Actually this may be a problem....It looks like there are differences in how sed is implemented on macos vs linux. On macos the above solution works, but when I try to do it on Cheyenne it returns an error sed: can't read : No such file or directory
There is some more information here...it seems as if there is no clean way to do this that works both on macos and linux: https://stackoverflow.com/questions/5694228/sed-in-place-flag-that-works-both-on-mac-bsd-and-linux
It may be that deleting the rogue temporary file is the best way to go about it, unless we want to make gnu-sed via Homebrew another prerequisite ;)

Argh, that's what I feared when I read about the solution. Let me try to look for a cleaner way. These .lua-e files exist for every of the individual NCEPLIBS module file as well.

@mkavulich try again please

@climbfuji
Copy link
Collaborator Author

We also need to decide what to do with the dependencies built by NCEPLIBS-external (or loaded a priori as modules / set via environment variables). Most importantly, we need to avoid confusion what users have to set and what not. As far as I am concerned, everything that gets built or set as part of NCEPLIBS-external and NCEPLIBS should be automatically included in the umbrella modulefile and the shell script. Anything that the user sets/loads beforehand should remain separate.

Thoughts?

@edwardhartnett
Copy link
Contributor

For NCEPLIBS and NCEPLIBS-external, what environment variables have to be set? Seems like only CMAKE_PREFIX_PATH should need to be set...

@edwardhartnett
Copy link
Contributor

@climbfuji is there an issue describing this work? If not, can you start one?

@edwardhartnett
Copy link
Contributor

Why the switch to version 2.0? What component library upgrades does this entail (if any)?

@climbfuji
Copy link
Collaborator Author

Why the switch to version 2.0? What component library upgrades does this entail (if any)?

UFS MRW Apps 1.0 and 1.1 where released with NCEPLIBS-v1.0.0 (internally ufs-v1.0.0) and NCEPLIBS-v1.1.0 (internally ufs-v1.1.0). The decision was made to ship the SRW App 1.0 with NCEPLIBS-v2.0.0 (internally ufs-v2.0.0).

The numbering of these release versions does not coincide with the internal number in the develop branch. For each major release of the UFS (same or different App), the public release version of NCEPLIBS will be bumped up if sufficient changes are made. v1.0.0 was an early version of the umbrella build (basically just after we started working on it), and this version here is quite different (pretty close to the development version at the moment).

@climbfuji
Copy link
Collaborator Author

@climbfuji is there an issue describing this work? If not, can you start one?

The issue is summarized in the first paragraph of the PR description.

@climbfuji
Copy link
Collaborator Author

For NCEPLIBS and NCEPLIBS-external, what environment variables have to be set? Seems like only CMAKE_PREFIX_PATH should need to be set...

Not all users will use NCEPLIBS-external to install the dependencies, although this is the only supported way for the public release. However, users can choose which dependencies to build (netCDF, libpng, jasper, esmf) and which to take from elsewhere (because they already have them preinstalled).

The easiest way - for the purpose of the release, not the develop branch - would be to have the NCEPLIBS umbrella CMakeLists.txt look for all required packages (for the ufs-weather-model, only netcdf, esmf, libpng; for UFS_UTILS and EMC_post a few more) and set the corresponding XYZ_ROOT environment variables in the umbrella module and shell scripts.

Another way would be to check which modules NCEPLIBS-external generated and add them to the umbrella module; for the shell scripts, you'd still need to export the XYZ_ROOT environment variables instead.

@kgerheiser
Copy link
Contributor

Thanks Dom

also add CC, CXX, FC to the umbrella modulefile and the shell scripts; if this is not to be included, we can revert adding CXX to the cmake project languages - should it be removed?

I think that's fine

not included yet: scan currently loaded modules and add those (either as module load or prereq) to the umbrella module file - should it be included?

Not sure about that. Seems like it could be error prone by adding random modules that you have loaded.

not included yet: search for known dependencies for the app to build (NetCDF, Jasper, ESMF, ...) in the umbrella CMakeLists.txt and add to the umbrella modulefile and shell scripts - should it be included?

You mean like find_package to find the location? This would be for things not built by NCEPLIBS-external, right? Might be worthwhile to add this.

@climbfuji climbfuji linked an issue Oct 20, 2020 that may be closed by this pull request
@climbfuji
Copy link
Collaborator Author

@mkavulich do you have a list of thirdparty dependencies that need to be found to build UFS_UTILS and EMC_post, beyond what the ufs-weather-model needs? For the model, we only need to look for libpng, netcdf, esmf.

@edwardhartnett
Copy link
Contributor

To answer one of my own questions, here's the kind of environment vars that are being set:

setenv BUFR_INC4     "${base}/include_4"
setenv BUFR_INC8     "${base}/include_8"
setenv BUFR_INCd     "${base}/include_d"
setenv BUFR_LIB4     "${base}/lib/libbufr_4.a"
setenv BUFR_LIB8     "${base}/lib/libbufr_8.a"
setenv BUFR_LIBd     "${base}/lib/libbufr_d.a"
setenv BUFR_INC4_DA  "${base}/include_4_DA"
setenv BUFR_INC8_DA  "${base}/include_8_DA"
setenv BUFR_INCd_DA  "${base}/include_d_DA"
setenv BUFR_LIB4_DA  "${base}/lib/libbufr_4_DA.a"
setenv BUFR_LIB8_DA  "${base}/lib/libbufr_8_DA.a"
setenv BUFR_LIBd_DA  "${base}/lib/libbufr_d_DA.a"

Thankfully we can get rid of all of this after this issue is complete everywhere: NOAA-EMC/NCEPLIBS-ip#226

@climbfuji
Copy link
Collaborator Author

@mkavulich I think this is ready for another round of testing. I'll test the shell scripts on my mac now. You can try it on Cheyenne if you like. Basically, you still need to load all modules that you loaded before building NCEPLIBS-external and NCEPLIBS:

module purge
module load ncarenv/1.3
module load intel/19.1.1
module load mpt/2.19
module load ncarcompilers/0.5.0
module load cmake/3.16.4

But you should no longer have to set CC, CXX, FC. The module load commands come down to

module use /glade/p/ral/jntp/GMTB/tools/NCEPLIBS-ufs-v2.0.0/intel-19.1.1/mpt-2.19/modules
module load NCEPLIBS/2.0.0

For your testing, replace the module use path as appropriate. I will roll out this updated version (and also update the doc/README_*.txt files in NCEPLIBS-external) after we tested this successfully on a range of platforms.

Note that I converted all modulefiles from lua to tcl, hence please pay close attention to module load errors. Some systems don't understand lua, but all understand tcl. Converting them means we don't need to maintain and run the workaround lua2tcl.py script.

@mkavulich
Copy link

@mkavulich do you have a list of thirdparty dependencies that need to be found to build UFS_UTILS and EMC_post, beyond what the ufs-weather-model needs? For the model, we only need to look for libpng, netcdf, esmf.

There should be no additional dependencies for EMC_post; we used to require Jasper but that is no longer needed. There don't appear to be any extra external dependencies for UFS_UTILS either, but that's just from reading the CMakeLists.txt; I'm not familiar enough with UFS_UTILS to say for sure.

UFS_UTILS does require some NCEPLIBS which the weather model does not need (landsfcutil and I think a few others) but I guess the plan will be to load all nceplibs?

@climbfuji
Copy link
Collaborator Author

@mkavulich do you have a list of thirdparty dependencies that need to be found to build UFS_UTILS and EMC_post, beyond what the ufs-weather-model needs? For the model, we only need to look for libpng, netcdf, esmf.

There should be no additional dependencies for EMC_post; we used to require Jasper but that is no longer needed. There don't appear to be any extra external dependencies for UFS_UTILS either, but that's just from reading the CMakeLists.txt; I'm not familiar enough with UFS_UTILS to say for sure.

UFS_UTILS does require some NCEPLIBS which the weather model does not need (landsfcutil and I think a few others) but I guess the plan will be to load all nceplibs?

Yes, all NCEPLIBS modules will be loaded. I am currently having an issue with the ufs-weather-model build expecting tthe NCEPLIBS environment variable to be set (even though it could find it without it, but that's a different story). I'll need to add some more logic to this PR.

@climbfuji
Copy link
Collaborator Author

Just pushed an update. With this, I can build the ufs-weather-model standalone using compile_cmake.sh on one of my macs. Haven't tried build.sh yet, and neither my other (clean) mac.

@climbfuji
Copy link
Collaborator Author

Just pushed an update. With this, I can build the ufs-weather-model standalone using compile_cmake.sh on one of my macs. Haven't tried build.sh yet, and neither my other (clean) mac.

It also builds with build.sh on my mac. Going to test gaea and my cleaner mac this afternoon.

@mkavulich
Copy link

@climbfuji I'm still not able to run my own end-to-end tests (including NCEPLIBS-external) because I am not able to check out hdf5...I opened an issue there with more info: NOAA-EMC/NCEPLIBS-external#79

If you would like me to run tests on any specific platforms where NCEPLIBS has already been built let me know.

@mkavulich
Copy link

Scratch that, whatever problems with the hdf5 repository have now been resolved. Would you like me to run tests on any specific platform? I have access to Cheyenne, Jet, Hera, and Orion.

@climbfuji
Copy link
Collaborator Author

climbfuji commented Oct 22, 2020 via email

@mkavulich
Copy link

Building on Cheyenne with intel and gnu, I'll let you know the results

deploy_modulefiles.sh.in Outdated Show resolved Hide resolved
@mkavulich
Copy link

@climbfuji The Cheyenne tests were successful, once I made the above-mentioned capitalization change the NCEPLIBS build successfully for intel and gnu, and created the expected files. The ufs-srweather-app built successfully with the modulefile for intel; I did not test the use of the source files but they were created successfully and looked reasonable. I have started a branch that will update the READMEs and issue a PR when appropriate. The ufs-srweather-app did not build for gnu but this is a known issue that also affected the old build that I haven't had time to look into yet.

I did notice that you now have to manually set the ESMFMKFILE path when running cmake for NCEPLIBS whereas before it was not necessary; I guess that can be handled in documentation?

@climbfuji
Copy link
Collaborator Author

@climbfuji The Cheyenne tests were successful, once I made the above-mentioned capitalization change the NCEPLIBS build successfully for intel and gnu, and created the expected files. The ufs-srweather-app built successfully with the modulefile for intel; I did not test the use of the source files but they were created successfully and looked reasonable. I have started a branch that will update the READMEs and issue a PR when appropriate. The ufs-srweather-app did not build for gnu but this is a known issue that also affected the old build that I haven't had time to look into yet.

Thanks for this one, will push the change right away.

I did notice that you now have to manually set the ESMFMKFILE path when running cmake for NCEPLIBS whereas before it was not necessary; I guess that can be handled in documentation?

Yes, I will add that export statement in each of the doc/README_*.txt files of NCEPLIBS-external. It's necessary to do that so that NCEPLIBS can set this environment variable in the umbrella module and the shell scripts.

Copy link

@mkavulich mkavulich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@climbfuji I have also successfully run on Orion now, loading the NCEPLIBS modulefile sets the environment correctly. I don't have write privileges in this repository but consider this my approval 👍

@climbfuji
Copy link
Collaborator Author

@climbfuji I have also successfully run on Orion now, loading the NCEPLIBS modulefile sets the environment correctly. I don't have write privileges in this repository but consider this my approval 👍

That's great, thanks for lettting me know. Let's merge it and fix any issues later. Code freeze is around the corner. Associated PR for NCEPLIBS-external will require documentation updates and maybe an additional modulefile or two (jasper, jpeg) for gaea.

@climbfuji
Copy link
Collaborator Author

@kgerheiser @mark-a-potts do you want to take a look or should I merge without your reviews? This is for release/public-v2.

Copy link
Contributor

@kgerheiser kgerheiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you work

@climbfuji climbfuji merged commit f4482d4 into NOAA-EMC:release/public-v2 Oct 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Requested changes in support of SRW App
4 participants