baseline check is not working as expected on Hercules #2245

Closed
uturuncoglu opened this issue Apr 20, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@uturuncoglu
Collaborator

Description

I am trying to compare a set of NetCDF files for a regression test defined in ufs-coastal. The test uses a configuration in which the CDEPS data atmosphere is coupled with ROMS, and it produces three NetCDF files. On Hercules, the regression test output looks like the following:

baseline dir = /work2/noaa/nems/tufuk/RT/NEMSfv3gfs/develop-20240417/coastal_irene_atm2roms_intel
working dir  = /work2/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_2701396/coastal_irene_atm2roms_intel
Checking test coastal_irene_atm2roms_intel results ....
 Comparing irene_avg.nc .....USING NCCMP......NOT IDENTICAL
 Comparing irene_his.nc .....USING NCCMP......NOT IDENTICAL
 Comparing irene_rst.nc .....USING NCCMP......NOT IDENTICAL

 0: The total amount of wall time                        = 246.928419
 0: The maximum resident set size (KB)                   = 268644

Test coastal_irene_atm2roms_intel FAIL Tries: 2

This indicates that the test failed at the baseline check step. However, when I run the following command manually, the log file is empty but $d has a value of 1, so the regression testing treats the test as failed:

nccmp -d -S -q -f -g -B --Attribute=checksum --warn=format /work2/noaa/nems/tufuk/RT/NEMSfv3gfs/develop-20240417/coastal_irene_atm2roms_intel/irene_his.nc /work2/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_2701396/coastal_irene_atm2roms_intel/irene_his.nc > log 2>&1 && d=$? || d=$?; echo $d

I also compared the files with NCAR's cprnc tool, and it reports that the files are identical.

SUMMARY of cprnc:
 A total number of    307 fields were compared
          of which      0 had non-zero differences
               and      0 had differences in fill patterns
               and      0 had different dimension sizes
               and      0 had different data types
 A total number of      0 fields could not be analyzed
 A total number of      0 time-varying fields on file 1 were not found on file 2.
 A total number of      0 time-constant fields on file 1 were not found on file 2.
 A total number of      0 time-varying fields on file 2 were not found on file 1.
 A total number of      0 time-constant fields on file 2 were not found on file 1.
  diff_test: the two files seem to be IDENTICAL 

So I am not sure why rt_utils.sh thinks that the files are not identical. Any suggestions? Is this a bug? The script is used by multiple tests and seems robust, but there could still be an issue with the RT baseline check step.
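For readers unfamiliar with the exit-status capture used in the manual check above, here is a minimal sketch of the same pattern. The helper name run_quiet is hypothetical and is not part of rt_utils.sh, and `false` stands in for a quiet comparison that finds a difference, so the sketch runs without nccmp installed.

```shell
# Hypothetical helper (not from rt_utils.sh): run a command with all of its
# output redirected to a log file, then print the command's exit status.
run_quiet() {
  rq_log=$1
  shift
  rq_status=0
  "$@" > "${rq_log}" 2>&1 || rq_status=$?
  echo "${rq_status}"
}

log=$(mktemp)
# `false` prints nothing and exits non-zero, like a comparison run in quiet
# mode that detects a difference.
status=$(run_quiet "${log}" false)
if [ "${status}" -ne 0 ] && [ ! -s "${log}" ]; then
  echo "non-zero exit status with an empty log: a difference was reported silently"
fi
rm -f "${log}"
```

The point of the sketch is that an empty log does not mean the command succeeded; only the exit status does.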

I also tested this on Frontera and got similar results (oceanmodeling/roms#3), but that is not an officially supported Tier 1 platform, and the test there used a slightly older version of the model (which may not use nccmp).

To Reproduce:

This can be reproduced on Hercules using ufs-coastal.

  1. checkout ufs-coastal: git clone -b feature/coastal_app --recursive https://github.com/oceanmodeling/ufs-coastal.git
  2. cd ufs-coastal/tests
  3. run RTs: ./rt.sh -l rt_coastal.conf -a nems -e
     Because of a bug in rt.sh (rt.sh is not working properly when -l and -n used together. #2244), there is no way to run a single test such as coastal_irene_atm2roms directly, but rt_coastal.conf can be edited to keep only coastal_irene_atm2roms.

Additional context

None

Output

None

@uturuncoglu
Collaborator Author

Let me test this on Derecho. I'll update you about it.

@uturuncoglu
Collaborator Author

On Hercules, I need to check the permissions for the baseline files.

@uturuncoglu
Collaborator Author

@DusanJovic-NOAA I double-checked, and I think the permissions are fine. Can you try to read the files in /work2/noaa/nems/tufuk/RT/NEMSfv3gfs/develop-20240126/coastal_irene_atm2roms_intel or /work2/noaa/nems/tufuk/RT/NEMSfv3gfs/develop-20240417/coastal_irene_atm2roms_intel and let me know? If you cannot, please tell me how far down the directory tree you can see.

@uturuncoglu
Collaborator Author

This might also be related to the following closed issue: #2015

@DusanJovic-NOAA
Collaborator

DusanJovic-NOAA commented Apr 22, 2024

I see the differences in the compiler_flags global attribute:

$ nccmp -g /work2/noaa/stmp/tufuk/stmp/tufuk/FV3_RT/rt_2770508/coastal_irene_atm2roms_intel/irene_avg.nc /work2/noaa/nems/tufuk/RT/NEMSfv3gfs/develop-20240126/coastal_irene_atm2roms_intel/irene_avg.nc
DIFFER : LENGTHS OF GLOBAL ATTRIBUTE : compiler_flags : 223 <> 193 : VALUES :  -g -traceback -fpp -fno-alias -auto -safe-cray-ptr -ftz -assume byterecl -sox -align array64byte -qno-opt-dynamic-align -diag-disable 5462 -diag-disable 7712 -real-size 64 -fp-model precise -ip -O3 -traceback -check uninit <>  -g -traceback -fpp -fno-alias -auto -safe-cray-ptr -ftz -assume byterecl -nowarn -sox -align array64byte -qno-opt-dynamic-align -real-size 64 -fp-model precise -ip -O3 -traceback -check uninit

That is, if I'm looking at the correct output files.

@uturuncoglu
Collaborator Author

uturuncoglu commented Apr 22, 2024

@DusanJovic-NOAA Thanks for checking. That is really helpful. I am not sure why I am not seeing this in the nccmp output. If this is the case, then because these are ROMS global attributes that record the compiler flags, the baseline needs to be created again even though the data itself is fine. I think there is also a way to check only the data and not the attributes, but I am not sure that is the way we should go. Let me recreate the baseline and check again on Hercules. Thanks again for your help.

@DusanJovic-NOAA
Collaborator

You are not seeing the differences because of the -q (quiet) flag. Without -q, the stdout would be huge when the files actually differ, so we rely on the exit code to determine whether the files are different.
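A possible way to get both behaviors, sketched under assumptions: keep the quiet check for the pass/fail decision, and only when it fails rerun the global-attribute comparison without -q to see what differs. The helper name compare_with_details and the NCCMP variable are hypothetical and are not part of rt_utils.sh; the nccmp options are the ones already quoted in this thread.

```shell
# Assumed name of the comparison tool; override NCCMP to use another build.
NCCMP=${NCCMP:-nccmp}

# Hypothetical helper: quiet check first, verbose attribute check on failure.
compare_with_details() {
  baseline=$1
  candidate=$2
  # Quiet comparison: only the exit status is used.
  if "${NCCMP}" -d -S -q -f -g -B --Attribute=checksum --warn=format \
       "${baseline}" "${candidate}" > /dev/null 2>&1; then
    echo "IDENTICAL"
    return 0
  fi
  echo "NOT IDENTICAL"
  # Rerun without -q so that any differing global attributes are printed.
  "${NCCMP}" -g "${baseline}" "${candidate}" 2>&1 || true
  return 1
}

# Example call (paths are placeholders):
#   compare_with_details baseline/irene_avg.nc rundir/irene_avg.nc
```

The second call is limited to -g on purpose: it keeps the output small when only metadata differs, which is the situation reported above. If the data differs as well, a separate run with -d and without -q would be needed.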

@uturuncoglu
Collaborator Author

@DusanJovic-NOAA Thanks. That is good to know. I added export CMP_DATAONLY=true to the test file and ran it again, and it is passing now. I think I can close this issue. Thanks again for your help.
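To illustrate why a data-only setting makes the test pass, here is a rough sketch of how such a switch could select the nccmp options. It assumes that data-only mode simply drops -g (the global-attribute comparison); this is an assumption for illustration, not code copied from rt_utils.sh, and the function name nccmp_flags is hypothetical.

```shell
# Hypothetical helper: build the nccmp option list depending on CMP_DATAONLY.
# Assumption: data-only mode differs from the default only by omitting -g.
nccmp_flags() {
  flags="-d -S -q -f -B --Attribute=checksum --warn=format"
  if [ "${CMP_DATAONLY:-false}" != "true" ]; then
    # Default mode: also compare global attributes such as compiler_flags.
    flags="${flags} -g"
  fi
  echo "${flags}"
}

# Show the command line that would be used (file names are placeholders).
echo "nccmp $(nccmp_flags) baseline.nc run.nc"
```

With global attributes left out of the comparison, a change in the recorded compiler flags no longer causes a failure. The trade-off is that genuine metadata changes also go unnoticed.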


2 participants