-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Iasi debug fix #790
Iasi debug fix #790
Conversation
…his change allows the number of channels to be defined at run time.
…were caused by the radiance scaling factor being missing in the DBNet data. Corrected sending in profile counts into combine_radobs instead of total channel counts in most satellite data read routines. Not all of them were tested. Several no longer exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a couple minor suggestions.
@DavidHuber-NOAA An alternative would be to add another argument to combine_radobs specifically for ncounts and ncounts1, which would all be long integers from and to the various read routines. |
@wx20jjung I agree with you on the intent; that it was developed for 2D observations and so it is unlikely to every overflow with your changes. That said, I think I like the alternative approach you proposed just to be safe. |
…t of the thinning routine (itxmax).
As per a conversation with @DavidHuber-NOAA, I've separated the total channel counts read (nread) from the thinning box (itxmax). Now, if the channel counts read becomes larger than the integer size, it will only affect the write statements. I've re-run the ctests on hera. Test project /scratch1/NCEPDEV/jcsda/Jim.Jung/scrub/ctests/update/build 83% tests passed, 1 tests failed out of 6 Total Test time (real) = 1969.84 sec The hafs_4denvar_glbens test failed the time limit test. I restarted the test and it passed. ctest --rerun-failed --output-on-failure I will run the ctests on other machines when I get feedback from the rest of the reviewers. |
WCOSS2 (Dogwood) ctests Install
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a major improvement to me.
Thanks, Jim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, @wx20jjung. Just a few minor comments.
Co-authored-by: David Huber <[email protected]>
… make this routine consistent with other satellite data read routines. The other two files are formatting changes.
…o IASI_debug_fix
I've made the various changes, tested everything both in the current branch (IASI_debug_fix) and in my IASI-NG branch in a single cycle test. Both results are as expected. The various satellite parameters in the gsistat (counts, bias, std dev, etc) are identical with (iasi_debug_fix) and without (develop) these changes. I found no more '****' fields in any of the output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @wx20jjung for the updates. Question remains for src/gsi/read_seviri.f90
…vious changes in combine_radobs. Some minor changes remain, lile in read_cris, where now nrec = 999999 to make it consistent with other read routines. The changes associated in fixing the read_iasi failure in debug mode remain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wx20jjung, one question and one request for change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wx20jjung . I installed wx20jjung:IASI_debug_fix on Cactus this afternoon with num_profiles
removed from read_seviri.f90
in my working copy. All ctests passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good. Thanks @wx20jjung!
ctests on herculese completed as expected 100% tests passed, 0 tests failed out of 6 Total Test time (real) = 1741.16 sec Results from jet had 2 problems: The rrfs tests timed out and did not complete. My hera tests have been in the queue for the past 12 hours. I removed them. |
… initialized twice in read_gmi.f90, once before and after it was allocated. I removed the first instance.
WCOSS2 (Cactus) ctests
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @wx20jjung for working through my comments. Looks good.
Approve.
Description
When the GSI is built in debug mode, the code failed when the read_iasi routine called combine_radobs. In tracking down this problem several other minor problems were discovered and fixed.
Resolves #789
The first group of commits are some write format changes I missed in the output of the runtime directory. I also found where the maximum channel number was set to 3000. I changed this array to be dynamic and allocated for each instrument.
The second group of commits resolves 2 errors I found while trying to fix the debug issue. The first was an error some of the IASI temperature values were "NAN"s. I traced this back to some of the cscale values (a scaling factor for IASI radiances, in the BUFR file), were missing values. The missing cscale values are associated with the shortwave side of the water vapor region and the shortwave channels. None of these channels are currently used. These were only found in the direct broadcast data. I did not find an instance where the operational data had missing values. The second error was an integer overflow problem with the variable nread being passed into combine radobs. Nread is used in most satellite instrument reads but the main failure was from read_iasing (to be added later). Nread is a counter and is ultimately number_of_profiles * number_of_channels. This number exceeded the memory for an integer and was ultimately a negative number. In combine_radobs, nread is compared to the total number of elements of the thinning box or itxmax to the number of elements of the task thinning box. In this case, the task thinning box was negative. I added logic to the various read_* routines to pass the number of profiles kept on each task. This makes nread and itxmax consistent. This change caused the total counts to be different for some instruments. Here is an example from the gsistat file for IASI.
previous:
o-g 01 rad metop-b iasi 331026696 6148957 690460 0.24889E+06 0.24889E+06 0.36047 0.36047
new:
o-g 01 rad metop-b iasi 44632896 6148957 690460 0.24889E+06 0.24889E+06 0.36047 0.36047
The total counts after thinning and total counts used are identical along with the other statistics.
The actual failure of read_iasi in debug mode came down to putting an if() cycle in a different place. Read_iasi is now consistent with read_cris.
Type of change
How Has This Been Tested?
The main testing and cycling experiments were conducted on S4 at C192 resolution. After a single cycle spinup I setup a "control" and "experiment" and run both independently changing the GSIEXEC variable to point to the gsi master branch (control) and the IASI_debug_fix branch (experiment). The control and experiment were run through 4 cycle. Static tests were also conducted on Jet.
The ctests on Hera all passed:
[Jim.Jung@hfe10 build]$ ctest -j 6
Test project /scratch1/NCEPDEV/jcsda/Jim.Jung/save/ctests/update/build
Start 1: global_4denvar
Start 6: global_enkf
Start 2: rtma
Start 3: rrfs_3denvar_rdasens
Start 4: hafs_4denvar_glbens
Start 5: hafs_3denvar_hybens
1/6 Test #3: rrfs_3denvar_rdasens ............. Passed 492.94 sec
2/6 Test #2: rtma ............................. Passed 970.72 sec
3/6 Test #5: hafs_3denvar_hybens .............. Passed 1043.42 sec
4/6 Test #6: global_enkf ...................... Passed 1118.55 sec
5/6 Test #4: hafs_4denvar_glbens .............. Passed 1159.69 sec
6/6 Test #1: global_4denvar ................... Passed 1909.17 sec
100% tests passed, 0 tests failed out of 6
Total Test time (real) = 1909.21 sec
The rrfs ctest timed out on jet, all others passed.
Test project /lfs5/HFIP/hfv3gfs/Jim.Jung/noscrub/ctests/update/build
Start 1: global_4denvar
Start 2: rtma
Start 3: rrfs_3denvar_rdasens
Start 4: hafs_4denvar_glbens
Start 5: hafs_3denvar_hybens
Start 6: global_enkf
1/6 Test #6: global_enkf ...................... Passed 1779.50 sec
2/6 Test #2: rtma ............................. Passed 1938.22 sec
3/6 Test #1: global_4denvar ................... Passed 1991.97 sec
4/6 Test #5: hafs_3denvar_hybens .............. Passed 2000.38 sec
5/6 Test #4: hafs_4denvar_glbens .............. Passed 2310.42 sec
The rrfs test also failed on hercules
[jjung@hercules-login-3 build]$ ctest
Test project /work/noaa/nesdis-rdo1/jjung/noscrub/ctests/update/build
Start 1: global_4denvar
1/6 Test #1: global_4denvar ................... Passed 1741.09 sec
Start 2: rtma
2/6 Test #2: rtma ............................. Passed 965.23 sec
Start 3: rrfs_3denvar_rdasens
3/6 Test #3: rrfs_3denvar_rdasens .............***Failed 484.58 sec
Start 4: hafs_4denvar_glbens
4/6 Test #4: hafs_4denvar_glbens .............. Passed 1161.19 sec
Start 5: hafs_3denvar_hybens
5/6 Test #5: hafs_3denvar_hybens .............. Passed 1094.42 sec
Start 6: global_enkf
6/6 Test #6: global_enkf ...................... Passed 847.14 sec
83% tests passed, 1 tests failed out of 6
Total Test time (real) = 6293.65 sec
The following tests FAILED:
3 - rrfs_3denvar_rdasens (Failed)
The memory for rrfs_3denvar_rdasens_loproc_updat is 1106724 KBs. This has exceeded maximum allowable memory of 1098631 KBs, resulting in Failure memthresh of the regression test.
Checklist
Please add @ADCollard, @DavidHuber-NOAA and @InnocentSouopgui-NOAA as reviewers.