Reduce testing data #671
The best path forward is to determine which subdirectories/files are actually used and store only those. The difficulty is that I'm not sure how to go about determining that. I suppose a good starting point would be removing all files with year numbers greater than anything used in the tests.
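The year-based pruning idea could be sketched roughly as below. This is a hedged sketch, not the script actually used: the function name, the filename pattern in the comment, and the cutoff are placeholders of mine, and it naively takes the first 4-digit field in each filename as the year.

```shell
# Hypothetical helper (name and pattern are placeholders): list NetCDF
# files whose first 4-digit field exceeds a cutoff year, i.e. candidates
# for removal from the test data set.
list_years_after() {
    local archive_dir="$1" cutoff="$2" f year
    find "${archive_dir}" -name '*.nc' | while read -r f; do
        # e.g. case.eam.h0.1996-01.nc -> 1996
        year=$(basename "$f" | grep -oE '[0-9]{4}' | head -n 1)
        if [ -n "$year" ] && [ "$year" -gt "$cutoff" ]; then
            echo "$f"
        fi
    done
}
```

Real E3SM case names embed other 4-digit fields (e.g. `0051`), so a production version would need a stricter pattern anchored on the `YYYY-MM` date component.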
Yes, for sure. I can help with the ocean and sea-ice output. But my suggestion would be to start with a mostly empty new location on Chrysalis somewhere and to add output files (i.e. all files within the year range you're testing with a given prefix) only when analysis breaks without them. This way, you end up with a minimal set.
If you had to choose between v2 and v3 data, it seems like you should pick v3 data.
Below is what I am pretty confident you need for MPAS-Ocean and -Seaice for the v3 data. The v2 data should be similar. I would only copy these, preserving the directory structure, of course:
Thanks so much @xylar. I think I actually have a decent minimal test data set now! I'll transfer that to Compy for testing. It's 859G, which is a very welcome reduction from 24T!

Script to generate v3 test data:

```shell
# 2025-02-04
version="v3" # Options: v3, v2

if [ "${version}" == "v3" ]; then
    case_name="v3.LR.historical_0051"
    # This is the path to the complete simulation output.
    # It has a lot of data, and we don't want to copy over everything,
    # so this script copies only the necessary files.
    complete_simulation_output="/lcrc/group/e3sm2/ac.wlin/E3SMv3/v3.LR.historical_0051"
    restart_year="0051"
    start_year=1985
    end_year_short=1988
    end_year_long=1994
    end_year_closed_interval=1995
fi

case_prefix="/lcrc/group/e3sm/ac.forsyth2/zppy_test_data/E3SM${version}/${case_name}"
rm -rf ${case_prefix} # Start fresh
echo "Creating reduced data set: ${case_prefix}"
mkdir -p ${case_prefix}/archive/atm/hist
mkdir -p ${case_prefix}/archive/ice/hist
mkdir -p ${case_prefix}/archive/lnd/hist
mkdir -p ${case_prefix}/archive/ocn/hist
mkdir -p ${case_prefix}/archive/rof/hist
mkdir -p ${case_prefix}/run

for year in $(seq ${start_year} ${end_year_closed_interval}); do
    cd ${complete_simulation_output}/archive/ice/hist
    # For mpas_analysis
    cp ${case_name}.mpassi.hist.am.timeSeriesStatsMonthly.${year}*.nc ${case_prefix}/archive/ice/hist/
    cd ${complete_simulation_output}/archive/ocn/hist
    # For mpas_analysis, global_time_series
    cp ${case_name}.mpaso.hist.am.timeSeriesStatsMonthly.${year}*.nc ${case_prefix}/archive/ocn/hist/
    # For mpas_analysis only
    cp ${case_name}.mpaso.hist.am.timeSeriesStatsMonthlyMin.${year}*.nc ${case_prefix}/archive/ocn/hist/
    cp ${case_name}.mpaso.hist.am.timeSeriesStatsMonthlyMax.${year}*.nc ${case_prefix}/archive/ocn/hist/
    cp ${case_name}.mpaso.hist.am.meridionalHeatTransport.${year}*.nc ${case_prefix}/archive/ocn/hist/
    cp ${case_name}.mpaso.hist.am.oceanHeatContent.${year}*.nc ${case_prefix}/archive/ocn/hist/
done

for year in $(seq ${start_year} ${end_year_long}); do
    cd ${complete_simulation_output}/archive/atm/hist
    # For climo_atm_monthly, ts_atm_monthly (end_year_short)
    # For ts_atm_monthly_glb (end_year_long)
    cp ${case_name}.eam.h0.${year}-*.nc ${case_prefix}/archive/atm/hist/
    cd ${complete_simulation_output}/archive/lnd/hist
    # For climo_land_monthly, ts_land_monthly (end_year_short)
    # For ts_lnd_monthly_glb (end_year_long)
    cp ${case_name}.elm.h0.${year}-*.nc ${case_prefix}/archive/lnd/hist/
done

for year in $(seq ${start_year} ${end_year_short}); do
    cd ${complete_simulation_output}/archive/atm/hist
    # For ts_atm_daily
    cp ${case_name}.eam.h1.${year}-*.nc ${case_prefix}/archive/atm/hist/
    # For climo_atm_monthly_diurnal
    cp ${case_name}.eam.h3.${year}-*.nc ${case_prefix}/archive/atm/hist/
    cd ${complete_simulation_output}/archive/rof/hist
    # For ts_rof_monthly
    cp ${case_name}.mosart.h0.${year}-*.nc ${case_prefix}/archive/rof/hist/
done

cd ${complete_simulation_output}/run
cp ${case_name}.mpaso.rst.${restart_year}-01-01_00000.nc ${case_prefix}/run/
cp ${case_name}.mpassi.rst.${restart_year}-01-01_00000.nc ${case_prefix}/run/
cp mpaso_in ${case_prefix}/run/
cp mpassi_in ${case_prefix}/run/
cp streams.ocean ${case_prefix}/run/
cp streams.seaice ${case_prefix}/run/

echo "Complete simulation output: ${complete_simulation_output}"
echo "Reduced data set: ${case_prefix}"
echo "Size:"
du -sh ${case_prefix}
# 859G /lcrc/group/e3sm/ac.forsyth2/zppy_test_data/E3SMv3/v3.LR.historical_0051
```
I'm running into a data transfer issue. I used Globus to transfer the data.
But I get differing data set sizes:

```shell
# Chrysalis:
du -sh /lcrc/group/e3sm/ac.forsyth2/zppy_test_data/E3SMv3
# 860G /lcrc/group/e3sm/ac.forsyth2/zppy_test_data/E3SMv3

# Compy:
du -sh /compyfs/fors729/zppy_test_data/E3SMv3/
# 572G /compyfs/fors729/zppy_test_data/E3SMv3/
```

Globus says the transfer succeeded, though. But then why is the Compy version ~300G smaller?
That seems worth double-checking. At least make sure all the files are there. Are the sizes of individual files at least the same if you pick a few at random?
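The suggested spot check could be done systematically along these lines. This is a hedged sketch (the function name is mine): it diffs the sorted listings of relative paths and byte sizes for two copies of a tree. Since the Globus copies live on different machines, in practice you would generate each listing on its own machine and diff the two text files afterwards.

```shell
# Sketch: compare relative file paths and per-file byte sizes between
# two copies of a tree (requires GNU find for -printf).
compare_trees() {
    local src="$1" dst="$2"
    diff <(cd "$src" && find . -type f -printf '%P %s\n' | sort) \
         <(cd "$dst" && find . -type f -printf '%P %s\n' | sort)
}
```

`diff` exits 0 when the two listings match, so the function doubles as a pass/fail check in a script.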
This is pretty strange. Some files get bigger and some get smaller!
Well, they appear to have the same number of files; the problem seems to be that the files change size haphazardly.
Can you open the text files in vim and see if the bottom is the same? Can you ncdump the NetCDF files and at least verify that they are dumpable? It may just be that Compy's file system compresses things or something. That might explain why it's the slowest file system I have ever had the misfortune to work with.
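The ncdump sanity check could look something like the sketch below. The function name is a placeholder of mine; the checker command defaults to `ncdump -h` (dump the header only, which is cheap), and it is parameterized only so the sketch can be exercised on a machine without the NetCDF tools installed.

```shell
# Sketch: report any .nc file under a directory that the checker command
# cannot read. Default checker is `ncdump -h` (header-only dump).
check_dumpable() {
    local dir="$1" checker="${2:-ncdump -h}" f
    find "$dir" -name '*.nc' | while read -r f; do
        if ! $checker "$f" > /dev/null 2>&1; then
            echo "NOT DUMPABLE: $f"
        fi
    done
}
```

An empty output would mean every file is at least readable, which narrows the size discrepancy down to storage-level effects rather than truncated transfers.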
Oh interesting, I was wondering that myself.
A preliminary look does seem to suggest this is true. So maybe Compy is just compressing things.
I agree; commands take much longer to run on Compy for me.
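One way to test the compression hypothesis directly is to compare allocated (on-disk) size against apparent (logical) size with GNU `du`: transparent compression or sparse files make the allocated size smaller than the apparent size. This sketch wraps that in a helper (the function name is mine); the gap between 572G and 860G would show up as a mismatch between the two numbers.

```shell
# Sketch: print allocated vs apparent size (in KiB) for a path, using
# GNU du. A smaller first number suggests compression or sparse files.
size_report() {
    local path="$1"
    du -s "$path" | cut -f1                   # KiB actually allocated
    du -s --apparent-size "$path" | cut -f1   # KiB of logical file size
}
# e.g. size_report /compyfs/fors729/zppy_test_data/E3SMv3/
```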
Request criteria
Issue description
For initial explanation, see #634 (reply in thread).
Currently (2025-02-03), `zppy`'s v2 testing data is 18T and its v3 testing data is 24T, for a total of 42T. This poses a problem with quotas on Compy: `/qfs` home directories are limited to 400 GB, and `/compyfs` home directories are limited to 30T. Obviously, 30T < 42T. What this means is that I currently can't transfer the v3 data to Compy for testing Unified 1.11.0.