Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial ICI changes #238

Draft
wants to merge 114 commits into
base: dev
Choose a base branch
from
Draft
Changes from 1 commit
Commits
Show all changes
114 commits
Select commit Hold shift + click to select a range
882ce86
Creating default config, python environment, and GCC environment sett…
Mar 19, 2024
466fefc
Changing DataPathTROPOMI to DataPathObs (and adding Species to config…
Mar 20, 2024
4afa68e
Changes to generalize supporting other clusters, including adding Spe…
Mar 20, 2024
8b7e200
Default species for Harvard should be CH4
Mar 20, 2024
beb0651
Generalizing scheduler options
Mar 20, 2024
277bcf3
Moving conda environment activation into environment files
Mar 21, 2024
a217e2e
Removing conda environment specification from config file
Mar 21, 2024
e8312be
Removing conda environment specification from config file
Mar 21, 2024
5d3abb3
Moving conda environment activation into environment files
Mar 21, 2024
08023af
Removing UseSlurm option and replacing with UseScheduler/SchedulerType
Mar 21, 2024
a8576c4
Switching from UseSlurm to UseScheduler/SchedulerType
Mar 21, 2024
e8b13e7
Adding PythonEnv option
Mar 22, 2024
639a1f4
Adding PythonEnv to variables required for local simulation
Mar 22, 2024
55bd28c
Removing python loading from GEOS-Chem environment
Mar 22, 2024
5c05ed4
Creating separate environment file for python loading
Mar 22, 2024
0960227
Adding separate statement for loading Python env
Mar 22, 2024
fb1e776
Switching conda activate to source
Mar 22, 2024
fdf9e33
Handling python environment loading for AWS (now hardcoded?) and loca…
Mar 22, 2024
cdad798
Added reference to PythonEnv since htat's now specified along with GE…
Mar 22, 2024
035b444
Changes related to generalizing the IMI for multiple clusters
hannahnesser Apr 2, 2024
6e6980e
Amending config files to include PythonEnv and simplify SchedulerType…
hannahnesser Apr 2, 2024
59227ab
These files contained non-default uses of sbatch. As a result, we rep…
hannahnesser Apr 2, 2024
9eea594
Removed reference to UseScheduler
hannahnesser Apr 2, 2024
b0ef56b
Documentation changes consistent with the changes described in a prev…
hannahnesser Apr 2, 2024
d9d7c62
Adding tmux option to SchedulerType
hannahnesser Apr 2, 2024
e6bfae8
Adding species argument
hannahnesser Apr 2, 2024
656b013
Removing only aws requirementS
hannahnesser Apr 2, 2024
a2b24b4
Modified to handle tmux
hannahnesser Apr 3, 2024
6436535
Switched to standard SBATCH headers
Apr 11, 2024
01c522a
Adding in corrections for PBS
Apr 12, 2024
4279240
Changes to SBATCH to PBS conversion
Apr 12, 2024
17d4dda
Moved SBATCH to PBS conversion to common.sh
Apr 12, 2024
262ad9e
Added units to mem to avoid confusion in conversion from SBATCH to PBS
Apr 12, 2024
211932d
Added units to mem to avoid confusion in conversion from SBATCH to PBS
Apr 12, 2024
b29b878
Switched condaenv options to PythonEnv
Apr 12, 2024
5318cb8
Minor syntax fix
Jun 7, 2024
d0d6a41
Mostly changes to get replacement of SBATCH to PBS working
Jun 7, 2024
bc91e5e
Adding print statement to check
Jun 7, 2024
f133892
Continued bug fixes to awk
Jun 7, 2024
288d049
Continued debugging
Jun 8, 2024
6c829e6
Continued bug fixes
Jun 24, 2024
5d50a2a
Adding node request to sbatch
Jun 24, 2024
adbe07e
Removing nodes options, which will now be passed with sbatch -N directly
Jun 24, 2024
b9839a2
Changing default format of time request to HH:MM:SS for consistency b…
Jun 25, 2024
338d10a
Removed BlendedTROPOMI variable and replaced it with satellite_str (s…
Jun 26, 2024
dded98c
Removing references to TROPOMI and correcting function names to match…
Jun 26, 2024
ecdbe13
Removing references to TROPOMI and correcting function names to match…
Jun 26, 2024
a6ed832
Switched HourlyCH4 to HourlySpecies, changed ch4_run.template to run.…
Jun 27, 2024
ad5303e
Changed HourlyCH4 to HourlySpecies and ch4_run.template to run.template
Jun 27, 2024
4b55f6b
Changed tropomi_cache --> satellite_cache
Jun 27, 2024
694dfd5
Moving away from specific TROPOMI/CH4 references (mostly in variable …
Jun 27, 2024
9156a77
Changes mostly to variable names to move away from TROPOMI-specific a…
Jun 27, 2024
2f83ab5
Removing TROPOMI specific references
Jun 27, 2024
ac3385a
- Added species argument to average_satellite_observations
Jun 27, 2024
dfce7e6
Adding functions: mixing_ratio_conv_factor, species_molar_mass, read_…
Jun 27, 2024
9a33d72
Change BlendedTROPOMI to SatelliteProduct, change HourlyCH4 to Hourly…
Jun 27, 2024
1ab74f2
While this script is implicitly very TROPOMI-specific, we still remai…
Jun 27, 2024
2873f6c
- Change ch4_run.template to use the species variable from the config…
Jun 27, 2024
49fc3e1
Changed data directory name from data_TROPOMI to data_satellite; ther…
Jun 27, 2024
771b962
Changed ch4_run.template to <species>_run.template
Jun 27, 2024
e128bda
Changed ch4_run.template to <species>_run.template
Jun 27, 2024
46b6dd3
Move UseBCsForRestart from setup.sh to config file
Jun 27, 2024
0156426
Switched from HourlyCH4 to HourlySpecies and from ch4_run.template to…
Jun 27, 2024
af530d4
Changed name of TROPOMI data directory from data_TROPOMI to data_sate…
Jun 27, 2024
dafd54c
Changed HourlyCH4 to HourlySpecies and ch4_run.template to <species>_…
Jun 27, 2024
79127be
Added species argument to main options
Jun 28, 2024
74d79df
Changed tropomiCache to satelliteCache and addded Species and Satelli…
Jun 28, 2024
262a478
Removed bug check print statements and fixed a few PBS specific bugs
Jun 28, 2024
275295b
Corrected calls to write_boundary_conditions to include new arguments
Jun 28, 2024
0013b7f
Changed tropomiCache to satelliteCache, data_TROPOMI default storage …
Jun 28, 2024
3bc4444
Changed HourlyCH4 to HourlySpecies and BlendedTROPOMI to SatellitePro…
Jun 28, 2024
2150008
Changed HourlyCH4 to HourlySpecies
Jun 28, 2024
02090d7
Changed <species>_run.template to run.template
Jun 28, 2024
6a6f6de
Generalizing name of run template
Jun 28, 2024
fd0c727
Changed ch4_run.template to just run.template
Jun 28, 2024
732e551
Local changes
Jun 28, 2024
9c4bc72
Changing back to default config files
Jun 28, 2024
d96bf30
Merging dev and ICI branches before moving the ICI branch to a sub br…
Jun 28, 2024
a8baba5
Moving sbatch requests to use the submit_job function
Jun 28, 2024
f8a67e6
Adding False argument for the new viz Boolean
Jun 28, 2024
ae02566
Adding species arguments
Jun 28, 2024
56eea96
Removing specific CH4 references and replacing with {species} and doi…
Jun 28, 2024
2086e7c
Fixing comment string
Jul 1, 2024
d24cb3e
Updating to include new flags from dev branch
Jul 1, 2024
5243ef9
Removing condaEnv/condaFile refs and replacing with PythonEnv per the…
Jul 1, 2024
2864204
Adjusting for SBATCH options that were previously not caught by conve…
Jul 1, 2024
9665bb4
Adding a check to see if the PBS -l site=needed option has been previ…
Jul 1, 2024
8a8b167
Printing out any SBATCH options not caught by the conversion script
Jul 1, 2024
15c139a
Removed print statement that didn't really work
Jul 1, 2024
02e3747
Removing activation of python, which we get instead from a designated…
Jul 2, 2024
2190404
Switched tabs to spaces for readability
Jul 2, 2024
2e1d6bc
Changed tabs to spaces and changed hard coded GEOSCHEM_VERSION to the…
Jul 2, 2024
32efc58
Changes to allow submit_job to take a SaveOutput true/false boolean
Jul 2, 2024
ac774b0
Added SaveOut option to submit_job
Jul 2, 2024
208ffe8
Added SaveOut option to submit_job
Jul 2, 2024
62cf5c4
Looks like I accidentally deletd a fi?
Jul 2, 2024
7d5698f
Added a missing fi
Jul 2, 2024
192eadf
Updated config for Pleiades
Jul 2, 2024
b936c5b
Resolving merge conflicts with bugfix/jacobian-perturbation-fix
Jul 3, 2024
94d65ee
Switching from Simulation CPUs/Memory to Requested CPUs/Memory
Jul 3, 2024
35c4a09
Removing commented out vestige
Jul 3, 2024
958b1ea
Removing lingering references to TROPOMI
Jul 3, 2024
69ec97a
Updating options to better match bugfix/jacobian-perturbation-fix con…
Jul 3, 2024
acf033f
UseEmisSF and UseOHSF seem to have been removed as options
Jul 3, 2024
a156273
- Bug fix for BCs being used in all simulations, not just regional si…
Jul 3, 2024
4d30edb
Beginning to generalize HEMCO tracer changes for not just methane
Jul 3, 2024
da7fe89
Generalized function to be a function of species
Jul 3, 2024
e951e42
Changed from tabs to spaces for readability and removed portions of c…
Jul 3, 2024
c9ca0c3
Updating
Jul 3, 2024
4c87fbc
Merge resolution with dev branch
Jul 3, 2024
03fd4ea
Merge remote-tracking branch 'origin/dev'
laestrada Oct 22, 2024
6767732
Merge remote-tracking branch 'origin/dev'
laestrada Oct 23, 2024
e64b99b
Merge remote-tracking branch 'origin/dev'
laestrada Oct 31, 2024
c754c5a
Resolving merge conflicts
Nov 2, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
While this script is implicitly very TROPOMI-specific, we still remai…
…ned TROPOMI and

CH4 references. It may be that someone would want to add them back in--I'm just
going through everything initially.
- Renamed get_TROPOMI_times to get_satellite_times
- Renamed apply_TROPOMI_operator_to_one_TROPOMI_file to
  apply_satellite_operator_to_one_satellite_file
- Add satellite_product and species arguments to
  apply_satellite_operator_to_one_satellite_file
- Remove BlendedTROPOMI variable and replaced with species and satellite_product
  throughout all the functions
- Removed references to CH4 and TROPOMI in variable names
- Called mixing_ratio_conv_factor instead of using hard coded 1e-9
Hannah Nesser committed Jun 27, 2024
commit 1ab74f29602b0af80931ba08785ef8ad5d7d91cd
107 changes: 55 additions & 52 deletions src/write_BCs/write_boundary_conditions.py
Original file line number Diff line number Diff line change
@@ -15,35 +15,35 @@

sys.path.insert(0, "../../")
from src.inversion_scripts.operators.operator_utilities import nearest_loc
from src.inversion_scripts.operators.TROPOMI_operator import apply_tropomi_operator
from src.inversion_scripts.utils import save_obj, load_obj
from src.inversion_scripts.operators.satellite_operator import apply_satellite_operator
from src.inversion_scripts.utils import mixing_ratio_conv_factor

def get_TROPOMI_times(filename):

def get_satellite_times(filename):
"""
Function that parses the TROPOMI filenames to get the start and end times.
Function that parses the satellite filenames to get the start and end times.
Example input (str): S5P_RPRO_L2__CH4____20220725T152751_20220725T170921_24775_03_020400_20230201T100624.nc
Example output (tuple): (np.datetime64('2022-07-25T15:27:51'), np.datetime64('2022-07-25T17:09:21'))
"""

file_times = re.search(r'(\d{8}T\d{6})_(\d{8}T\d{6})', filename)
assert file_times is not None, "check TROPOMI filename - wasn't able to find start and end times in the filename"
start_TROPOMI_time = np.datetime64(datetime.datetime.strptime(file_times.group(1), "%Y%m%dT%H%M%S"))
end_TROPOMI_time = np.datetime64(datetime.datetime.strptime(file_times.group(2), "%Y%m%dT%H%M%S"))
assert file_times is not None, "check satellite filename - wasn't able to find start and end times in the filename"
start_satellite_time = np.datetime64(datetime.datetime.strptime(file_times.group(1), "%Y%m%dT%H%M%S"))
end_satellite_time = np.datetime64(datetime.datetime.strptime(file_times.group(2), "%Y%m%dT%H%M%S"))

return start_TROPOMI_time, end_TROPOMI_time
return start_satellite_time, end_satellite_time

def apply_tropomi_operator_to_one_tropomi_file(filename):
def apply_satellite_operator_to_one_satellite_file(filename, satellite_product, species):

"""
Run apply_tropomi_operator from src/inversion_scripts/operators/TROPOMI_operator.py for a single TROPOMI file (then saves it to a pkl file)
Run apply_satellite_operator from src/inversion_scripts/operators/satellite_operator.py for a single satellite file (then saves it to a pkl file)
Example input (str): S5P_RPRO_L2__CH4____20220725T152751_20220725T170921_24775_03_020400_20230201T100624.nc
Example output: write the file config["workdir"]/step1/S5P_RPRO_L2__CH4____20220725T152751_20220725T170921_24775_03_020400_20230201T100624_GCtoTROPOMI.pkl
Example output: write the file config["workdir"]/step1/S5P_RPRO_L2__CH4____20220725T152751_20220725T170921_24775_03_020400_20230201T100624_GCtoSatellite.pkl
"""

result = apply_tropomi_operator(
result = apply_satellite_operator(
filename = filename,
BlendedTROPOMI = blendedTROPOMI,
species = species,
satellite_product = satellite_product,
n_elements = False, # Not relevant
gc_startdate = start_time_of_interest,
gc_enddate = end_time_of_interest,
@@ -55,17 +55,17 @@ def apply_tropomi_operator_to_one_tropomi_file(filename):

return result["obs_GC"],filename

def create_daily_means(satelliteDir, start_time_of_interest, end_time_of_interest):
def create_daily_means(satelliteDir, satellite_product, species, start_time_of_interest, end_time_of_interest):

# List of all TROPOMI files that interesct our time period of interest
TROPOMI_files = sorted([file for file in glob.glob(os.path.join(satelliteDir, "*.nc"))
if (start_time_of_interest <= get_TROPOMI_times(file)[0] <= end_time_of_interest)
or (start_time_of_interest <= get_TROPOMI_times(file)[1] <= end_time_of_interest)])
print(f"First TROPOMI file -> {TROPOMI_files[0]}")
print(f"Last TROPOMI file -> {TROPOMI_files[-1]}")
# List of all satellite files that interesct our time period of interest
satellite_files = sorted([file for file in glob.glob(os.path.join(satelliteDir, "*.nc"))
if (start_time_of_interest <= get_satellite_times(file)[0] <= end_time_of_interest)
or (start_time_of_interest <= get_satellite_times(file)[1] <= end_time_of_interest)])
print(f"First satellite file -> {satellite_files[0]}")
print(f"Last satellite file -> {satellite_files[-1]}")

# Using as many cores as you have, apply the TROPOMI operator to each file
obsGC_and_filenames = Parallel(n_jobs=-1)(delayed(apply_tropomi_operator_to_one_tropomi_file)(filename) for filename in TROPOMI_files)
# Using as many cores as you have, apply the satellite operator to each file
obsGC_and_filenames = Parallel(n_jobs=-1)(delayed(apply_satellite_operator_to_one_satellite_file)(filename, satellite_product, species) for filename in satellite_files)

# Read any of the GEOS-Chem files to get the lat/lon grid
with xr.open_dataset(glob.glob(os.path.join(config["workDir"], "gc_run", "OutputDir", "GEOSChem.SpeciesConc*.nc4"))[0]) as data:
@@ -77,50 +77,51 @@ def create_daily_means(satelliteDir, start_time_of_interest, end_time_of_interes
alldates = [day.astype(datetime.datetime).strftime("%Y%m%d") for day in alldates]

# Initialize arrays for regridding
daily_TROPOMI = np.zeros((len(LON), len(LAT), len(alldates)))
daily_satellite = np.zeros((len(LON), len(LAT), len(alldates)))
daily_GC = np.zeros((len(LON), len(LAT), len(alldates)))
daily_count = np.zeros((len(LON), len(LAT), len(alldates)))

# Loop thorugh all of the files which now contain TROPOMI and the corresponding GC XCH4
# Loop thorugh all of the files which now contain satellite data and the
# corresponding GC mixing ratios
for obsGC,filename in obsGC_and_filenames:
NN = obsGC.shape[0]
if NN == 0:
continue

# For each TROPOMI observation, assign it to a GEOS-Chem grid cell
# For each satellite observation, assign it to a GEOS-Chem grid cell
for iNN in range(NN):

# Which day are we on (this is not perfect right now because orbits can cross from one day to the next...
# but it is the best we can do right now without changing apply_tropomi_operator)
# but it is the best we can do right now without changing apply_satellite_operator)
file_times = re.search(r'(\d{8}T\d{6})_(\d{8}T\d{6})', filename)
assert file_times is not None, "check TROPOMI filename - wasn't able to find start and end times in the filename"
assert file_times is not None, "check satellite filename - wasn't able to find start and end times in the filename"
date = datetime.datetime.strptime(file_times.group(1), "%Y%m%dT%H%M%S").strftime("%Y%m%d")
time_ind = alldates.index(date)

c_TROPOMI, c_GC, lon0, lat0 = obsGC[iNN, :4]
c_satellite, c_GC, lon0, lat0 = obsGC[iNN, :4]
ii = nearest_loc(lon0, LON, tolerance=5)
jj = nearest_loc(lat0, LAT, tolerance=4)
daily_TROPOMI[ii, jj, time_ind] += c_TROPOMI
daily_satellite[ii, jj, time_ind] += c_satellite
daily_GC[ii, jj, time_ind] += c_GC
daily_count[ii, jj, time_ind] += 1

# Normalize by how many observations got assigned to a grid cell to finish the regridding
daily_count[daily_count == 0] = np.nan
daily_TROPOMI = daily_TROPOMI / daily_count
daily_satellite = daily_satellite / daily_count
daily_GC = daily_GC / daily_count

# Change dimensions
regrid_TROPOMI = np.einsum("ijl->lji", daily_TROPOMI) # (lon, lat, time) -> (time, lat, lon)
regrid_satellite = np.einsum("ijl->lji", daily_satellite) # (lon, lat, time) -> (time, lat, lon)
regrid_GC = np.einsum("ijl->lji", daily_GC) # (lon, lat, time) -> (time, lat, lon)

# Make a Dataset with variables of (TROPOMI_CH4, GC_CH4) and dims of (lon, lat, time)
# Make a Dataset with variables of (satellite, GC) and dims of (lon, lat, time)
daily_means = xr.Dataset({
'TROPOMI_CH4': xr.DataArray(
data = regrid_TROPOMI,
'satellite': xr.DataArray(
data = regrid_satellite,
dims = ["time", "lat", "lon"],
coords = {"time": alldates, "lat": LAT, "lon": LON}
),
'GC_CH4': xr.DataArray(
'GC': xr.DataArray(
data = regrid_GC,
dims = ["time", "lat", "lon"],
coords = {"time": alldates, "lat": LAT, "lon": LON}
@@ -131,7 +132,7 @@ def create_daily_means(satelliteDir, start_time_of_interest, end_time_of_interes

def calculate_bias(daily_means):

bias = daily_means["GC_CH4"] - daily_means["TROPOMI_CH4"]
bias = daily_means["GC"] - daily_means["satellite"]

# Smooth spatially
bias = bias.rolling(lat=5, # five lat grid boxes (10 degrees)
@@ -163,21 +164,21 @@ def calculate_bias(daily_means):
# Use these values to fill NaNs
bias = bias.fillna(nan_value_filler_3d)

print(f"Average bias (GC-TROPOMI): {bias.mean().values:.2f} ppb\n")
print(f"Average bias (GC-satellite): {bias.mean().values:.2f} ppb\n")

# If there are still NaNs (this will happen when TROPOMI data is missing), use 0.0 ppb as the bias but warn the user
# If there are still NaNs (this will happen when satellite data is missing), use 0.0 ppb as the bias but warn the user
for t in range(len(bias["time"].values)):
if np.any(np.isnan(bias[t,:,:].values)):
print(f"WARNING -> using 0.0 ppb as bias for {bias['time'].values[t]}")
bias[t,:,:] = bias[t,:,:].fillna(0)

return bias

def write_bias_corrected_files(bias):
def write_bias_corrected_files(bias, species, satellite_product):

# Get dates and convert the total column bias to mol/mol
strdate = bias["time"].values
bias_mol_mol = bias.values * 1e-9
bias_mol_mol = bias.values / mixing_ratio_conv_factor(species)

# Only write BCs for our date range
files = sorted(glob.glob(os.path.join(config["workDir"], "gc_run", "OutputDir", "GEOSChem.BoundaryConditions*.nc4")))
@@ -198,31 +199,33 @@ def write_bias_corrected_files(bias):
bias_for_this_boundary_condition_file = bias_mol_mol[index, :, :]

with xr.open_dataset(filename) as ds:
original_data = ds["SpeciesBC_CH4"].values.copy()
original_data = ds[f"SpeciesBC_{species}"].values.copy()
for t in range(original_data.shape[0]):
for lev in range(original_data.shape[1]):
original_data[t, lev, :, :] -= bias_for_this_boundary_condition_file
ds["SpeciesBC_CH4"].values = original_data
if blendedTROPOMI:
ds[f"SpeciesBC_{species}"].values = original_data
if satellite_product == "BlendedTROPOMI":
print(f"Writing to {os.path.join(config['workDir'], 'blended-boundary-conditions', os.path.basename(filename))}")
ds.to_netcdf(os.path.join(config["workDir"], "blended-boundary-conditions", os.path.basename(filename)))
else:
elif satellite_product == "TROPOMI":
print(f"Writing to {os.path.join(config['workDir'], 'tropomi-boundary-conditions', os.path.basename(filename))}")
ds.to_netcdf(os.path.join(config["workDir"], "tropomi-boundary-conditions", os.path.basename(filename)))

else:
print("Other data sources for boundary conditions are not currently supported --HON")

if __name__ == "__main__":

# Arguments from run_boundary_conditions.sh
blendedTROPOMI = (sys.argv[1] == "True") # use blended data?
satellite_product = sys.argv[1] # use blended data?
satelliteDir = sys.argv[2] # where is the satellite data?
species = sys.argv[3]
# Start of GC output (+1 day except 1 Apr 2018 because we ran 1 day extra at the start to account for data not being written at t=0)
start_time_of_interest = np.datetime64(datetime.datetime.strptime(sys.argv[3], "%Y%m%d"))
start_time_of_interest = np.datetime64(datetime.datetime.strptime(sys.argv[4], "%Y%m%d"))
if start_time_of_interest != np.datetime64("2018-04-01T00:00:00"):
start_time_of_interest += np.timedelta64(1, "D")
# End of GC output
end_time_of_interest = np.datetime64(datetime.datetime.strptime(sys.argv[4], "%Y%m%d"))
print(f"\nwrite_boundary_conditions.py output for blendedTROPOMI={blendedTROPOMI}")
end_time_of_interest = np.datetime64(datetime.datetime.strptime(sys.argv[5], "%Y%m%d"))
print(f"\nwrite_boundary_conditions.py output for {satellite_product}")
print(f"Using files at {satelliteDir}")

"""
@@ -240,6 +243,6 @@ def write_bias_corrected_files(bias):
- using the bias from Part 2, subtract the (GC-TROPOMI) bias from the GC boundary conditions
"""

daily_means = create_daily_means(satelliteDir, start_time_of_interest, end_time_of_interest)
daily_means = create_daily_means(satelliteDir, satellite_product, species, start_time_of_interest, end_time_of_interest)
bias = calculate_bias(daily_means)
write_bias_corrected_files(bias)
write_bias_corrected_files(bias, species, satellite_product)