added obc generation workflow #128

Merged (5 commits, Jan 2, 2025)
145 changes: 145 additions & 0 deletions tools/boundary/glorys_obc_workflow/README.md
@@ -0,0 +1,145 @@
# MOM6 Open Boundary Conditions (OBC) Generation Workflow

This repository provides an example workflow for generating Open Boundary Conditions (OBC) for MOM6 using daily GLORYS data on PPAN.

## Overview

The main script, `mom6_obc_workflow.sh`, orchestrates the entire OBC generation process. The workflow includes the following steps:

1. **Spatial Subsetting of GLORYS Data**
- Iterates through each day within a specified date range to spatially subset the original GLORYS dataset on UDA.
- Reduces computational cost by limiting input data to the regional domain of interest instead of the entire global GLORYS domain.

2. **Filling Missing Values in Subset GLORYS Files**
- Processes each daily subset file using CDO to fill missing values.
- Compresses processed files with `ncks -4 -L 5`.
- Combines all variables (e.g., `thetao`, `so`, `zos`, `uo`, `vo`) into a single NetCDF file.

3. **Daily Boundary Condition Generation**
- Submits jobs to execute the `write_glorys_boundary_daily.py` script for each day.
- Regrids GLORYS data and generates daily OBC files.

Template scripts for these steps are provided in the `template` directory. User-specific parameters are written to a `config.yaml` file, which [uwtools](https://github.com/ufs-community/uwtools) then uses to render the templates into runnable scripts.
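
For illustration, the per-day operations in steps 1 and 2 amount to a handful of CDO/NCO calls. The sketch below is only indicative, assuming `cdo sellonlatbox` for the spatial subset and `cdo setmisstonn` for the fill; the rendered template scripts are the authority, and the file names here are hypothetical.

```bash
# Hypothetical sketch of steps 1-2 for a single day; the rendered templates define the real commands.
uda_dir=/uda/Global_Ocean_Physics_Reanalysis/global/daily
in=$uda_dir/mercatorglorys12v1_gl12_mean_20220101.nc   # hypothetical file name/layout
var=thetao

# Step 1: subset to the regional domain (lon -100..-30, lat 5..60) and select one variable
cdo sellonlatbox,-100.0,-30.0,5.0,60.0 -selname,$var "$in" ${var}_subset.nc

# Step 2: fill missing values (nearest neighbour), then compress (netCDF-4, deflation level 5)
cdo setmisstonn ${var}_subset.nc ${var}_filled.nc
ncks -4 -L 5 ${var}_filled.nc ${var}_filled_c.nc

# Combine all variables for the day into a single file
cdo merge thetao_filled_c.nc so_filled_c.nc zos_filled_c.nc uo_filled_c.nc vo_filled_c.nc GLORYS_NWA12_20220101.nc
```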

---

## Configuration Example

Below is an example `config.yaml` file to set up parameters for the workflow:

```yaml
# General parameters for template scripts
_WALLTIME: "1440" # Wall time (in minutes) for SLURM jobs
_NPROC: "1" # Number of processes for each job
_EMAIL_NOTIFICATION: "fail" # SLURM email notification option
_USER_EMAIL: "[email protected]" # Email address for error notifications
_LOG_PATH: "./log/$CURRENT_DATE/%x.o%j" # Path for job logs
_UDA_GLORYS_DIR: "/uda/Global_Ocean_Physics_Reanalysis/global/daily" # Path to original GLORYS data
_UDA_GLORYS_FILENAME: "mercatorglorys12v1_gl12_mean" # File name prefix for GLORYS data
_REGIONAL_GLORYS_ARCHIVE: "/archive/user/datasets/glorys" # Archive path for processed daily files
_BASIN_NAME: "NWA12" # Regional domain name
_OUTPUT_PREFIX: "GLORYS" # Prefix for output files
_VARS: "thetao so uo vo zos" # Variables to process
_LON_MIN: "-100.0" # Minimum longitude for subsetting
_LON_MAX: "-30.0" # Maximum longitude for subsetting
_LAT_MIN: "5.0" # Minimum latitude for subsetting
_LAT_MAX: "60.0" # Maximum latitude for subsetting
_PYTHON_SCRIPT: "$PYTHON_SCRIPT" # Path to the Python script for daily OBC generation

# Date range for processing
first_date: "$START_DATE"
last_date: "$END_DATE"

# Python script parameters
glorys_dir: "/archive/user/datasets/glorys/NWA12/filled" # Daily GLORYS subsets after missing values have been filled
output_dir: "./outputs" # Output path for the generated OBC files
hgrid: "./ocean_hgrid.nc" # MOM6 horizontal grid file
ncrcat_names:
- "thetao"
- "so"
- "zos"
- "uv"
segments:
- id: 1
border: "south"
- id: 2
border: "north"
- id: 3
border: "east"
variables:
- "thetao"
- "so"
- "zos"
- "uv"
```
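
The `_`-prefixed keys are placeholder values for the SLURM template scripts; `mom6_obc_workflow.sh` renders each template with `uw template render`, for example:

```bash
uw template render --input-file template/subset_glorys_template.sh \
    --values-file config.yaml \
    --output-file scripts/subset_glorys.sh
```

The command above is taken directly from `mom6_obc_workflow.sh`; the same pattern is applied to the other templates.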

## Workflow Usage

### Step 1: Modify Configuration
Update the `cat <<EOF > config.yaml` heredoc in `mom6_obc_workflow.sh` with parameters specific to your domain and workflow requirements.

```bash
cat <<EOF > config.yaml
_WALLTIME: "1440"
_NPROC: "1"
_EMAIL_NOTIFICATION: "fail"
_USER_EMAIL: "[email protected]"
_LOG_PATH: "./log/$CURRENT_DATE/%x.o%j"
_UDA_GLORYS_DIR: "/uda/Global_Ocean_Physics_Reanalysis/global/daily"
_UDA_GLORYS_FILENAME: "mercatorglorys12v1_gl12_mean"
_REGIONAL_GLORYS_ARCHIVE: "/archive/ynt/datasets/glorys"
_BASIN_NAME: "NWA12"
_OUTPUT_PREFIX: "GLORYS"
_VARS: "thetao so uo vo zos"
_LON_MIN: "-100.0"
_LON_MAX: "-30.0"
_LAT_MIN: "5.0"
_LAT_MAX: "60.0"
_PYTHON_SCRIPT: "$PYTHON_SCRIPT"
first_date: "$START_DATE"
last_date: "$END_DATE"
glorys_dir: "/archive/ynt/datasets/glorys/NWA12/filled"
output_dir: "./outputs"
hgrid: './ocean_hgrid.nc'
ncrcat_names:
- 'thetao'
- 'so'
- 'zos'
- 'uv'
segments:
- id: 1
border: 'south'
- id: 2
border: 'north'
- id: 3
border: 'east'
variables:
- 'thetao'
- 'so'
- 'zos'
- 'uv'
EOF
```
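
uwtools renders these templates with Jinja2, so inside each template the `_`-prefixed keys appear as `{{ ... }}` placeholders. A minimal sketch of what a template header might look like (the real files under `template/` are the authority and may differ):

```bash
#!/bin/bash
# Hypothetical excerpt of a SLURM job template rendered by uwtools (Jinja2 placeholders).
#SBATCH --time={{ _WALLTIME }}
#SBATCH --ntasks={{ _NPROC }}
#SBATCH --mail-type={{ _EMAIL_NOTIFICATION }}
#SBATCH --mail-user={{ _USER_EMAIL }}
#SBATCH --output={{ _LOG_PATH }}
```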

### Step 2: Generate OBC Files
Run the workflow for a specific year or date range:

```bash
./mom6_obc_workflow.sh 2022-01-01 2022-12-31
./mom6_obc_workflow.sh 2023-01-01 2023-12-31
```
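
Each invocation submits one chain of SLURM jobs per day in the requested range; job logs are written under `./log/<submission timestamp>/` (see `_LOG_PATH`). Progress can be checked with standard SLURM tools, for example:

```bash
squeue -u $USER   # list your queued and running jobs
```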

### Step 3: Concatenate Multiple Years of OBC Files
To merge OBC files from multiple years into a single file, use the `--ncrcat` option. Ensure the dates in the command match the range for which you generated OBC files:

```bash
./mom6_obc_workflow.sh 2022-01-01 2023-12-31 --ncrcat
```
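
The concatenation is based on NCO's `ncrcat`, which joins NetCDF files along the record (time) dimension. Conceptually, for each entry in `ncrcat_names` and each boundary segment, the step performs something like the following (file names here are hypothetical; the real naming is handled by the rendered `ncrcat_obc` script):

```bash
# Hypothetical illustration only: join two years of OBC files for one variable/segment.
ncrcat thetao_001_2022.nc thetao_001_2023.nc thetao_001_2022_2023.nc
```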
#### Adjust Timestamps (Optional Substep)
If you need to adjust the timestamps of the first and last records for compatibility with `MOM6` yearly simulations, use the `--adjust-timestamps` option together with `--ncrcat`. Note that this is an alternative to the command above and should not be run in addition to it:
```bash
./mom6_obc_workflow.sh 2022-01-01 2023-12-31 --ncrcat --adjust-timestamps
```
**Note**: Ensure the date range specified in your command corresponds to the dates for which you generated OBC files. Running this step with a mismatched date range will cause it to fail if files for the specified dates are missing.
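
One way such an adjustment could be expressed with NCO's `ncap2` is sketched below; this is only an illustration, assuming a record variable named `time` with units of days, and the actual logic is implemented by the rendered `ncrcat_obc` script:

```bash
# Sketch (assumed time variable/units): pad the first and last records by half a day
# so the concatenated OBC file fully brackets the simulation period.
ncap2 -O -s 'time(0)=time(0)-0.5; time($time.size-1)=time($time.size-1)+0.5' obc_file.nc obc_file.nc
```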

TODO:
179 changes: 179 additions & 0 deletions tools/boundary/glorys_obc_workflow/mom6_obc_workflow.sh
@@ -0,0 +1,179 @@
#!/bin/bash

# Load required modules and environments
source $MODULESHOME/init/sh
module load miniforge
conda activate /nbhome/role.medgrp/.conda/envs/uwtools || { echo "Error activating conda environment. Exiting."; exit 1; }

set -eu

# Helper functions
print_usage() {
    echo "Usage: $0 START_DATE END_DATE [--ncrcat] [--adjust-timestamps]"
    echo "  START_DATE and END_DATE must be in YYYY-MM-DD format."
    echo "  --ncrcat: Enable ncrcat step (skips subset, fill, and submit_python steps)."
    echo "  --adjust-timestamps: Adjust timestamps during ncrcat step."
}

validate_date_format() {
    if [[ ! "$1" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
        echo "Error: Date $1 must be in YYYY-MM-DD format."
        exit 1
    fi
}

log_message() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}

# Default options
DO_NCRCAT=false
ADJUST_TIMESTAMPS=false
PYTHON_SCRIPT="../write_glorys_boundary_daily.py"

# Parse arguments
if [[ $# -lt 2 ]]; then
    print_usage
    exit 1
fi
START_DATE="$1"
END_DATE="$2"
shift 2

while [[ $# -gt 0 ]]; do
    case "$1" in
        --ncrcat)
            DO_NCRCAT=true
            ;;
        --adjust-timestamps)
            ADJUST_TIMESTAMPS=true
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit 1
            ;;
    esac
    shift
done

validate_date_format "$START_DATE"
validate_date_format "$END_DATE"


start_date_epoch=$(date -d "$START_DATE" +%s)
end_date_epoch=$(date -d "$END_DATE" +%s)
if [[ $start_date_epoch -gt $end_date_epoch ]]; then
    log_message "Error: START_DATE ($START_DATE) must not be after END_DATE ($END_DATE). Exiting."
    exit 1
fi

# Ensure --adjust-timestamps is only used with --ncrcat
if $ADJUST_TIMESTAMPS && ! $DO_NCRCAT; then
    echo "Error: --adjust-timestamps can only be used with --ncrcat."
    exit 1
fi

# Warn user when --ncrcat is enabled
if $DO_NCRCAT; then
    log_message "WARNING: --ncrcat is enabled. The script will SKIP subset, fill, and submit_python steps."
    log_message "Ensure that all daily outputs already exist for the specified date range."
fi

# Prepare directories
CURRENT_DATE=$(date +%Y-%m-%d-%H-%M)
mkdir -p ./log/$CURRENT_DATE ./outputs scripts

# Define user configurations
log_message "Generating config.yaml..."
cat <<EOF > config.yaml
_WALLTIME: "1440"
_NPROC: "1"
_EMAIL_NOTIFICATION: "fail"
_USER_EMAIL: "[email protected]"
_LOG_PATH: "./log/$CURRENT_DATE/%x.o%j"
_UDA_GLORYS_DIR: "/uda/Global_Ocean_Physics_Reanalysis/global/daily"
_UDA_GLORYS_FILENAME: "mercatorglorys12v1_gl12_mean"
_REGIONAL_GLORYS_ARCHIVE: "/archive/$USER/datasets/glorys"
_BASIN_NAME: "NWA12"
_OUTPUT_PREFIX: "GLORYS"
_VARS: "thetao so uo vo zos"
_LON_MIN: "-100.0"
_LON_MAX: "-30.0"
_LAT_MIN: "5.0"
_LAT_MAX: "60.0"
_PYTHON_SCRIPT: "$PYTHON_SCRIPT"
first_date: "$START_DATE"
last_date: "$END_DATE"
glorys_dir: "/archive/$USER/datasets/glorys/NWA12/filled"
output_dir: "./outputs"
hgrid: './ocean_hgrid.nc'
ncrcat_names:
- 'thetao'
- 'so'
- 'zos'
- 'uv'
segments:
- id: 1
border: 'south'
- id: 2
border: 'north'
- id: 3
border: 'east'
variables:
- 'thetao'
- 'so'
- 'zos'
- 'uv'
EOF


log_message "Preparing scripts directory..."
[[ -d scripts ]] || mkdir scripts
for template in subset_glorys fill_glorys submit_python_make_obc_day ncrcat_obc; do
    rm -f scripts/${template}.sh
    uw template render --input-file template/${template}_template.sh \
        --values-file config.yaml \
        --output-file scripts/${template}.sh || { log_message "Error rendering ${template}. Exiting."; exit 1; }
done

# Skip main steps if --ncrcat is enabled
if ! $DO_NCRCAT; then

    # Submit jobs
    current_date_epoch=$start_date_epoch
    job_ids=()
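
    # For each day, chain subset -> fill -> OBC generation jobs with SLURM "afterok"
    # dependencies, so each step runs only after the previous step for that day succeeds.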

    while [[ $current_date_epoch -le $end_date_epoch ]]; do
        current_date=$(date -d "@$current_date_epoch" +%Y-%m-%d)
        year=$(date -d "$current_date" +%Y)
        month=$(date -d "$current_date" +%m)
        day=$(date -d "$current_date" +%d)

        log_message "Submitting subset job for $current_date..."
        subset_job_id=$(sbatch --job-name="glorys_subset_${year}_${month}_${day}" \
            scripts/subset_glorys.sh $year $month $day | awk '{print $4}')

        log_message "Submitting fill_nan job for $current_date..."
        fill_job_id=$(sbatch --dependency=afterok:$subset_job_id \
            --job-name="glorys_fill_${year}_${month}_${day}" \
            scripts/fill_glorys.sh $year $month $day | awk '{print $4}')

        log_message "Submitting Python job for $current_date..."
        python_job_id=$(sbatch --dependency=afterok:$fill_job_id \
            --job-name="python_make_obc_day_${year}_${month}_${day}" \
            scripts/submit_python_make_obc_day.sh $year $month $day | awk '{print $4}')

        job_ids+=($python_job_id)
        current_date_epoch=$((current_date_epoch + 86400))
    done
fi

# Optional ncrcat step
if $DO_NCRCAT; then
    log_message "Submitting ncrcat job..."
    dependency_str=$(IFS=,; echo "${job_ids[*]:-}")
    if $ADJUST_TIMESTAMPS; then
        sbatch --job-name="obc_ncrcat" scripts/ncrcat_obc.sh --config config.yaml --ncrcat_years --adjust_timestamps
    else
        sbatch --job-name="obc_ncrcat" scripts/ncrcat_obc.sh --config config.yaml --ncrcat_years
    fi
fi

log_message "Workflow completed."