added obc generation workflow #128

Merged (5 commits, Jan 2, 2025)
145 changes: 145 additions & 0 deletions tools/boundary/glorys_obc_workflow/README.md
@@ -0,0 +1,145 @@
# MOM6 Open Boundary Conditions (OBC) Generation Workflow

This repository provides an example workflow for generating Open Boundary Conditions (OBC) for MOM6 using daily GLORYS data on PPAN.

## Overview

The main script, `mom6_obc_workflow.sh`, orchestrates the entire OBC generation process. The workflow includes the following steps:

1. **Spatial Subsetting of GLORYS Data**
- Iterates through each day within a specified date range to spatially subset the original GLORYS dataset on UDA.
- Reduces computational cost by limiting input data to the regional domain of interest instead of the entire global GLORYS domain.

2. **Filling Missing Values in Subset GLORYS Files**
- Processes each daily subset file using CDO to fill missing values.
- Compresses processed files with `ncks -4 -L 5`.
- Combines all variables (e.g., `thetao`, `so`, `zos`, `uo`, `vo`) into a single NetCDF file.

3. **Daily Boundary Condition Generation**
- Submits jobs to execute the `write_glorys_boundary_daily.py` script for each day.
- Regrids GLORYS data and generates daily OBC files.

Template scripts for these steps are provided in the `template` directory. User-specific parameters are written to a `config.yaml` file, which [uwtools](https://github.com/ufs-community/uwtools) then uses to render the templates into runnable scripts.
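
For illustration, the per-day operations in steps 1 and 2 amount to a handful of CDO/NCO calls. The sketch below is only indicative, assuming `cdo sellonlatbox` for the spatial subset and `cdo setmisstonn` for the fill; the rendered template scripts are the authority, and the file names here are hypothetical.

```bash
# Hypothetical sketch of steps 1-2 for a single day; the rendered templates define the real commands.
uda_dir=/uda/Global_Ocean_Physics_Reanalysis/global/daily
in=$uda_dir/mercatorglorys12v1_gl12_mean_20220101.nc   # hypothetical file name/layout
var=thetao

# Step 1: subset to the regional domain (lon -100..-30, lat 5..60) and select one variable
cdo sellonlatbox,-100.0,-30.0,5.0,60.0 -selname,$var "$in" ${var}_subset.nc

# Step 2: fill missing values (nearest neighbour), then compress (netCDF-4, deflation level 5)
cdo setmisstonn ${var}_subset.nc ${var}_filled.nc
ncks -4 -L 5 ${var}_filled.nc ${var}_filled_c.nc

# Combine all variables for the day into a single file
cdo merge thetao_filled_c.nc so_filled_c.nc zos_filled_c.nc uo_filled_c.nc vo_filled_c.nc GLORYS_NWA12_20220101.nc
```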

---

## Configuration Example

Below is an example `config.yaml` file to set up parameters for the workflow:

```yaml
# General parameters for template scripts
_WALLTIME: "1440" # Wall time (in minutes) for SLURM jobs
_NPROC: "1" # Number of processes for each job
_EMAIL_NOTIFICATION: "fail" # SLURM email notification option
_USER_EMAIL: "[email protected]" # Email address for error notifications
_LOG_PATH: "./log/$CURRENT_DATE/%x.o%j" # Path for job logs
_UDA_GLORYS_DIR: "/uda/Global_Ocean_Physics_Reanalysis/global/daily" # Path to original GLORYS data
_UDA_GLORYS_FILENAME: "mercatorglorys12v1_gl12_mean" # File name prefix for GLORYS data
_REGIONAL_GLORYS_ARCHIVE: "/archive/user/datasets/glorys" # Archive path for processed daily files
_BASIN_NAME: "NWA12" # Regional domain name
_OUTPUT_PREFIX: "GLORYS" # Prefix for output files
_VARS: "thetao so uo vo zos" # Variables to process
_LON_MIN: "-100.0" # Minimum longitude for subsetting
_LON_MAX: "-30.0" # Maximum longitude for subsetting
_LAT_MIN: "5.0" # Minimum latitude for subsetting
_LAT_MAX: "60.0" # Maximum latitude for subsetting
_PYTHON_SCRIPT: "$PYTHON_SCRIPT" # Path to the Python script for daily OBC generation

# Date range for processing
first_date: "$START_DATE"
last_date: "$END_DATE"

# Python script parameters
glorys_dir: "/archive/user/datasets/glorys/NWA12/filled" # Daily GLORYS subsets after missing values have been filled
output_dir: "./outputs" # Output path for the generated OBC files
hgrid: "./ocean_hgrid.nc" # MOM6 horizontal grid file
ncrcat_names:
- "thetao"
- "so"
- "zos"
- "uv"
segments:
- id: 1
border: "south"
- id: 2
border: "north"
- id: 3
border: "east"
variables:
- "thetao"
- "so"
- "zos"
- "uv"
```
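
The `_`-prefixed keys are placeholder values for the SLURM template scripts; `mom6_obc_workflow.sh` renders each template with `uw template render`, for example:

```bash
uw template render --input-file template/subset_glorys_template.sh \
    --values-file config.yaml \
    --output-file scripts/subset_glorys.sh
```

The command above is taken directly from `mom6_obc_workflow.sh`; the same pattern is applied to the other templates.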

## Workflow Usage

### Step 1: Modify Configuration
Update the `cat <<EOF > config.yaml` heredoc in `mom6_obc_workflow.sh` with parameters specific to your domain and workflow requirements.

```bash
cat <<EOF > config.yaml
_WALLTIME: "1440"
_NPROC: "1"
_EMAIL_NOTIFICATION: "fail"
_USER_EMAIL: "[email protected]"
_LOG_PATH: "./log/$CURRENT_DATE/%x.o%j"
_UDA_GLORYS_DIR: "/uda/Global_Ocean_Physics_Reanalysis/global/daily"
_UDA_GLORYS_FILENAME: "mercatorglorys12v1_gl12_mean"
_REGIONAL_GLORYS_ARCHIVE: "/archive/ynt/datasets/glorys"
_BASIN_NAME: "NWA12"
_OUTPUT_PREFIX: "GLORYS"
_VARS: "thetao so uo vo zos"
_LON_MIN: "-100.0"
_LON_MAX: "-30.0"
_LAT_MIN: "5.0"
_LAT_MAX: "60.0"
_PYTHON_SCRIPT: "$PYTHON_SCRIPT"
first_date: "$START_DATE"
last_date: "$END_DATE"
glorys_dir: "/archive/ynt/datasets/glorys/NWA12/filled"
output_dir: "./outputs"
hgrid: './ocean_hgrid.nc'
ncrcat_names:
- 'thetao'
- 'so'
- 'zos'
- 'uv'
segments:
- id: 1
border: 'south'
- id: 2
border: 'north'
- id: 3
border: 'east'
variables:
- 'thetao'
- 'so'
- 'zos'
- 'uv'
EOF
```
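
uwtools renders these templates with Jinja2, so inside each template the `_`-prefixed keys appear as `{{ ... }}` placeholders. A minimal sketch of what a template header might look like (the real files under `template/` are the authority and may differ):

```bash
#!/bin/bash
# Hypothetical excerpt of a SLURM job template rendered by uwtools (Jinja2 placeholders).
#SBATCH --time={{ _WALLTIME }}
#SBATCH --ntasks={{ _NPROC }}
#SBATCH --mail-type={{ _EMAIL_NOTIFICATION }}
#SBATCH --mail-user={{ _USER_EMAIL }}
#SBATCH --output={{ _LOG_PATH }}
```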

### Step 2: Generate OBC Files
Run the workflow for a specific year or date range:

```bash
./mom6_obc_workflow.sh 2022-01-01 2022-12-31
./mom6_obc_workflow.sh 2023-01-01 2023-12-31
```
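
Each invocation submits one chain of SLURM jobs per day in the requested range; job logs are written under `./log/<submission timestamp>/` (see `_LOG_PATH`). Progress can be checked with standard SLURM tools, for example:

```bash
squeue -u $USER   # list your queued and running jobs
```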

### Step 3: Concatenate Multiple Years of OBC Files
To merge OBC files from multiple years into a single file, use the `--ncrcat` option. Ensure the dates in the command match the range for which you generated OBC files:

```bash
./mom6_obc_workflow.sh 2022-01-01 2023-12-31 --ncrcat
```
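
The concatenation is based on NCO's `ncrcat`, which joins NetCDF files along the record (time) dimension. Conceptually, for each entry in `ncrcat_names` and each boundary segment, the step performs something like the following (file names here are hypothetical; the real naming is handled by the rendered `ncrcat_obc` script):

```bash
# Hypothetical illustration only: join two years of OBC files for one variable/segment.
ncrcat thetao_001_2022.nc thetao_001_2023.nc thetao_001_2022_2023.nc
```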
#### Adjust Timestamps (Optional Substep)
If you need to adjust the timestamps of the first and last records for compatibility with `MOM6` yearly simulations, use the `--adjust-timestamps` option together with `--ncrcat`. Note that this is an alternative to the command above and should not be run in addition to it:
```bash
./mom6_obc_workflow.sh 2022-01-01 2023-12-31 --ncrcat --adjust-timestamps
```
**Note**: Ensure the date range specified in your command corresponds to the dates for which you generated OBC files. Running this step with a mismatched date range will cause it to fail if files for the specified dates are missing.
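
One way such an adjustment could be expressed with NCO's `ncap2` is sketched below; this is only an illustration, assuming a record variable named `time` with units of days, and the actual logic is implemented by the rendered `ncrcat_obc` script:

```bash
# Sketch (assumed time variable/units): pad the first and last records by half a day
# so the concatenated OBC file fully brackets the simulation period.
ncap2 -O -s 'time(0)=time(0)-0.5; time($time.size-1)=time($time.size-1)+0.5' obc_file.nc obc_file.nc
```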

TODO:
179 changes: 179 additions & 0 deletions tools/boundary/glorys_obc_workflow/mom6_obc_workflow.sh
@@ -0,0 +1,179 @@
#!/bin/bash

# Load required modules and environments
source $MODULESHOME/init/sh
module load miniforge
conda activate /nbhome/role.medgrp/.conda/envs/uwtools || { echo "Error activating conda environment. Exiting."; exit 1; }

set -eu

# Helper functions
print_usage() {
    echo "Usage: $0 START_DATE END_DATE [--ncrcat] [--adjust-timestamps]"
    echo "  START_DATE and END_DATE must be in YYYY-MM-DD format."
    echo "  --ncrcat: Enable ncrcat step (skips subset, fill, and submit_python steps)."
    echo "  --adjust-timestamps: Adjust timestamps during ncrcat step."
}

validate_date_format() {
    if [[ ! "$1" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
        echo "Error: Date $1 must be in YYYY-MM-DD format."
        exit 1
    fi
}

log_message() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}

# Default options
DO_NCRCAT=false
ADJUST_TIMESTAMPS=false
PYTHON_SCRIPT="../write_glorys_boundary_daily.py"

# Parse arguments
if [[ $# -lt 2 ]]; then
    print_usage
    exit 1
fi
START_DATE="$1"
END_DATE="$2"
shift 2

while [[ $# -gt 0 ]]; do
    case "$1" in
        --ncrcat)
            DO_NCRCAT=true
            ;;
        --adjust-timestamps)
            ADJUST_TIMESTAMPS=true
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit 1
            ;;
    esac
    shift
done

validate_date_format "$START_DATE"
validate_date_format "$END_DATE"


start_date_epoch=$(date -d "$START_DATE" +%s)
end_date_epoch=$(date -d "$END_DATE" +%s)
if [[ $start_date_epoch -gt $end_date_epoch ]]; then
    log_message "Error: START_DATE ($START_DATE) must not be after END_DATE ($END_DATE). Exiting."
    exit 1
fi

# Ensure --adjust-timestamps is only used with --ncrcat
if $ADJUST_TIMESTAMPS && ! $DO_NCRCAT; then
    echo "Error: --adjust-timestamps can only be used with --ncrcat."
    exit 1
fi

# Warn user when --ncrcat is enabled
if $DO_NCRCAT; then
    log_message "WARNING: --ncrcat is enabled. The script will SKIP subset, fill, and submit_python steps."
    log_message "Ensure that all daily outputs already exist for the specified date range."
fi

# Prepare directories
CURRENT_DATE=$(date +%Y-%m-%d-%H-%M)
mkdir -p ./log/$CURRENT_DATE ./outputs scripts

# Define user configurations
log_message "Generating config.yaml..."
cat <<EOF > config.yaml
_WALLTIME: "1440"
_NPROC: "1"
_EMAIL_NOTIFICATION: "fail"
_USER_EMAIL: "[email protected]"
_LOG_PATH: "./log/$CURRENT_DATE/%x.o%j"
_UDA_GLORYS_DIR: "/uda/Global_Ocean_Physics_Reanalysis/global/daily"
_UDA_GLORYS_FILENAME: "mercatorglorys12v1_gl12_mean"
_REGIONAL_GLORYS_ARCHIVE: "/archive/$USER/datasets/glorys"
_BASIN_NAME: "NWA12"
_OUTPUT_PREFIX: "GLORYS"
_VARS: "thetao so uo vo zos"
_LON_MIN: "-100.0"
_LON_MAX: "-30.0"
_LAT_MIN: "5.0"
_LAT_MAX: "60.0"
_PYTHON_SCRIPT: "$PYTHON_SCRIPT"
first_date: "$START_DATE"
last_date: "$END_DATE"
glorys_dir: "/archive/$USER/datasets/glorys/NWA12/filled"
output_dir: "./outputs"
hgrid: './ocean_hgrid.nc'
ncrcat_names:
- 'thetao'
- 'so'
- 'zos'
- 'uv'
segments:
- id: 1
border: 'south'
- id: 2
border: 'north'
- id: 3
border: 'east'
variables:
- 'thetao'
- 'so'
- 'zos'
- 'uv'
EOF


log_message "Preparing scripts directory..."
[[ -d scripts ]] || mkdir scripts
for template in subset_glorys fill_glorys submit_python_make_obc_day ncrcat_obc; do
    rm -f scripts/${template}.sh
    uw template render --input-file template/${template}_template.sh \
        --values-file config.yaml \
        --output-file scripts/${template}.sh || { log_message "Error rendering ${template}. Exiting."; exit 1; }
done

# Skip main steps if --ncrcat is enabled
if ! $DO_NCRCAT; then

    # Submit jobs
    current_date_epoch=$start_date_epoch
    job_ids=()
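
    # For each day, chain subset -> fill -> OBC generation jobs with SLURM "afterok"
    # dependencies, so each step runs only after the previous step for that day succeeds.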

    while [[ $current_date_epoch -le $end_date_epoch ]]; do
        current_date=$(date -d "@$current_date_epoch" +%Y-%m-%d)
        year=$(date -d "$current_date" +%Y)
        month=$(date -d "$current_date" +%m)
        day=$(date -d "$current_date" +%d)

        log_message "Submitting subset job for $current_date..."
        subset_job_id=$(sbatch --job-name="glorys_subset_${year}_${month}_${day}" \
            scripts/subset_glorys.sh $year $month $day | awk '{print $4}')

        log_message "Submitting fill_nan job for $current_date..."
        fill_job_id=$(sbatch --dependency=afterok:$subset_job_id \
            --job-name="glorys_fill_${year}_${month}_${day}" \
            scripts/fill_glorys.sh $year $month $day | awk '{print $4}')

        log_message "Submitting Python job for $current_date..."
        python_job_id=$(sbatch --dependency=afterok:$fill_job_id \
            --job-name="python_make_obc_day_${year}_${month}_${day}" \
            scripts/submit_python_make_obc_day.sh $year $month $day | awk '{print $4}')

        job_ids+=($python_job_id)
        current_date_epoch=$((current_date_epoch + 86400))
    done
fi

# Optional ncrcat step
if $DO_NCRCAT; then
    log_message "Submitting ncrcat job..."
    dependency_str=$(IFS=,; echo "${job_ids[*]:-}")
    if $ADJUST_TIMESTAMPS; then
        sbatch --job-name="obc_ncrcat" scripts/ncrcat_obc.sh --config config.yaml --ncrcat_years --adjust_timestamps
    else
        sbatch --job-name="obc_ncrcat" scripts/ncrcat_obc.sh --config config.yaml --ncrcat_years
    fi
fi

log_message "Workflow completed."