3.1. Available Workflow Configuration Parameters

Among other tasks, the setup workflow Python script (parm/setup_wflow_env.py) generates a land_analysis.yaml file that contains all of the settings for the experiment: user-selected settings from config.yaml, default values, and machine-dependent settings. The script also uses the uwtools Python package to generate a Rocoto XML file.

The template.land_analysis.yaml file contains all parameters that can ultimately be included in the land_analysis.yaml file and workflow XML. setup_wflow_env.py first sets default values for all parameters necessary for the experiment and then sets machine-specific values based on the user's platform. Next, it loads the user-specified values from config.yaml, which override any previously set defaults. Finally, the script generates an experiment directory containing the land_analysis.yaml file and the Rocoto XML.
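The layering order described above (defaults, then machine-specific values, then config.yaml overrides) can be illustrated with a minimal Python sketch. It assumes each layer is a flat mapping of parameter names to values. The function name layer_settings and the sample values are hypothetical and are not taken from setup_wflow_env.py.

```python
def layer_settings(*layers: dict) -> dict:
    """Merge settings layers in order; values in later layers override earlier ones."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical layers, applied in the order described above.
defaults = {"FCSTHR": "24", "KEEPDATA": "YES", "PY_LOG_LEVEL": "INFO"}
machine_settings = {"MACHINE": "hercules", "partition_default": "hercules", "queue_default": "batch"}
user_config = {"FCSTHR": "48", "PY_LOG_LEVEL": "DEBUG"}  # stands in for values read from config.yaml

settings = layer_settings(defaults, machine_settings, user_config)
print(settings["FCSTHR"], settings["KEEPDATA"], settings["MACHINE"])  # 48 YES hercules
```

Because config.yaml is applied last, any parameter it sets wins. Parameters it leaves out keep their default or machine-specific values.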

Flowchart describing the Land DA setup_wflow_env.py script. First, the script detects the platform/machine the user is working on. Then, it sets default parameters, followed by machine-based parameters. It updates these parameter values based on the information provided in config.yaml and calculates additional parameters as needed. The ush/fill_jinja_template.py script is called to assemble the values from template.land_analysis.yaml and config.yaml into one complete land_analysis.yaml file, which is then converted into a land_analysis.xml file for use by Rocoto.

Fig. 3.1 Overview of the setup_wflow_env.py script

3.1.1. Workflow Attributes (attrs:)

Attributes pertaining to the overall workflow are defined in the attrs: section of template.land_analysis.yaml under workflow:. For example:

workflow:
  attrs:
    realtime: false
    scheduler: slurm
    cyclethrottle: 24
    taskthrottle: 24

realtime: (Default: false)

Indicates whether it is a realtime run (true) or a retrospective run (false). Valid values: true | false

scheduler: (Default: slurm)

The job scheduler to use on the specified machine. Valid values: "slurm". Other options may work with a container but have not been tested: "pbspro" | "lsf" | "lsfcray" | "none"

cyclethrottle: (Default: 24)

The number of cycles that can be active at one time. Valid values: Integers > 0.

taskthrottle: (Default: 24)

The number of tasks that can be active at one time. Valid values: Integers > 0.

3.1.2. Workflow Cycle Definition (cycledef)

Cycling information is defined in the cycledef: section under workflow:. Each cycle definition starts with a hyphen (-) and has information on cycle attributes (attrs:) and a cycle specification (spec:).

workflow:
  cycledef:
    - attrs:
        group: cycled
      spec: {{ DATE_FIRST_CYCLE }}00 {{ DATE_LAST_CYCLE }}00 {{ DATE_CYCLE_FREQ_HR }}:00:00
    - attrs:
        group: first_cycle
      spec: {{ DATE_FIRST_CYCLE }}00 {{ DATE_FIRST_CYCLE }}00 {{ DATE_CYCLE_FREQ_HR }}:00:00
    - attrs:
        group: cycled_from_second
      spec: {{ date_second_cycle }}00 {{ DATE_LAST_CYCLE }}00 {{ DATE_CYCLE_FREQ_HR }}:00:00

attrs:

Attributes of cycledef. Includes group: but may also include activation_offset:. See the Rocoto Documentation for more information.

group:

The group attribute allows users to assign a set of cycles to a particular group. The group tag can later be used to control which tasks are run for which cycles. See the Rocoto Documentation for more information.

spec:

The cycle is defined using the “start stop step” method. The cycle start date is listed first in YYYYMMDDHHmm format, followed by the end date in the same format and then the step in HH:MM:SS format (e.g., 202501190000 202501220000 24:00:00). The template.land_analysis.yaml values are rendered with the user-provided cycle information in the config.yaml file. Because DATE_FIRST_CYCLE and DATE_LAST_CYCLE are given in YYYYMMDDHH format, the template appends 00 (minutes) to each of them. DATE_FIRST_CYCLE:, DATE_LAST_CYCLE:, and DATE_CYCLE_FREQ_HR: are defined in the Workflow Entities section below.

date_second_cycle:

Start date of the second cycle (i.e., the first of the subsequent cycles), derived in setup_wflow_env.py.

For example, a land_analysis.yaml file generated by setup_wflow_env.py on Hercules might be rendered as:

workflow:
  cycledef:
    - attrs:
        group: cycled
      spec: 202501190000 202501220000 24:00:00
    - attrs:
        group: first_cycle
      spec: 202501190000 202501190000 24:00:00
    - attrs:
        group: cycled_from_second
      spec: 202501200000 202501220000 24:00:00
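The following minimal sketch shows how the three spec: strings above could be built from DATE_FIRST_CYCLE, DATE_LAST_CYCLE, and DATE_CYCLE_FREQ_HR. It assumes that date_second_cycle is the first cycle plus one cycling interval. That matches the rendered example, but it is not necessarily how setup_wflow_env.py derives the value. The function name cycledef_specs is hypothetical.

```python
from datetime import datetime, timedelta


def cycledef_specs(date_first_cycle: str, date_last_cycle: str, freq_hr: int) -> dict:
    """Build 'start stop step' spec strings for the three cycledef groups.

    Dates are YYYYMMDDHH strings; "00" is appended to give the YYYYMMDDHHmm
    form, mirroring the {{ ... }}00 pattern in template.land_analysis.yaml.
    """
    first = datetime.strptime(date_first_cycle, "%Y%m%d%H")
    # Assumption: the second cycle starts one cycling interval after the first.
    date_second_cycle = (first + timedelta(hours=freq_hr)).strftime("%Y%m%d%H")
    step = f"{freq_hr}:00:00"
    return {
        "cycled": f"{date_first_cycle}00 {date_last_cycle}00 {step}",
        "first_cycle": f"{date_first_cycle}00 {date_first_cycle}00 {step}",
        "cycled_from_second": f"{date_second_cycle}00 {date_last_cycle}00 {step}",
    }


specs = cycledef_specs("2025011900", "2025012200", 24)
for group, spec in specs.items():
    print(f"{group}: {spec}")
```

With the Hercules dates, the printed specs match the rendered cycledef: entries shown above.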

3.1.3. Workflow Entities

In the land_analysis.yaml file, entities are constants that are referenced throughout the workflow using an ampersand (&) prefix and a semicolon (;) suffix (e.g., &MACHINE;). This means the same constants do not have to be defined repeatedly in each workflow task. The entities: section of template.land_analysis.yaml provides the structure for this section of land_analysis.yaml. The default values for these entities are set in setup_wflow_env.py and then updated with user-selected values from config.yaml. The resulting land_analysis.yaml file includes an entities: section with concrete values for several variables that are used throughout the workflow. For example, a land_analysis.yaml file created on Hercules for the config.LND.era5.3dvar.ims.DA-fcst.warmstart.yaml case defines the following entities:

workflow:
  entities:
    ACCOUNT: "epic"
    APP: "LND"
    ATM_IO_LAYOUT_X: "1"
    ATM_IO_LAYOUT_Y: "1"
    ATM_LAYOUT_X: "3"
    ATM_LAYOUT_Y: "8"
    ATMOS_FORC: "era5"
    BKG_ANAL_EXT_SRC_OPT: "era5land"
    CCPP_SUITE: "FV3_GFS_v17_p8_ugwpv1"
    COLDSTART: "NO"
    COMINgdas: ""
    COMINgfs: ""
    COUPLER_CALENDAR: "2"
    CUSTOM_JEDI_CONFIG_FLAG: "NO"
    CUSTOM_JEDI_CONFIG_PATH: "/work/noaa/epic/UFS_Land-DA_v3.0/inputs/test_base/jedi_yaml"
    CUSTOM_JEDI_CONFIG_PREFIX: "/prefix/of/custom/JEDI/config/file/name"
    DATE_CYCLE_FREQ_HR: "24"
    DATE_FIRST_CYCLE: "2025011900"
    DATE_LAST_CYCLE: "2025012000"
    DATM_STREAM_FN_LAST_DATE: ""
    DCOMINera5: ""
    DCOMINera5land: ""
    DCOMINghcn: ""
    DCOMINgswp3: ""
    DCOMINsmap: ""
    DCOMINsmops: ""
    DO_BKG_ANAL_EXT_SRC: "NO"
    DO_FREE_FORECAST: "NO"
    do_jedi_snow: "YES"
    do_jedi_soil_moisture: "NO"
    DT_ATMOS: "900"
    DT_RUNSEQ: "3600"
    envir: "lnd_era5_3dvar_ims"
    exp_basedir: "/Users/Joe.Schmoe/landda"
    EXP_CASE_NAME: "lnd_era5_3dvar_ims_00"
    FCSTHR: "24"
    FHROT: "0"
    FRAC_GRID: "NO"
    IC_DATA_MODEL: "gfs"
    IMO: "384"
    JEDI_ALGORITHM: "3dvar"
    JEDI_IODACONV_PATH: "/work/noaa/epic/UFS_Land-DA_v3.0/jedi_bundle_hercules/build/lib/python3.11"
    JEDI_PATH: "/work/noaa/epic/UFS_Land-DA_v3.0/jedi_bundle_hercules"
    JMO: "190"
    KEEPDATA: "YES"
    LND_CALC_SNET: ".true."
    LND_IC_TYPE: "custom"
    LND_INITIAL_ALBEDO: "0.25"
    LND_LAYOUT_X: "1"
    LND_LAYOUT_Y: "2"
    LND_OUTPUT_FREQ_SEC: "21600"
    MACHINE: "hercules"
    MED_COUPLING_MODE: "ufs.nfrac.aoflux"
    model_ver: "v3.0.0"
    native_default: "None"
    NET: "landda"
    NPROCS_ANALYSIS: "6"
    NPROCS_FCST_IC: "36"
    NPZ: "127"
    nnodes_forecast: "1"
    nprocs_forecast: "26"
    nprocs_forecast_atm: "12"
    nprocs_forecast_lnd: "12"
    nprocs_per_node: "26"
    OBSDIR: ""
    OBS_GHCN_SNOW: "NO"
    OBS_IMS_SNOW: "YES"
    OBS_SFCSNO: "YES"
    OBS_SMAP: "NO"
    OBS_SMOPS: "NO"
    OUTPUT_FH: "1 -1"
    partition_default: "hercules"
    PTMP: "/Users/Joe.Schmoe/landda/ptmp"
    PY_LOG_LEVEL: "INFO"
    queue_default: "batch"
    RES: "96"
    RESTART_INTERVAL: "12 -1"
    RUN: "landda"
    res_p1: "97"
    SCHED: "slurm"
    SMAP_RAW_WINDOW_SPAN_HALF: "5"
    WARMSTART_DIR: "/Users/Joe.Schmoe/landda/land-DA_workflow/fix/DATA_RESTART"
    WE2E_TEST: "NO"
    WE2E_ATOL: "1e-7"
    WE2E_LOG_FN: "we2e.log"
    WRITE_GROUPS: "1"
    WRITE_TASKS_PER_GROUP: "6"
    HOMElandda: "&exp_basedir;/land-DA_workflow"
    COMROOT: "&PTMP;/&envir;/com"
    DATAROOT: "&PTMP;/&envir;/tmp"
    LOGDIR: "&COMROOT;/output/logs"
    LOGFN_SUFFIX: "<cyclestr>_@Y@m@d@H.log</cyclestr>"
    PDY:  "<cyclestr>@Y@m@d</cyclestr>"
    cyc: "<cyclestr>@H</cyclestr>"
    DATADEP_LRST1: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/ufs_land_restart.@Y-@m-@d_@H-00-00.tile6.nc</cyclestr>"
    DATADEP_LRST2: "<cyclestr>&WARMSTART_DIR;/ufs_land_restart.@Y-@m-@d_@H-00-00.tile6.nc</cyclestr>"
    DATADEP_COLDSTART: "<cyclestr>&exp_basedir;/exp_case/&EXP_CASE_NAME;/task_skip_coldstart_@Y@m@d@H.txt</cyclestr>"
    DATADEP_DATM1: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/ufs.cpld.datm.r.@Y-@m-@d-00000.nc</cyclestr>"
    DATADEP_DATM2: "<cyclestr>&WARMSTART_DIR;/ufs.cpld.datm.r.@Y-@m-@d-00000.nc</cyclestr>"
    DATADEP_FREEFCST: "<cyclestr>&exp_basedir;/exp_case/&EXP_CASE_NAME;/task_analysis_done_@Y@m@d@H.txt</cyclestr>"
    DATADEP_SFC1: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/@Y@m@d.@H0000.sfc_data.tile6.nc</cyclestr>"
    DATADEP_SFC2: "<cyclestr>&WARMSTART_DIR;/@Y@m@d.@H0000.sfc_data.tile6.nc</cyclestr>"
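Entity references use XML entity syntax, so nested entities resolve in several steps. For example, LOGDIR refers to COMROOT, which in turn refers to PTMP and envir. The minimal Python sketch below mimics that expansion for a few of the entities listed above. The expand function is hypothetical and only illustrates the substitution. In the workflow itself, the references are expected to be resolved when the generated XML is parsed.

```python
import re

ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*);")


def expand(value: str, entities: dict, _seen: tuple = ()) -> str:
    """Recursively replace &NAME; references with their entity values."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in _seen:
            raise ValueError(f"circular entity reference: {name}")
        if name not in entities:
            return match.group(0)  # leave unknown references untouched
        return expand(entities[name], entities, _seen + (name,))

    return ENTITY_REF.sub(substitute, value)


# A subset of the Hercules entities shown above.
entities = {
    "exp_basedir": "/Users/Joe.Schmoe/landda",
    "PTMP": "/Users/Joe.Schmoe/landda/ptmp",
    "envir": "lnd_era5_3dvar_ims",
    "HOMElandda": "&exp_basedir;/land-DA_workflow",
    "COMROOT": "&PTMP;/&envir;/com",
    "LOGDIR": "&COMROOT;/output/logs",
}

print(expand("&LOGDIR;", entities))
# /Users/Joe.Schmoe/landda/ptmp/lnd_era5_3dvar_ims/com/output/logs
```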

Note

The workflow entities include certain standard environment variables that are defined in the NCEP Central Operations WCOSS Implementation Standards document (pp. 4-5). These variables are used in forming the path to various directories containing input, output, and workflow files. For a visual aid, see the Land DA Directory Structure Diagram.

ACCOUNT:

The account to which users charge compute resources on the specified MACHINE. To determine an appropriate ACCOUNT value on a system with a Slurm job scheduler, users may run the saccount_params command to display account details. On other systems, users may run the groups command, which returns a list of projects that the user has permissions for. Not all of the listed projects/groups have an HPC allocation, but those that do are potentially valid account names.

APP:

Application/configuration to use. Valid values: LND | ATML.

ATM_IO_LAYOUT_X:

Specifies how many MPI ranks to use in the X direction for input/output (I/O) to the atmospheric component.

ATM_IO_LAYOUT_Y:

Specifies how many MPI ranks to use in the Y direction for input/output (I/O) to the atmospheric component.

ATM_LAYOUT_X:

Number of processes in the X direction per tile for the atmospheric component.

ATM_LAYOUT_Y:

Number of processes in the Y direction per tile for the atmospheric component.

ATMOS_FORC:

Type of atmospheric forcing data used. Valid values: "era5" | "gswp3".

CCPP_SUITE:

The physics suite to use in the experiment (only relevant for ATML configurations, which have an active atmospheric component).

COLDSTART:

Flag that indicates whether the experiment is a coldstart experiment ("YES") or a warmstart experiment ("NO").

COMINgdas:

Path to the directory containing output from the GDAS model, which can be used as input for a new forecast. See WCOSS Implementation Standards for information on operational data naming conventions.

COMINgfs:

Path to the directory containing output from the GFS model, which can be used as input for a new forecast. See WCOSS Implementation Standards for information on operational data naming conventions.

COUPLER_CALENDAR:

Coupler calendar. Options: no_calendar=0, thirty_day_months=1, julian=2, gregorian=3, noleap=4

CUSTOM_JEDI_CONFIG_FLAG:

Whether to use a custom JEDI configuration file ("YES") or not ("NO"). If this parameter is set to "YES" in config.yaml, the custom input file located in CUSTOM_JEDI_CONFIG_PATH is used as the JEDI input file for the analysis task.

CUSTOM_JEDI_CONFIG_PATH:

Path to the directory containing the custom JEDI configuration file(s).

CUSTOM_JEDI_CONFIG_PREFIX:

Prefix for the custom JEDI file name. For example, if the file were named custom_jedi_2026022600.yaml, CUSTOM_JEDI_CONFIG_PREFIX would be custom_jedi_. The YAML file name should include the cycle date for cycling, and the prefix is everything before that date.
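The minimal sketch below shows how the expected file name could be assembled from CUSTOM_JEDI_CONFIG_PREFIX and a cycle date. It assumes the file sits directly inside the CUSTOM_JEDI_CONFIG_PATH directory and uses the .yaml extension from the example above. The function name and the directory path are hypothetical.

```python
from pathlib import Path


def custom_jedi_config_file(config_path: str, prefix: str, cycle: str) -> Path:
    """Return the expected custom JEDI YAML file for a cycle given as YYYYMMDDHH."""
    if len(cycle) != 10 or not cycle.isdigit():
        raise ValueError("cycle must be in YYYYMMDDHH format")
    # Assumption: <CUSTOM_JEDI_CONFIG_PATH>/<prefix><YYYYMMDDHH>.yaml
    return Path(config_path) / f"{prefix}{cycle}.yaml"


path = custom_jedi_config_file("/path/to/jedi_yaml", "custom_jedi_", "2026022600")
print(path.name)  # custom_jedi_2026022600.yaml
```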

DATE_CYCLE_FREQ_HR:

Cycling frequency (in integer hours).

DATE_FIRST_CYCLE:

Starting cycle date of the first forecast in the set of forecasts to run. Format is “YYYYMMDDHH”.

DATE_LAST_CYCLE:

Starting cycle date of the last forecast in the set of forecasts to run. Format is “YYYYMMDDHH”.

DATM_STREAM_FN_LAST_DATE:

The last date of a warmstart run, in YYYYMMDDHH format. This variable is a temporary workaround for a bug in the UFS WM. Restart files produced by the LND configuration contain a hard-coded DATM file list, and if that file list does not match the namelist, the warmstart will fail. For example, suppose the user runs a coldstart forecast from day 1 to day 2, so the restart file contains information for days 1-2. If the user then runs a warmstart forecast for days 3-4 from that restart file, it will fail even if days 3-4 are included in the DATM input namelist. To avoid this issue, days 1-4 must be added to the namelist of the coldstart run, even though it only runs for days 1-2.
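To make the day 1-4 example concrete, the hypothetical helper below lists the daily dates that a DATM file list would need to span. The list runs from the first coldstart cycle through DATM_STREAM_FN_LAST_DATE. The helper assumes one DATM stream file per day, which the text above does not specify. It is an illustration, not the workflow's code.

```python
from datetime import datetime, timedelta


def datm_stream_dates(first_cycle: str, last_date: str) -> list:
    """List daily dates (YYYYMMDD) from the first cycle through DATM_STREAM_FN_LAST_DATE."""
    start = datetime.strptime(first_cycle, "%Y%m%d%H")
    end = datetime.strptime(last_date, "%Y%m%d%H")
    if end < start:
        raise ValueError("DATM_STREAM_FN_LAST_DATE must not precede the first cycle")
    dates = []
    day = start.replace(hour=0)
    while day <= end:
        dates.append(day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return dates


# Coldstart on day 1, with warmstarts planned through day 4.
print(datm_stream_dates("2025010100", "2025010400"))
# ['20250101', '20250102', '20250103', '20250104']
```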

DCOMINera5:

Path to directory containing ERA5 input data files. See WCOSS Implementation Standards for information on operational data naming conventions.

DCOMINera5land:

Variable used in testing. Unsupported for users at this time.

DCOMINghcn:

Path to directory containing GHCN input data files. See WCOSS Implementation Standards for information on operational data naming conventions.

DCOMINgswp3:

Path to directory containing GSWP3 input data files. See WCOSS Implementation Standards for information on operational data naming conventions.

DCOMINsmap:

Path to directory containing SMAP input data files. See WCOSS Implementation Standards for information on operational data naming conventions.

DCOMINsmops:

Path to directory containing SMOPS input data files. See WCOSS Implementation Standards for information on operational data naming conventions.

DO_BKG_ANAL_EXT_SRC:

Whether to use an external source file for the analysis. Only relevant when CUSTOM_JEDI_CONFIG_FLAG: "YES". Valid values: "YES" | "NO".

DO_FREE_FORECAST:

Whether to run a free forecast ("YES") or a DA forecast ("NO").

do_jedi_snow:

Whether to perform JEDI snow DA. Valid values: "YES" | "NO".

do_jedi_soil_moisture:

Whether to perform JEDI soil moisture DA. Valid values: "YES" | "NO".

DT_ATMOS:

The main integration time step of the atmospheric component of the UFS Weather Model (in seconds). This is the time step for the outermost atmospheric model loop and must be a positive integer value. It corresponds to the frequency at which the physics routines and the top level dynamics routine are called. (Note that one call to the top-level dynamics routine results in multiple calls to the horizontal dynamics, tracer transport, and vertical dynamics routines; see the FV3 dycore scientific documentation for details.)

DT_RUNSEQ:

Time interval of run sequence (coupling interval) between the model components of the UFS Weather Model (in seconds).

envir:

The run environment. Set to “test” during the initial testing phase, “para” when running in parallel (on a schedule), and “prod” in production. In operations, this is the operations root directory (aka $OPSROOT). For more on NCO-compliant directory structure, see the Note on NCO Standards.

exp_basedir:

The full path to the parent directory of land-DA_workflow (i.e., ${BASEDIR} in the documentation). The actual value is derived in the setup_wflow_env.py file.

EXP_CASE_NAME:

A name for the experiment. Users can change this variable to any name, although whitespace and some punctuation characters are not allowed. The most useful names convey information about the experiment. Each of the sample cases provided sets the experiment name to <app>_[<forcing>_]<starttype>_##, where <app> is the configuration (LND or ATML), <forcing> refers to the atmospheric forcing data used (if any), and <starttype> indicates either a warmstart or coldstart forecast.

FCSTHR:

Specifies the length of each forecast in hours. Valid values: Integers > 0.

FHROT:

Forecast hour at restart in UFS Weather Model (in hours; set in model_configure).

FRAC_GRID:

Flag for the fractional grid option in UFS_UTILS and the UFS WM. When the fractional grid option (frac_grid) was introduced in 2024, some variable names such as snow depth were changed in the WM and UFS_UTILS. However, these variable names were not changed in the Noah-MP land model component. The tile2tile converter uses this flag to switch variable names between JEDI and the land model. When fractional grid is enabled (FRAC_GRID: "YES"), two key variable names do not match between JEDI (sfc_data files) and the land model (restart files), and the tile2tile converter must translate between them:

Table 3.1 Mismatched Variable Names

Variable name in ‘tile2tile_converter’ | Description | Noah-MP (restart) | JEDI (sfc_data)
swe | Snow water equivalent | weasd | sheleg / weasdl
snow_depth | Snow depth over land | snwdph | snwdph / snodl

In pre_anal, the tile2tile converter creates the sfc_data files for the analysis task from the restart files. In post_anal, the tile2tile converter creates the restart files for the warmstart forecast task from the sfc_data and restart files.
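The lookup below sketches how a converter could pick variable names from Table 3.1 based on FRAC_GRID. The table does not state which of the two JEDI names applies to which setting. The sketch therefore assumes the first name is used when FRAC_GRID is "NO" and the second when it is "YES". NAME_TABLE and variable_names are hypothetical names, not the tile2tile converter's actual code.

```python
# Names from Table 3.1. ASSUMPTION: where the JEDI column lists two names
# (e.g., "sheleg / weasdl"), the first applies when FRAC_GRID is "NO" and the
# second when FRAC_GRID is "YES".
NAME_TABLE = {
    "swe": {"noahmp": "weasd", "jedi": ("sheleg", "weasdl")},
    "snow_depth": {"noahmp": "snwdph", "jedi": ("snwdph", "snodl")},
}


def variable_names(converter_name: str, frac_grid: str) -> tuple:
    """Return (Noah-MP restart name, JEDI sfc_data name) for a converter variable."""
    if frac_grid not in ("YES", "NO"):
        raise ValueError('FRAC_GRID must be "YES" or "NO"')
    entry = NAME_TABLE[converter_name]
    jedi_name = entry["jedi"][1] if frac_grid == "YES" else entry["jedi"][0]
    return entry["noahmp"], jedi_name


print(variable_names("snow_depth", "YES"))  # ('snwdph', 'snodl')
```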

IC_DATA_MODEL:

The name of the model that the initial sfc_data files are coming from in the fcst_ic task. Valid values: "gfs" | "gdas"

IMO:

Number of horizontal grid points in the X direction. Usually a multiple of the resolution (${RES}).

JEDI_ALGORITHM:

Data assimilation algorithm selection. Valid values: "letkf-oi" | "3dvar"

JEDI_IODACONV_PATH:

Path to directory where the libraries of the JEDI IODA converter are located.

JEDI_PATH:

Path to the directory where JEDI is installed. The actual value is set in a machine-specific portion of setup_wflow_env.py.

JMO:

Number of horizontal grid points in the Y direction.

KEEPDATA:

Flag indicating whether to retain ("YES") or discard ("NO") the data copied to the $DATAROOT directory during the forecast experiment.

LND_CALC_SNET:

Flag indicating whether to calculate the shortwave radiation internally (".true.") or not (".false.").

LND_IC_TYPE:

Indicates the source of the initial conditions. Two options are supported: “custom” (i.e., C96.initial.tile[1-6].nc) and “sfc” (i.e., sfc_data.tile[1-6].nc). Valid values: custom | sfc.

LND_INITIAL_ALBEDO:

Initial mean surface albedo. Valid values: Any number between 0 and 1.

LND_LAYOUT_X:

Number of processes in the X direction per tile for the land model component.

LND_LAYOUT_Y:

Number of processes in the Y direction per tile for the land model component.

LND_OUTPUT_FREQ_SEC:

Output frequency of the land model component (in seconds).

MACHINE:

The machine (a.k.a. platform or system) on which the workflow will run. The actual value is provided by the user via the -p=MACHINE command line argument or derived in setup_wflow_env.py from other parameters if possible. Currently supported platforms are listed in Section 1.2.2. Valid values: "ursa" | "hercules" | "orion" | "gaeac6"

MED_COUPLING_MODE:

CMEPS coupling mode. Valid values: "ufs.frac" | "ufs.nfrac.aoflux". "ufs.frac" is used with the active FV3 atmospheric component (e.g., in ATML configurations), whereas "ufs.nfrac.aoflux" is used with the data atmosphere component (e.g., LND configurations).

model_ver:

Version number of package in three digits (e.g., v#.#.#); second level of com directory (see NCO Directory Structure Entities)

native_default:

Defines raw batch system options/job scheduler commands that Rocoto will use when submitting jobs for a given task (using the <native> tag). If more than one option is required, they are listed consecutively as a single string. This is a machine-dependent parameter, so default values differ.

NET:

Model name (first level of com directory structure).

NPROCS_ANALYSIS:

Number of processors for the analysis task.

NPROCS_FCST_IC:

Number of processors for the fcst_ic task.

NPZ:

Number of vertical layers in the atmospheric model.

nnodes_forecast:

Number of nodes for the forecast task.

nprocs_forecast:

Total number of processes for the forecast task. In general, this is set as \(nprocs\_forecast\_lnd + nprocs\_forecast\_atm + (LND\_LAYOUT\_X \times LND\_LAYOUT\_Y)\).
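As a quick check of the formula, the Hercules entity values listed earlier (nprocs_forecast_lnd: 12, nprocs_forecast_atm: 12, LND_LAYOUT_X: 1, LND_LAYOUT_Y: 2) reproduce the nprocs_forecast value of 26. The helper below is a hypothetical illustration of that arithmetic.

```python
def nprocs_forecast(nprocs_lnd: int, nprocs_atm: int, lnd_layout_x: int, lnd_layout_y: int) -> int:
    """Total forecast processes: land + atmosphere + (LND_LAYOUT_X * LND_LAYOUT_Y)."""
    return nprocs_lnd + nprocs_atm + lnd_layout_x * lnd_layout_y


print(nprocs_forecast(12, 12, 1, 2))  # 26
```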

nprocs_forecast_atm:

Number of processes for the atmospheric component in the forecast task. Actual default value dependent on APP (LND or ATML).

nprocs_forecast_lnd:

Number of processes for the land model component (Noah-MP) in the forecast task.

nprocs_per_node:

Number of processes per node for the forecast task. Actual default value dependent on nprocs_forecast and the maximum number of cores available per node.

OBSDIR:

The path to the directory where DA fix files are located. In scripts/exlandda_prep_data.sh, this value is set to ${FIXlandda}/DA_obs unless the user specifies a different path in config.yaml.

OBS_GHCN_SNOW:

Flag to use GHCN snow depth observations. Valid values: "YES" | "NO".

OBS_IMS_SNOW:

Flag to use IMS snow depth observations. Valid values: "YES" | "NO".

OBS_SFCSNO:

Flag to use SFCSNO snow depth observations. Valid values: "YES" | "NO".

OBS_SMAP:

Flag to use SMAP soil moisture observation data. Valid values: "YES" | "NO".

OBS_SMOPS:

Flag to use SMOPS soil moisture observation data. Valid values: "YES" | "NO".

OUTPUT_FH:

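Forecast history file output setting. When the second number is -1, the first number is the output frequency in hours (e.g., "1 -1"). Otherwise, the list of numbers gives the specific forecast hours at which to write output history files (e.g., "6 9 12").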

partition_default:

Default partition; default set based on MACHINE.

PY_LOG_LEVEL:

Python logging level. Valid values: "INFO" | "DEBUG" | "WARN" | "ERROR" | "CRITICAL"

queue_default:

Default queue; default set based on MACHINE.

RES:

Resolution of FV3 grid. Currently, only C96 resolution is supported.

RESTART_INTERVAL:

Determines how often the model creates restart files, which are used to continue simulations from a specific point in time. When the second number is -1, the first number refers to the frequency of restart file output (e.g., "1 -1"). Otherwise, the list of numbers indicates specific hours at which to output restart files (e.g., "6 9 12").
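OUTPUT_FH and RESTART_INTERVAL share the same two-mode format. The minimal sketch below parses either string and lists the forecast hours it implies, up to FCSTHR. The function names are hypothetical. The expansion assumes that frequency mode starts at the first interval rather than at hour 0, which may not match the UFS Weather Model's exact behavior.

```python
def parse_hour_spec(spec: str) -> dict:
    """Interpret an OUTPUT_FH or RESTART_INTERVAL string.

    "N -1"  -> output every N hours
    "a b c" -> output at the listed forecast hours
    """
    values = [int(v) for v in spec.split()]
    if len(values) == 2 and values[1] == -1:
        return {"frequency": values[0]}
    return {"hours": values}


def expand_hours(spec: str, fcsthr: int) -> list:
    """List the forecast hours implied by spec, up to FCSTHR.

    ASSUMPTION: frequency mode starts at the first interval (hour 0 is not listed).
    """
    parsed = parse_hour_spec(spec)
    if "frequency" in parsed:
        step = parsed["frequency"]
        if step <= 0:
            raise ValueError("frequency must be a positive number of hours")
        return list(range(step, fcsthr + 1, step))
    return [h for h in parsed["hours"] if h <= fcsthr]


print(parse_hour_spec("12 -1"))    # {'frequency': 12}
print(expand_hours("12 -1", 24))   # [12, 24]
print(expand_hours("6 9 12", 24))  # [6, 9, 12]
```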

RUN:

Name of model run (third level of com directory structure). In general, same as ${NET}.

res_p1:

Resolution plus 1 (${RES} + 1). Must be an integer value.

SCHED:

The job scheduler to use (e.g., Slurm) on the specified MACHINE. Valid values: "slurm". Other options may work with a container but have not been tested: "pbspro" | "lsf" | "lsfcray" | "none"

SMAP_RAW_WINDOW_SPAN_HALF:

The SMAP satellite is designed to create a global map every 2-3 days. Each SMAP data file covers a long, narrow swath 1000 km wide, and swaths can overlap. To avoid duplication while covering as wide an area as possible, the prep_data task first converts the raw data files into the IODA format. It then combines the files between ${PDY}${cyc} +/- ${SMAP_RAW_WINDOW_SPAN_HALF} hours. The default value is 5, so by default 11 hourly data sets are combined: the cycle hour plus 5 hours on either side. For example, combined data for 2025011800 would contain the raw data files from 2025011719 to 2025011805. To use a single data set, set this parameter to 0.
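The window arithmetic can be checked with a short sketch that lists the hourly timestamps combined for a cycle. It assumes one raw file per hour, as implied by the example above. smap_window is a hypothetical helper, not the prep_data task's code.

```python
from datetime import datetime, timedelta


def smap_window(pdy_cyc: str, span_half: int) -> list:
    """Hourly timestamps (YYYYMMDDHH) within pdy_cyc +/- span_half hours, inclusive."""
    center = datetime.strptime(pdy_cyc, "%Y%m%d%H")
    return [
        (center + timedelta(hours=offset)).strftime("%Y%m%d%H")
        for offset in range(-span_half, span_half + 1)
    ]


window = smap_window("2025011800", 5)
print(len(window), window[0], window[-1])  # 11 2025011719 2025011805
```

Setting the half-span to 0 leaves only the cycle hour itself.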

WARMSTART_DIR:

The path to restart files for a warmstart experiment. The actual value set is machine-dependent.

WE2E_TEST:

Flag to turn on the workflow end-to-end (WE2E) test. When WE2E_TEST="YES", the result files from the experiment are compared to the test baseline files, which are located by default in ${BASEDIR}/land-DA_workflow/fix/test_base/we2e_com. The experiment passes if the results are within the tolerance set via WE2E_ATOL at the end of each of the three main tasks (analysis, forecast, and post_anal). Valid values: "YES" | "NO"

WE2E_ATOL:

Tolerance of the WE2E test. (Set in template.land_analysis.yaml.)
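The following minimal sketch shows an absolute-tolerance comparison of the kind WE2E_ATOL suggests. It assumes the tolerance is applied as an absolute difference to each compared value. The actual WE2E comparison script is not described here, including which variables it checks and how it reads the baseline files, so within_tolerance is hypothetical.

```python
import math


def within_tolerance(results, baseline, atol: float = 1e-7) -> bool:
    """True if every result value is within an absolute tolerance of the baseline."""
    if len(results) != len(baseline):
        return False
    return all(
        math.isclose(r, b, rel_tol=0.0, abs_tol=atol)
        for r, b in zip(results, baseline)
    )


print(within_tolerance([0.25, 1.0], [0.25 + 5e-8, 1.0]))  # True
print(within_tolerance([0.25, 1.0], [0.2501, 1.0]))       # False
```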

WE2E_LOG_FN:

Name of the WE2E test log file. (Set in template.land_analysis.yaml.)

WRITE_GROUPS:

The number of write groups (i.e., groups of MPI tasks) to use.

WRITE_TASKS_PER_GROUP:

The number of MPI tasks to allocate for each of the ${WRITE_GROUPS}.

3.1.3.1. NCO Directory Structure Entities

Standard environment variables are defined in the NCEP Central Operations WCOSS Implementation Standards document (pp. 4-5). These variables are used in forming the path to various directories containing input, output, and workflow files. For a visual aid, see the Land DA Directory Structure Diagram.

HOMElandda: (Default: "&exp_basedir;/land-DA_workflow" )

The location of the land-DA_workflow clone.

PTMP: (Default: "&exp_basedir;/ptmp" )

Product temporary (PTMP) experiment output space. This directory is used to mimic the operational file structure and contains all of the files and subdirectories used by or generated by the experiment. By default, it is a sibling to the land-DA_workflow directory.

COMROOT: (Default: "&PTMP;/&envir;/com" )

The com root directory, which contains input/output data on the current system.

DATAROOT: (Default: "&PTMP;/&envir;/tmp" )

Directory location for the temporary working directories for running jobs. By default, this is a sibling to the ${COMROOT} directory and is located at ptmp/<envir>/tmp.

LOGDIR: (Default: "&COMROOT;/output/logs" )

Path to the directory containing log files for each workflow task.

LOGFN_SUFFIX: (Default: "<cyclestr>_@Y@m@d@H.log</cyclestr>" )

The cycle suffix appended to each task’s log file. It is rendered in the form _YYYYMMDDHH.log. For example, the prep_data task log file for the Jan. 20, 2025 00z cycle would be named prep_data_2025012000.log.

PDY: (Default: "<cyclestr>@Y@m@d</cyclestr>" )

Date in YYYYMMDD format.

cyc: (Default: "<cyclestr>@H</cyclestr>" )

Cycle time in GMT hours, formatted HH.
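The @ flags inside <cyclestr> tags are replaced with components of the current cycle date. The sketch below mimics that substitution for the flags used in this section (@Y, @m, @d, @H), plus @M and @S, so it also covers the DATADEP_* file names defined below. It is an illustration only. Rocoto performs this substitution itself and supports additional flags, and the function name render_cyclestr is hypothetical.

```python
import re
from datetime import datetime

# Subset of Rocoto <cyclestr> flags, mapped to strftime codes.
CYCLESTR_FLAGS = {"Y": "%Y", "m": "%m", "d": "%d", "H": "%H", "M": "%M", "S": "%S"}


def render_cyclestr(template: str, cycle: datetime) -> str:
    """Replace @Y, @m, @d, @H, @M, and @S in a cyclestr body with cycle values."""
    return re.sub(
        r"@([YmdHMS])",
        lambda flag: cycle.strftime(CYCLESTR_FLAGS[flag.group(1)]),
        template,
    )


cycle = datetime(2025, 1, 20, 0)
print("analysis" + render_cyclestr("_@Y@m@d@H.log", cycle))  # analysis_2025012000.log
print(render_cyclestr("@Y@m@d.@H0000.sfc_data.tile6.nc", cycle))  # 20250120.000000.sfc_data.tile6.nc
```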

3.1.3.2. Data Location Entities

DATADEP_LRST1: (Default: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/ufs_land_restart.@Y-@m-@d_@H-00-00.tile6.nc</cyclestr>" )

Land model (Noah-MP) restart files for the next cycle.

DATADEP_LRST2: (Default: "<cyclestr>&WARMSTART_DIR;/ufs_land_restart.@Y-@m-@d_@H-00-00.tile6.nc</cyclestr>" )

Land model (Noah-MP) restart files used to initialize a warmstart experiment.

DATADEP_COLDSTART: (Default: "<cyclestr>&exp_basedir;/exp_case/&EXP_CASE_NAME;/task_skip_coldstart_@Y@m@d@H.txt</cyclestr>" )

Flag file used to skip the coldstart tasks.

DATADEP_DATM1: (Default: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/ufs.cpld.datm.r.@Y-@m-@d-00000.nc</cyclestr>" )

DATM restart files for the next cycle.

DATADEP_DATM2: (Default: "<cyclestr>&WARMSTART_DIR;/ufs.cpld.datm.r.@Y-@m-@d-00000.nc</cyclestr>" )

DATM restart files used to initialize a warmstart experiment.

DATADEP_SFC1: (Default: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/@Y@m@d.@H0000.sfc_data.tile6.nc</cyclestr>" )

Surface data (sfc_data) restart files for the next cycle.

DATADEP_SFC2: (Default: "<cyclestr>&WARMSTART_DIR;/@Y@m@d.@H0000.sfc_data.tile6.nc</cyclestr>" )

Surface data (sfc_data) files used to initialize a warmstart experiment.

DATADEP_FREEFCST: (Default: "<cyclestr>&exp_basedir;/exp_case/&EXP_CASE_NAME;/task_analysis_done_@Y@m@d@H.txt</cyclestr>" )

Data file(s) required to trigger the forecast task in a free-forecast experiment.

3.1.4. Workflow Log

Information related to overall workflow progress is defined in the log: section under workflow:.

workflow:
  log: "&LOGDIR;/workflow.log"

log: (Default: "&LOGDIR;/workflow.log")

Path and name of Rocoto log file(s).

3.1.5. Workflow Tasks

The workflow is divided into discrete tasks, and details of each task are defined within the tasks: section under workflow:.

workflow:
  tasks:
    task_jcb:
    task_prep_data:
    task_fcst_ic:
    task_pre_anal:
    task_analysis:
    task_post_anal:
    task_forecast:
    task_plot_stats:

Each task may contain attributes (attrs:), just as in the overarching workflow: section. Instead of entities, each task contains an envars: section to define environment variables that must be passed to the task when it is executed. Any task dependencies are listed under the dependency: section. Additional details, such as jobname:, walltime:, and queue: may also be set within a specific task.

The following subsections explain any variables that have not already been explained/defined above.

3.1.5.1. Sample Task: Analysis Task (task_analysis)

This section walks users through the structure of the analysis task (task_analysis) to explain how configuration information is provided to the land_analysis.yaml file for each task. Since each task has a similar structure, common information is explained in this section. Variables unique to a particular task are defined in their respective task_* sections based on the structure laid out in template.land_analysis.yaml.

Parameters for a particular task are set in the workflow.tasks.task_<name>: section of the template.land_analysis.yaml file. For example, settings for the analysis task are provided in the task_analysis: section of template.land_analysis.yaml. The following is an excerpt of the task_analysis: section of template.land_analysis.yaml:

workflow:
  tasks:
    task_analysis:
      attrs:
{%- if COLDSTART == "YES" %}
        cycledefs: cycled_from_second
{%- else %}
        cycledefs: cycled
{%- endif %}
        maxtries: 2
      envars:
        ACCOUNT: "&ACCOUNT;"
        BKG_ANAL_EXT_SRC_OPT: "&BKG_ANAL_EXT_SRC_OPT;"
        COMINgfs: "&COMINgfs;"
        COMROOT: "&COMROOT;"
        COUPLER_CALENDAR: "&COUPLER_CALENDAR;"
        CUSTOM_JEDI_CONFIG_FLAG: "&CUSTOM_JEDI_CONFIG_FLAG;"
        CUSTOM_JEDI_CONFIG_PATH: "&CUSTOM_JEDI_CONFIG_PATH;"
        CUSTOM_JEDI_CONFIG_PREFIX: "&CUSTOM_JEDI_CONFIG_PREFIX;"
        cyc: "&cyc;"
        DATAROOT: "&DATAROOT;"
        DATE_CYCLE_FREQ_HR: "&DATE_CYCLE_FREQ_HR;"
        DATE_FIRST_CYCLE: "&DATE_FIRST_CYCLE;"
        DCOMINera5land: "&DCOMINera5land;"
        DO_BKG_ANAL_EXT_SRC: "&DO_BKG_ANAL_EXT_SRC;"
        DO_FREE_FORECAST: "&DO_FREE_FORECAST;"
        do_jedi_snow: "&do_jedi_snow;"
        do_jedi_soil_moisture: "&do_jedi_soil_moisture;"
        exp_basedir: "&exp_basedir;"
        EXP_CASE_NAME: "&EXP_CASE_NAME;"
        FRAC_GRID: "&FRAC_GRID;"
        HOMElandda: "&HOMElandda;"
        JEDI_ALGORITHM: "&JEDI_ALGORITHM;"
        JEDI_PATH: "&JEDI_PATH;"
        KEEPDATA: "&KEEPDATA;"
        LOGDIR: "&LOGDIR;"
        MACHINE: "&MACHINE;"
        model_ver: "&model_ver;"
        NPROCS_ANALYSIS: "&NPROCS_ANALYSIS;"
        NPZ: "&NPZ;"
        OBS_GHCN_SNOW: "&OBS_GHCN_SNOW;"
        OBS_IMS_SNOW: "&OBS_IMS_SNOW;"
        OBS_SFCSNO: "&OBS_SFCSNO;"
        OBS_SMAP: "&OBS_SMAP;"
        OBS_SMOPS: "&OBS_SMOPS;"
        PDY: "&PDY;"
        PY_LOG_LEVEL: "&PY_LOG_LEVEL;"
        RES: "&RES;"
        res_p1: "&res_p1;"
        SCHED: "&SCHED;"
        WARMSTART_DIR: "&WARMSTART_DIR;"
        WE2E_TEST: "&WE2E_TEST;"
        WE2E_ATOL: "&WE2E_ATOL;"
        WE2E_LOG_FN: "&WE2E_LOG_FN;"
      account: "&ACCOUNT;"
      command: '&HOMElandda;/parm/task_load_modules_run_jjob.sh "analysis" "&HOMElandda;" "&MACHINE;"'
      jobname: analysis
      nodes: "1:ppn=&NPROCS_ANALYSIS;"
{%- if native_default is not none %}
      native: "&native_default;"
{%- endif %}
      walltime: 00:15:00
      partition: "&partition_default;"
      queue: "&queue_default;"
      join: "&LOGDIR;/analysis&LOGFN_SUFFIX;"
{%- if MACHINE == "ursa" %}
      memory: 32G
{%- endif %}
      dependency:
        and:
          taskdep_prep_data:
            attrs:
              task: prep_data
{%- if CUSTOM_JEDI_CONFIG_FLAG == "NO" %}
          taskdep_jcb:
            attrs:
              task: jcb
{%- endif %}
{%- if APP == "LND" %}
          taskdep_pre_anal:
            attrs:
              task: pre_anal
{%- else %}
          or:
            datadep_sfc1:
              attrs:
                age: 5
              value: "&DATADEP_SFC1;"
            datadep_sfc2:
              attrs:
                age: 5
              value: "&DATADEP_SFC2;"
{%- endif %}

When running the config.LND.era5.3dvar.ims.DA-fcst.warmstart.yaml case on Hercules, the analysis task in the resulting land_analysis.yaml file would render as follows:

task_analysis:
  attrs:
    cycledefs: cycled
    maxtries: 2
  envars:
    ACCOUNT: "&ACCOUNT;"
    BKG_ANAL_EXT_SRC_OPT: "&BKG_ANAL_EXT_SRC_OPT;"
    COMINgfs: "&COMINgfs;"
    COMROOT: "&COMROOT;"
    COUPLER_CALENDAR: "&COUPLER_CALENDAR;"
    CUSTOM_JEDI_CONFIG_FLAG: "&CUSTOM_JEDI_CONFIG_FLAG;"
    CUSTOM_JEDI_CONFIG_PATH: "&CUSTOM_JEDI_CONFIG_PATH;"
    CUSTOM_JEDI_CONFIG_PREFIX: "&CUSTOM_JEDI_CONFIG_PREFIX;"
    cyc: "&cyc;"
    DATAROOT: "&DATAROOT;"
    DATE_CYCLE_FREQ_HR: "&DATE_CYCLE_FREQ_HR;"
    DATE_FIRST_CYCLE: "&DATE_FIRST_CYCLE;"
    DCOMINera5land: "&DCOMINera5land;"
    DO_BKG_ANAL_EXT_SRC: "&DO_BKG_ANAL_EXT_SRC;"
    DO_FREE_FORECAST: "&DO_FREE_FORECAST;"
    do_jedi_snow: "&do_jedi_snow;"
    do_jedi_soil_moisture: "&do_jedi_soil_moisture;"
    exp_basedir: "&exp_basedir;"
    EXP_CASE_NAME: "&EXP_CASE_NAME;"
    FRAC_GRID: "&FRAC_GRID;"
    HOMElandda: "&HOMElandda;"
    JEDI_ALGORITHM: "&JEDI_ALGORITHM;"
    JEDI_PATH: "&JEDI_PATH;"
    KEEPDATA: "&KEEPDATA;"
    LOGDIR: "&LOGDIR;"
    MACHINE: "&MACHINE;"
    model_ver: "&model_ver;"
    NPROCS_ANALYSIS: "&NPROCS_ANALYSIS;"
    NPZ: "&NPZ;"
    OBS_GHCN_SNOW: "&OBS_GHCN_SNOW;"
    OBS_IMS_SNOW: "&OBS_IMS_SNOW;"
    OBS_SFCSNO: "&OBS_SFCSNO;"
    OBS_SMAP: "&OBS_SMAP;"
    OBS_SMOPS: "&OBS_SMOPS;"
    PDY: "&PDY;"
    PY_LOG_LEVEL: "&PY_LOG_LEVEL;"
    RES: "&RES;"
    res_p1: "&res_p1;"
    SCHED: "&SCHED;"
    WARMSTART_DIR: "&WARMSTART_DIR;"
    WE2E_TEST: "&WE2E_TEST;"
    WE2E_ATOL: "&WE2E_ATOL;"
    WE2E_LOG_FN: "&WE2E_LOG_FN;"
  account: "&ACCOUNT;"
  command: '&HOMElandda;/parm/task_load_modules_run_jjob.sh "analysis" "&HOMElandda;" "&MACHINE;"'
  jobname: analysis
  nodes: "1:ppn=&NPROCS_ANALYSIS;"
  walltime: 00:15:00
  partition: "&partition_default;"
  queue: "&queue_default;"
  join: "&LOGDIR;/analysis&LOGFN_SUFFIX;"
  dependency:
    and:
      taskdep_prep_data:
        attrs:
          task: prep_data
      taskdep_jcb:
        attrs:
          task: jcb
      taskdep_pre_anal:
        attrs:
          task: pre_anal
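The rendered task differs from the template only where the Jinja conditionals apply. The following Python sketch mirrors the two conditionals that control the native: and memory: keys, using plain dictionaries instead of Jinja or uwtools. The function name analysis_task_keys is hypothetical, and the sketch assumes native_default is unset on Hercules, which is what the rendered example above suggests.

```python
def analysis_task_keys(machine, native_default=None):
    """Return the scheduler keys the task_analysis template would emit.

    Hypothetical helper: it reproduces only the Jinja conditionals around
    native: and memory:, not the full template.
    """
    keys = {
        "account": "&ACCOUNT;",
        "jobname": "analysis",
        "nodes": "1:ppn=&NPROCS_ANALYSIS;",
    }
    # {%- if native_default is not none %}
    if native_default is not None:
        keys["native"] = "&native_default;"
    keys.update({
        "walltime": "00:15:00",
        "partition": "&partition_default;",
        "queue": "&queue_default;",
        "join": "&LOGDIR;/analysis&LOGFN_SUFFIX;",
    })
    # {%- if MACHINE == "ursa" %}
    if machine == "ursa":
        keys["memory"] = "32G"
    return keys


# Assuming native_default is unset on Hercules, neither key appears.
hercules = analysis_task_keys("hercules")
print("native" in hercules, "memory" in hercules)  # False False
```

Because the conditionals wrap whole keys, a key is either rendered with its entity reference or omitted entirely; no empty placeholder is left behind.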

3.1.5.1.1. Task Attributes (attrs:)

The attrs: section for each task includes the cycledefs: attribute and the maxtries: attribute.

cycledefs: (Default: cycled)

A comma-separated list of cycledef: group names. A task runs only if one of the group names in its cycledefs: list matches one of the workflow's cycledef: group IDs. In template.land_analysis.yaml, the cycledefs: value is set by a conditional statement: if the user is running a coldstart experiment, the group name is cycled_from_second because the model needs time to “spin up” before cycling can begin; otherwise, the group name is cycled.
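The group selection and name matching described above can be sketched in Python. Both function names are hypothetical, and the sketch models only the name matching described in this section, not Rocoto's own cycle scheduling.

```python
def cycledefs_group(coldstart: bool) -> str:
    """Return the cycledef group name chosen for a task (hypothetical helper)."""
    # Coldstart experiments skip the first cycle so the model can spin up.
    return "cycled_from_second" if coldstart else "cycled"


def task_runs(task_cycledefs: str, workflow_groups: set[str]) -> bool:
    """True if any name in the task's comma-separated cycledefs list
    matches one of the workflow's cycledef group IDs."""
    names = {name.strip() for name in task_cycledefs.split(",") if name.strip()}
    return not names.isdisjoint(workflow_groups)


# Hypothetical workflow that defines only the "cycled" group:
print(task_runs(cycledefs_group(coldstart=False), {"cycled"}))  # True
print(task_runs(cycledefs_group(coldstart=True), {"cycled"}))   # False
```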

maxtries: (Default: 2)

The maximum number of times Rocoto can resubmit a failed task.

3.1.5.1.2. Task Environment Variables (envars)

The envars: section for each task reuses many of the same variables and values defined as entities: for the overall workflow. These values are needed for each task, but setting them individually is error-prone. Instead, a specific workflow task can reference workflow entities using the &VAR; syntax. For example, to set the ACCOUNT: value in task_analysis: to the value of the workflow ACCOUNT: entity, the following statement can be added to the task’s envars: section:

task_analysis:
  envars:
    ACCOUNT: "&ACCOUNT;"
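The idea behind the &VAR; syntax is plain substitution: each &NAME; reference is replaced with the value defined once under entities:. The following Python sketch only illustrates that substitution; it is not how uwtools or Rocoto perform it. The function name and the entity values are hypothetical, and entities whose values themselves contain references are not expanded recursively.

```python
import re

# Matches an entity reference such as &ACCOUNT; or &HOMElandda;
ENTITY_REF = re.compile(r"&(\w+);")


def resolve_entities(value: str, entities: dict[str, str]) -> str:
    """Replace every &NAME; reference in value with entities[NAME].

    Raises KeyError if a referenced entity is not defined.
    """
    return ENTITY_REF.sub(lambda match: entities[match.group(1)], value)


# Hypothetical entity values for illustration only.
entities = {"ACCOUNT": "my_account", "LOGDIR": "/path/to/logs", "LOGFN_SUFFIX": ".log"}
print(resolve_entities("&ACCOUNT;", entities))                       # my_account
print(resolve_entities("&LOGDIR;/analysis&LOGFN_SUFFIX;", entities))  # /path/to/logs/analysis.log
```

Changing a value in entities changes it everywhere it is referenced, which is why referencing is less error-prone than repeating the literal value in each task.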

For most variables, the value set in the workflow.entities: section should be referenced in each task rather than redefined. For example, every task needs the MACHINE variable, and users cannot switch machines mid-workflow. Therefore, users should set MACHINE once in the workflow.entities: section and reference that definition in each workflow task. For example:

workflow:
  entities:
    MACHINE: "hercules"
  tasks:
    task_jcb:
      envars:
        MACHINE: "&MACHINE;"
    task_prep_data:
      envars:
        MACHINE: "&MACHINE;"
    ...
    task_forecast:
      envars:
        MACHINE: "&MACHINE;"
    task_plot_stats:
      envars:
        MACHINE: "&MACHINE;"
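A quick way to catch a task that hard-codes a value instead of referencing the entity is to scan the loaded configuration. The following Python sketch assumes the YAML above has been loaded into nested dictionaries (for example, with a YAML parser); the function name tasks_not_referencing and the "orion" value are hypothetical and not part of the Land DA workflow.

```python
def tasks_not_referencing(workflow: dict, var: str) -> list[str]:
    """Return the tasks whose envars do not set var to the "&var;" reference.

    Hypothetical check: flags tasks that hard-code or omit the value instead
    of reusing the single definition in workflow["entities"].
    """
    if var not in workflow["entities"]:
        raise KeyError(f"{var} is not defined in workflow.entities")
    expected = f"&{var};"
    return [
        name
        for name, task in workflow["tasks"].items()
        if task.get("envars", {}).get(var) != expected
    ]


# Nested dicts shaped like the YAML above.
workflow = {
    "entities": {"MACHINE": "hercules"},
    "tasks": {
        "task_jcb": {"envars": {"MACHINE": "&MACHINE;"}},
        "task_prep_data": {"envars": {"MACHINE": "&MACHINE;"}},
        "task_forecast": {"envars": {"MACHINE": "orion"}},  # hard-coded value
    },
}
print(tasks_not_referencing(workflow, "MACHINE"))  # ['task_forecast']
```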

3.1.5.1.3. Miscellaneous Task Values

The authoritative Rocoto documentation discusses a number of miscellaneous task attributes in detail. A brief overview is provided in this section.

workflow:
  tasks:
    task_analysis:
      account: "&ACCOUNT;"
      command: '&HOMElandda;/parm/task_load_modules_run_jjob.sh "analysis" "&HOMElandda;" "&MACHINE;"'
      jobname: analysis
      nodes: "1:ppn=&NPROCS_ANALYSIS;"
      walltime: 00:15:00
      partition: "&partition_default;"
      queue: "&queue_default;"
      join: "&LOGDIR;/analysis&LOGFN_SUFFIX;"
account: (Default: "&ACCOUNT;")

The account to which compute resources used on the specified MACHINE are charged. This value is typically the same for every task, so the default reuses the ACCOUNT value set in the Workflow Entities section.

Note

The account variable (lowercase) is used by the job scheduler (e.g., Slurm), whereas the ACCOUNT variable (uppercase) is an envar referenced by the workflow scripts (e.g., scripts, jjobs, ush).

command: (Default for task_analysis: '&HOMElandda;/parm/task_load_modules_run_jjob.sh "analysis" "&HOMElandda;" "&MACHINE;"')

The command that Rocoto will submit to the batch system to carry out the task’s work.

jobname:

Name of the task/job (default will vary based on the task).

nodes:

Number of nodes and processes per node (ppn) required for the task, in the form N:ppn=M (default will vary based on the task).

walltime:

Maximum wall-clock time allotted for the task, in HH:MM:SS format (default will vary based on the task).

partition: (Default: "&partition_default;")

The HPC system partition to run on.

queue: (Default: "&queue_default;")

The batch system queue or “quality of service” (QOS) that Rocoto will submit the task to for execution.

join: (Default for task_analysis: "&LOGDIR;/analysis&LOGFN_SUFFIX;")

The full path to the task’s log file, which records output from stdout and stderr.

Some tasks include a cores: value instead of a nodes: value. For example:

cores: (Default: 1)

The number of cores required for the task.

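Some tasks include a native: value, usually set to "&native_default;", which passes additional scheduler-specific options to the batch system. Whether this value is listed is machine-dependent: the template includes it only when native_default is defined for the platform.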

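Some tasks include a memory: tag, with a default value of 128G. Whether the tag is included can also be machine-dependent; for example, task_analysis requests memory: 32G only when MACHINE is ursa.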
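To see what a task actually requests, it helps to turn the nodes: and walltime: strings into numbers. The following Python sketch assumes the entity references have already been resolved (for example, with NPROCS_ANALYSIS hypothetically set to 6, so nodes: becomes "1:ppn=6") and handles only the simple N:ppn=M form, optionally joined by "+". The function names are hypothetical. Tasks that use cores: need no parsing, since that value is already a count.

```python
def total_processes(nodes: str) -> int:
    """Count processes in a resolved nodes spec such as "1:ppn=6".

    Sketch only: assumes groups of the form N:ppn=M, optionally joined
    by "+"; any other colon-separated fields are ignored.
    """
    total = 0
    for group in nodes.split("+"):
        count, *fields = group.split(":")
        ppn = 1  # assumed when ppn is omitted
        for field in fields:
            key, _, value = field.partition("=")
            if key == "ppn":
                ppn = int(value)
        total += int(count) * ppn
    return total


def walltime_seconds(walltime: str) -> int:
    """Convert an HH:MM:SS walltime string (e.g., "00:15:00") to seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


print(total_processes("1:ppn=6"))    # 6
print(walltime_seconds("00:15:00"))  # 900
```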

3.1.5.1.4. Dependencies

The dependency: section of a task defines the prerequisites (task- or data-related) that must be met before the task can run. For task_analysis:, the prep_data task must be complete and, when CUSTOM_JEDI_CONFIG_FLAG is "NO" (i.e., no custom JEDI configuration is used), the jcb task must be complete as well. Additionally, when running the LND configuration, the pre_anal task must be complete. These task dependencies are listed in the dependency: section as taskdep_*: entries. When running the ATML configuration, the pre_anal task is not required, but at least one of the two surface data files (datadep_sfc1 or datadep_sfc2) must be present, since it is required as a restart file for the next cycle.

workflow:
  tasks:
    task_analysis:
      dependency:
        and:
          taskdep_prep_data:
            attrs:
              task: prep_data
{%- if CUSTOM_JEDI_CONFIG_FLAG == "NO" %}
          taskdep_jcb:
            attrs:
              task: jcb
{%- endif %}
{%- if APP == "LND" %}
          taskdep_pre_anal:
            attrs:
              task: pre_anal
{%- else %}
          or:
            datadep_sfc1:
              attrs:
                age: 5
              value: "&DATADEP_SFC1;"
            datadep_sfc2:
              attrs:
                age: 5
              value: "&DATADEP_SFC2;"
{%- endif %}
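The template's conditionals and the and/or logic of the resulting dependency can be sketched in Python. Leaves are ("task", name) or ("data", path) tuples, and the entity strings stand in for the real file paths. The function names are hypothetical, and the age: attribute is ignored, so this is a simplification of Rocoto's actual dependency checks.

```python
def analysis_dependency(app: str, custom_jedi_config_flag: str) -> dict:
    """Build the task_analysis dependency tree that the template would render."""
    deps = [("task", "prep_data")]
    if custom_jedi_config_flag == "NO":
        deps.append(("task", "jcb"))
    if app == "LND":
        deps.append(("task", "pre_anal"))
    else:
        # Either surface data file satisfies this branch.
        deps.append({"or": [("data", "&DATADEP_SFC1;"), ("data", "&DATADEP_SFC2;")]})
    return {"and": deps}


def is_satisfied(node, done_tasks: set, existing_files: set) -> bool:
    """Recursively evaluate an and/or dependency tree (age: is ignored)."""
    if isinstance(node, dict):
        op, children = next(iter(node.items()))
        results = [is_satisfied(child, done_tasks, existing_files) for child in children]
        return all(results) if op == "and" else any(results)
    kind, name = node
    return name in (done_tasks if kind == "task" else existing_files)


lnd = analysis_dependency("LND", "NO")
print(is_satisfied(lnd, {"prep_data", "jcb"}, set()))              # False: pre_anal pending
print(is_satisfied(lnd, {"prep_data", "jcb", "pre_anal"}, set()))  # True

atml = analysis_dependency("ATML", "YES")
print(is_satisfied(atml, {"prep_data"}, {"&DATADEP_SFC2;"}))       # True
```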

For details on dependencies (e.g., attrs:, age:, value: tags), view the authoritative Rocoto documentation.