3.1. Available Workflow Configuration Parameters
Among other tasks, the setup workflow Python script (parm/setup_wflow_env.py) generates a land_analysis.yaml file that contains all of the settings for the experiment: user-selected settings from config.yaml, default values, and machine-dependent settings. The script also uses the uwtools Python package to generate a Rocoto XML file.
The template.land_analysis.yaml file contains all parameters that can ultimately be included in the land_analysis.yaml file and workflow XML. setup_wflow_env.py first sets default values for all parameters necessary for the experiment. Next, it sets machine-specific values based on the user’s platform. Then the user-specified values from config.yaml are loaded and override any previously set values. Finally, the script generates an experiment directory containing the land_analysis.yaml file and the Rocoto XML.
Fig. 3.1 Overview of the setup_wflow_env.py script
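The order of precedence described above (defaults, then machine-specific values, then user values from config.yaml) can be illustrated with a minimal Python sketch. This is not the code in setup_wflow_env.py; the function names deep_update and build_settings and the sample values are hypothetical and chosen only to show which layer wins.

```python
from copy import deepcopy


def deep_update(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def build_settings(defaults: dict, machine: dict, user: dict) -> dict:
    """Apply the layers in order: defaults, machine values, user values."""
    settings = deep_update(defaults, machine)
    return deep_update(settings, user)


# Hypothetical values, used only to demonstrate precedence.
defaults = {"MACHINE": "", "FCSTHR": "24", "JEDI_ALGORITHM": "letkf-oi"}
machine = {"MACHINE": "hercules", "partition_default": "hercules"}
user = {"JEDI_ALGORITHM": "3dvar"}

settings = build_settings(defaults, machine, user)
print(settings["JEDI_ALGORITHM"])  # the user value replaces the default
```

Because each layer is merged into a copy, the defaults dictionary itself is left unchanged and can be reused for another experiment.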
3.1.1. Workflow Attributes (attrs:)
Attributes pertaining to the overall workflow are defined in the attrs: section of template.land_analysis.yaml under workflow:. For example:
workflow:
  attrs:
    realtime: false
    scheduler: slurm
    cyclethrottle: 24
    taskthrottle: 24
realtime: (Default: false) Indicates whether it is a realtime run (true) or a retrospective run (false). Valid values: true|false
scheduler: (Default: slurm) The job scheduler to use on the specified machine. Valid values: "slurm". Other options may work with a container but have not been tested: "pbspro"|"lsf"|"lsfcray"|"none"
cyclethrottle: (Default: 24) The number of cycles that can be active at one time. Valid values: Integers > 0.
taskthrottle: (Default: 24) The number of tasks that can be active at one time. Valid values: Integers > 0.
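The valid values listed above can be checked before a workflow is generated. The following Python sketch is only an illustration; check_workflow_attrs is a hypothetical helper and is not part of the Land DA workflow or uwtools.

```python
VALID_SCHEDULERS = {"slurm", "pbspro", "lsf", "lsfcray", "none"}


def check_workflow_attrs(attrs: dict) -> list:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if not isinstance(attrs.get("realtime"), bool):
        problems.append("realtime must be true or false")
    if attrs.get("scheduler") not in VALID_SCHEDULERS:
        problems.append("scheduler must be one of " + ", ".join(sorted(VALID_SCHEDULERS)))
    for key in ("cyclethrottle", "taskthrottle"):
        value = attrs.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be an integer > 0")
    return problems


print(check_workflow_attrs(
    {"realtime": False, "scheduler": "slurm", "cyclethrottle": 24, "taskthrottle": 24}
))
```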
3.1.2. Workflow Cycle Definition (cycledef)
Cycling information is defined in the cycledef: section under workflow:. Each cycle definition starts with a hyphen (-) and has information on cycle attributes (attrs:) and a cycle specification (spec:).
workflow:
  cycledef:
    - attrs:
        group: cycled
      spec: {{ DATE_FIRST_CYCLE }}00 {{ DATE_LAST_CYCLE }}00 {{ DATE_CYCLE_FREQ_HR }}:00:00
    - attrs:
        group: first_cycle
      spec: {{ DATE_FIRST_CYCLE }}00 {{ DATE_FIRST_CYCLE }}00 {{ DATE_CYCLE_FREQ_HR }}:00:00
    - attrs:
        group: cycled_from_second
      spec: {{ date_second_cycle }}00 {{ DATE_LAST_CYCLE }}00 {{ DATE_CYCLE_FREQ_HR }}:00:00
attrs: Attributes of cycledef. Includes group: but may also include activation_offset:. See the Rocoto Documentation for more information.
group: The group attribute allows users to assign a set of cycles to a particular group. The group tag can later be used to control which tasks are run for which cycles. See the Rocoto Documentation for more information.
spec: The cycle is defined using the “start stop step” method, with the cycle start date listed first in YYYYMMDDHHmm format, followed by the end date and then the step in HH:mm:SS format (e.g., 202501190000 202501220000 24:00:00). The template.land_analysis.yaml values are rendered with the user-provided cycle information in the config.yaml file; DATE_FIRST_CYCLE:, DATE_LAST_CYCLE:, and DATE_CYCLE_FREQ_HR: are defined in the Workflow Entities section below.
date_second_cycle: Start date of subsequent cycle(s), derived in setup_wflow_env.py.
For example, a land_analysis.yaml file generated by setup_wflow_env.py on Hercules might be rendered as:
workflow:
  cycledef:
    - attrs:
        group: cycled
      spec: 202501190000 202501220000 24:00:00
    - attrs:
        group: first_cycle
      spec: 202501190000 202501190000 24:00:00
    - attrs:
        group: cycled_from_second
      spec: 202501200000 202501220000 24:00:00
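The rendered spec: strings follow directly from the three cycle settings. The Python sketch below reproduces them; it assumes that date_second_cycle is the first cycle plus one cycling interval, which matches the rendered example but is not confirmed by the text. The function name cycledef_specs is hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d%H"  # YYYYMMDDHH, as used by DATE_FIRST_CYCLE and DATE_LAST_CYCLE


def cycledef_specs(date_first_cycle: str, date_last_cycle: str, freq_hr: int) -> dict:
    """Build the 'start stop step' strings for the three cycledef groups."""
    first = datetime.strptime(date_first_cycle, FMT)
    # Assumption: the second cycle starts one cycling interval after the first.
    date_second_cycle = (first + timedelta(hours=freq_hr)).strftime(FMT)
    step = f"{freq_hr}:00:00"
    return {
        "cycled": f"{date_first_cycle}00 {date_last_cycle}00 {step}",
        "first_cycle": f"{date_first_cycle}00 {date_first_cycle}00 {step}",
        "cycled_from_second": f"{date_second_cycle}00 {date_last_cycle}00 {step}",
    }


specs = cycledef_specs("2025011900", "2025012200", 24)
for group, spec in specs.items():
    print(f"{group}: {spec}")
```

The appended "00" supplies the minutes field, turning the ten-digit YYYYMMDDHH dates into the twelve-digit form that the spec: entry expects.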
3.1.3. Workflow Entities
In the land_analysis.yaml file, entities are constants that are referred to throughout the workflow using the ampersand (&) prefix and semicolon (;) suffix (e.g., &MACHINE;) to avoid defining the same constants repetitively in each workflow task. The entities: section of template.land_analysis.yaml provides the structure for this section of land_analysis.yaml. Then the default values for these entities are set in setup_wflow_env.py and updated with user-selected values from config.yaml. The resulting land_analysis.yaml file will include an entities: section with concrete values for several variables that are used throughout the workflow. For example, in a land_analysis.yaml file created on Hercules based on the config.LND.era5.3dvar.ims.DA-fcst.warmstart.yaml case, the following entities are defined:
workflow:
  entities:
    ACCOUNT: "epic"
    APP: "LND"
    ATM_IO_LAYOUT_X: "1"
    ATM_IO_LAYOUT_Y: "1"
    ATM_LAYOUT_X: "3"
    ATM_LAYOUT_Y: "8"
    ATMOS_FORC: "era5"
    BKG_ANAL_EXT_SRC_OPT: "era5land"
    CCPP_SUITE: "FV3_GFS_v17_p8_ugwpv1"
    COLDSTART: "NO"
    COMINgdas: ""
    COMINgfs: ""
    COUPLER_CALENDAR: "2"
    CUSTOM_JEDI_CONFIG_FLAG: "NO"
    CUSTOM_JEDI_CONFIG_PATH: "/work/noaa/epic/UFS_Land-DA_v3.0/inputs/test_base/jedi_yaml"
    CUSTOM_JEDI_CONFIG_PREFIX: "/prefix/of/custom/JEDI/config/file/name"
    DATE_CYCLE_FREQ_HR: "24"
    DATE_FIRST_CYCLE: "2025011900"
    DATE_LAST_CYCLE: "2025012000"
    DATM_STREAM_FN_LAST_DATE: ""
    DCOMINera5: ""
    DCOMINera5land: ""
    DCOMINghcn: ""
    DCOMINgswp3: ""
    DCOMINsmap: ""
    DCOMINsmops: ""
    DO_BKG_ANAL_EXT_SRC: "NO"
    DO_FREE_FORECAST: "NO"
    do_jedi_snow: "YES"
    do_jedi_soil_moisture: "NO"
    DT_ATMOS: "900"
    DT_RUNSEQ: "3600"
    envir: "lnd_era5_3dvar_ims"
    exp_basedir: "/Users/Joe.Schmoe/landda"
    EXP_CASE_NAME: "lnd_era5_3dvar_ims_00"
    FCSTHR: "24"
    FHROT: "0"
    FRAC_GRID: "NO"
    IC_DATA_MODEL: "gfs"
    IMO: "384"
    JEDI_ALGORITHM: "3dvar"
    JEDI_IODACONV_PATH: "/work/noaa/epic/UFS_Land-DA_v3.0/jedi_bundle_hercules/build/lib/python3.11"
    JEDI_PATH: "/work/noaa/epic/UFS_Land-DA_v3.0/jedi_bundle_hercules"
    JMO: "190"
    KEEPDATA: "YES"
    LND_CALC_SNET: ".true."
    LND_IC_TYPE: "custom"
    LND_INITIAL_ALBEDO: "0.25"
    LND_LAYOUT_X: "1"
    LND_LAYOUT_Y: "2"
    LND_OUTPUT_FREQ_SEC: "21600"
    MACHINE: "hercules"
    MED_COUPLING_MODE: "ufs.nfrac.aoflux"
    model_ver: "v3.0.0"
    native_default: "None"
    NET: "landda"
    NPROCS_ANALYSIS: "6"
    NPROCS_FCST_IC: "36"
    NPZ: "127"
    nnodes_forecast: "1"
    nprocs_forecast: "26"
    nprocs_forecast_atm: "12"
    nprocs_forecast_lnd: "12"
    nprocs_per_node: "26"
    OBSDIR: ""
    OBS_GHCN_SNOW: "NO"
    OBS_IMS_SNOW: "YES"
    OBS_SFCSNO: "YES"
    OBS_SMAP: "NO"
    OBS_SMOPS: "NO"
    OUTPUT_FH: "1 -1"
    partition_default: "hercules"
    PTMP: "/Users/Joe.Schmoe/landda/ptmp"
    PY_LOG_LEVEL: "INFO"
    queue_default: "batch"
    RES: "96"
    RESTART_INTERVAL: "12 -1"
    RUN: "landda"
    res_p1: "97"
    SCHED: "slurm"
    SMAP_RAW_WINDOW_SPAN_HALF: "5"
    WARMSTART_DIR: "/Users/Joe.Schmoe/landda/land-DA_workflow/fix/DATA_RESTART"
    WE2E_TEST: "NO"
    WE2E_ATOL: "1e-7"
    WE2E_LOG_FN: "we2e.log"
    WRITE_GROUPS: "1"
    WRITE_TASKS_PER_GROUP: "6"
    HOMElandda: "&exp_basedir;/land-DA_workflow"
    COMROOT: "&PTMP;/&envir;/com"
    DATAROOT: "&PTMP;/&envir;/tmp"
    LOGDIR: "&COMROOT;/output/logs"
    LOGFN_SUFFIX: "<cyclestr>_@Y@m@d@H.log</cyclestr>"
    PDY: "<cyclestr>@Y@m@d</cyclestr>"
    cyc: "<cyclestr>@H</cyclestr>"
    DATADEP_LRST1: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/ufs_land_restart.@Y-@m-@d_@H-00-00.tile6.nc</cyclestr>"
    DATADEP_LRST2: "<cyclestr>&WARMSTART_DIR;/ufs_land_restart.@Y-@m-@d_@H-00-00.tile6.nc</cyclestr>"
    DATADEP_COLDSTART: "<cyclestr>&exp_basedir;/exp_case/&EXP_CASE_NAME;/task_skip_coldstart_@Y@m@d@H.txt</cyclestr>"
    DATADEP_DATM1: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/ufs.cpld.datm.r.@Y-@m-@d-00000.nc</cyclestr>"
    DATADEP_DATM2: "<cyclestr>&WARMSTART_DIR;/ufs.cpld.datm.r.@Y-@m-@d-00000.nc</cyclestr>"
    DATADEP_FREEFCST: "<cyclestr>&exp_basedir;/exp_case/&EXP_CASE_NAME;/task_analysis_done_@Y@m@d@H.txt</cyclestr>"
    DATADEP_SFC1: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/@Y@m@d.@H0000.sfc_data.tile6.nc</cyclestr>"
    DATADEP_SFC2: "<cyclestr>&WARMSTART_DIR;/@Y@m@d.@H0000.sfc_data.tile6.nc</cyclestr>"
Note
The workflow entities include certain standard environment variables that are defined in the NCEP Central Operations WCOSS Implementation Standards document (pp. 4-5). These variables are used in forming the path to various directories containing input, output, and workflow files. For a visual aid, see the Land DA Directory Structure Diagram.
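Entities may reference other entities (for example, LOGDIR refers to COMROOT, which in turn refers to PTMP and envir). The Python sketch below shows how such nested &NAME; references resolve to a concrete path. It illustrates the substitution only and is not how the workflow itself performs it; expand_entities is a hypothetical helper, and the sample values are taken from the Hercules example above.

```python
import re

ENTITY_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*);")


def expand_entities(value: str, entities: dict, _seen: tuple = ()) -> str:
    """Recursively replace &NAME; references with the matching entity value."""

    def substitute(match):
        name = match.group(1)
        if name in _seen:
            raise ValueError(f"circular entity reference: {name}")
        if name not in entities:
            return match.group(0)  # leave unknown references untouched
        return expand_entities(entities[name], entities, _seen + (name,))

    return ENTITY_REF.sub(substitute, value)


entities = {
    "exp_basedir": "/Users/Joe.Schmoe/landda",
    "PTMP": "/Users/Joe.Schmoe/landda/ptmp",
    "envir": "lnd_era5_3dvar_ims",
    "COMROOT": "&PTMP;/&envir;/com",
    "LOGDIR": "&COMROOT;/output/logs",
}

print(expand_entities("&LOGDIR;/workflow.log", entities))
# /Users/Joe.Schmoe/landda/ptmp/lnd_era5_3dvar_ims/com/output/logs/workflow.log
```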
ACCOUNT: An account where users can charge their compute resources on the specified MACHINE. To determine an appropriate ACCOUNT field on a system with a Slurm job scheduler, users may run the saccount_params command to display account details. On other systems, users may run the groups command, which will return a list of projects that the user has permissions for. Not all of the listed projects/groups have an HPC allocation, but those that do are potentially valid account names.
APP: Application/configuration to use. Valid values: LND|ATML.
ATM_IO_LAYOUT_X: Specifies how many MPI ranks to use in the X direction for input/output (I/O) to the atmospheric component.
ATM_IO_LAYOUT_Y: Specifies how many MPI ranks to use in the Y direction for input/output (I/O) to the atmospheric component.
ATM_LAYOUT_X: Number of processes in the X direction per tile for the atmospheric component.
ATM_LAYOUT_Y: Number of processes in the Y direction per tile for the atmospheric component.
ATMOS_FORC: Type of atmospheric forcing data used. Valid values: "era5"|"gswp3".
CCPP_SUITE: The physics suite to use in the experiment (only relevant for ATML configurations, which have an active atmospheric component).
COLDSTART: Flag that indicates whether the experiment is a coldstart experiment ("YES") or a warmstart experiment ("NO").
COMINgdas: Output from the GDAS model, which can be used as input for a new forecast. See WCOSS Implementation Standards for information on operational data naming conventions.
COMINgfs: Output from the GFS model, which can be used as input for a new forecast. See WCOSS Implementation Standards for information on operational data naming conventions.
COUPLER_CALENDAR: Coupler calendar. Options: no_calendar=0, thirty_day_months=1, julian=2, gregorian=3, noleap=4
CUSTOM_JEDI_CONFIG_FLAG: Whether to use a custom JEDI configuration file ("YES") or not ("NO"). If this parameter is set to "YES" in the configuration file config.yaml, the custom input file located at CUSTOM_JEDI_CONFIG_PATH will be used as the JEDI input file in the analysis task. Valid values: "YES"|"NO".
CUSTOM_JEDI_CONFIG_PATH: Path to the custom JEDI configuration file.
CUSTOM_JEDI_CONFIG_PREFIX: Prefix for the custom JEDI file. For example, if the file were named custom_jedi_2026022600.yaml, then the CUSTOM_JEDI_CONFIG_PREFIX is custom_jedi_. Note that the YAML file name should include the date for cycling; the prefix is everything before the cycle date.
DATE_CYCLE_FREQ_HR: Cycling frequency (in integer hours).
DATE_FIRST_CYCLE: Starting cycle date of the first forecast in the set of forecasts to run. Format is “YYYYMMDDHH”.
DATE_LAST_CYCLE: Starting cycle date of the last forecast in the set of forecasts to run. Format is “YYYYMMDDHH”.
DATM_STREAM_FN_LAST_DATE: The last date of a warmstart run. Requires a valid date in YYYYMMDDHH format. This variable is a temporary fix for a bug in the UFS WM. Restart files produced by the LND configuration contain a hard-coded DATM file list. If the file list does not match the namelist, the warmstart will fail. For example, if the user runs a coldstart forecast from day 1 to day 2, the restart file will contain information for days 1-2. If they then choose to run a warmstart forecast for days 3 to 4 with the restart file from the coldstart, it will fail even if the user puts days 3-4 into the DATM input namelist. To resolve this issue, days 1-4 must be added to the namelist of the coldstart even though it only runs for days 1-2.
DCOMINera5: Path to directory containing ERA5 input data files. See WCOSS Implementation Standards for information on operational data naming conventions.
DCOMINera5land: Variable used in testing. Unsupported for users at this time.
DCOMINghcn: Path to directory containing GHCN input data files. See WCOSS Implementation Standards for information on operational data naming conventions.
DCOMINgswp3: Path to directory containing GSWP3 input data files. See WCOSS Implementation Standards for information on operational data naming conventions.
DCOMINsmap: Path to directory containing SMAP input data files. See WCOSS Implementation Standards for information on operational data naming conventions.
DCOMINsmops: Path to directory containing SMOPS input data files. See WCOSS Implementation Standards for information on operational data naming conventions.
DO_BKG_ANAL_EXT_SRC: Whether to use an external source file for the analysis. Only relevant when CUSTOM_JEDI_CONFIG_PATH: YES. Valid values: "YES"|"NO".
DO_FREE_FORECAST: Whether to run a free forecast ("YES") or a DA forecast ("NO").
do_jedi_snow: Whether to perform JEDI snow DA. Valid values: "YES"|"NO".
do_jedi_soil_moisture: Whether to perform JEDI soil moisture DA. Valid values: "YES"|"NO".
DT_ATMOS: The main integration time step of the atmospheric component of the UFS Weather Model (in seconds). This is the time step for the outermost atmospheric model loop and must be a positive integer value. It corresponds to the frequency at which the physics routines and the top level dynamics routine are called. (Note that one call to the top-level dynamics routine results in multiple calls to the horizontal dynamics, tracer transport, and vertical dynamics routines; see the FV3 dycore scientific documentation for details.)
DT_RUNSEQ: Time interval of run sequence (coupling interval) between the model components of the UFS Weather Model (in seconds).
envir: The run environment. Set to “test” during the initial testing phase, “para” when running in parallel (on a schedule), and “prod” in production. In operations, this is the operations root directory (aka $OPSROOT). For more on NCO-compliant directory structure, see the Note on NCO Standards.
exp_basedir: The full path to the parent directory of land-DA_workflow (i.e., ${BASEDIR} in the documentation). The actual value is derived in the setup_wflow_env.py file.
EXP_CASE_NAME: A name for the experiment. This variable can be changed to any name the user wants (but note that whitespace and some punctuation characters are not allowed). However, the best names will indicate useful information about the experiment. Each of the sample cases provided sets the experiment name to app_[forcing_]starttype_## where <app> is the configuration (LND or ATML), <forcing> refers to the atmospheric forcing data used (if any), and <starttype> indicates either a warmstart or coldstart forecast.
FCSTHR: Specifies the length of each forecast in hours. Valid values: Integers > 0.
FHROT: Forecast hour at restart in UFS Weather Model (in hours; set in model_configure).
FRAC_GRID: Flag for the fractional grid option in UFS_UTILS and the UFS WM. When the fractional grid option (frac_grid) was introduced in 2024, some variable names such as snow depth were changed in the WM and UFS_UTILS. However, these variable names were not changed in the Noah-MP land model component. The tile2tile converter uses this flag to switch variable names between JEDI and the land model. When fractional grid is enabled (FRAC_GRID: "YES"), two key variable names do not match between JEDI (sfc_data files) and the land model (restart files), and the tile2tile converter must translate between them:

Table 3.1 Mismatched Variable Names
Variable name in ‘tile2tile_converter’ | Description           | Noah-MP (restart) | JEDI (sfc_data)
swe                                    | Snow water equivalent | weasd             | sheleg/weasdl
snow_depth                             | Snow depth over land  | snwdph            | snwdph/snodl

In pre_anal, the tile2tile converter creates the sfc_data files from the restart files for the analysis task. In post_anal, the tile2tile converter creates the restart files for the warmstart forecast from the sfc_data and restart files for the forecast task.
IC_DATA_MODEL: The name of the model that the initial sfc_data files are coming from in the fcst_ic task. Valid values: "gfs"|"gdas"
IMO: Number of horizontal grid points in the X direction. Usually a multiple of the resolution (${RES}).
JEDI_ALGORITHM: Data assimilation algorithm selection. Valid values: "letkf-oi"|"3dvar"
JEDI_IODACONV_PATH: Path to directory where the libraries of the JEDI IODA converter are located.
JEDI_PATH: Path to the directory where JEDI is installed. The actual value is set in a machine-specific portion of setup_wflow_env.py.
JMO: Number of horizontal grid points in the Y direction.
KEEPDATA: Flag to keep data ("YES") or not ("NO") that is copied to the $DATAROOT directory during the forecast experiment.
LND_CALC_SNET: Flag indicating whether to calculate the shortwave radiation internally (".true.") or not (".false.").
LND_IC_TYPE: Indicates the source of the initial conditions. Two options are supported: “custom” (i.e., C96.initial.tile[1-6].nc) and “sfc” (i.e., sfc_data.tile[1-6].nc). Valid values: custom|sfc.
LND_INITIAL_ALBEDO: Initial mean surface albedo. Valid values: Any number between 0-1.
LND_LAYOUT_X: Number of processes in the x direction per tile for the land model component.
LND_LAYOUT_Y: Number of processes in the y direction per tile for the land model component.
LND_OUTPUT_FREQ_SEC: Output frequency of the land model component (in seconds).
MACHINE: The machine (a.k.a. platform or system) on which the workflow will run. The actual value is provided by the user via the -p=MACHINE command line argument or derived in setup_wflow_env.py from other parameters if possible. Currently supported platforms are listed in Section 1.2.2. Valid values: "ursa"|"hercules"|"orion"|"gaeac6"
MED_COUPLING_MODE: CMEPS coupling mode. Valid values: "ufs.frac"|"ufs.nfrac.aoflux". "ufs.frac" is used with the active FV3 atmospheric component (e.g., in ATML configurations), whereas "ufs.nfrac.aoflux" is used with the data atmosphere component (e.g., LND configurations).
model_ver: Version number of package in three digits (e.g., v#.#.#); second level of com directory (see NCO Directory Structure Entities)
native_default: Defines raw batch system options/job scheduler commands that Rocoto will use when submitting jobs for a given task (using the <native> tag). If more than one option is required, they are listed consecutively as a single string. This is a machine-dependent parameter, so default values differ.
NET: Model name (first level of com directory structure).
NPROCS_ANALYSIS: Number of processors for the analysis task.
NPROCS_FCST_IC: Number of processors for the fcst_ic task.
NPZ: Number of vertical layers in the atmospheric model.
nnodes_forecast: Number of nodes for the forecast task.
nprocs_forecast: Total number of processes for the forecast task. In general, this is set as \(nprocs\_forecast\_lnd + nprocs\_forecast\_atm + (lnd\_layout\_x*lnd\_layout\_y)\).
nprocs_forecast_atm: Number of processes for the atmospheric component in the forecast task. Actual default value dependent on APP (LND or ATML).
nprocs_forecast_lnd: Number of processes for the land model component (Noah-MP) in the forecast task.
nprocs_per_node: Number of processes per node for the forecast task. Actual default value dependent on nprocs_forecast and the maximum number of cores available per node.
OBSDIR: The path to the directory where DA fix files are located. In scripts/exlandda_prep_data.sh, this value is set to ${FIXlandda}/DA_obs unless the user specifies a different path in config.yaml.
OBS_GHCN_SNOW: Flag to use GHCN snow depth observations. Valid values: "YES"|"NO".
OBS_IMS_SNOW: Flag to use IMS snow depth observations. Valid values: "YES"|"NO".
OBS_SFCSNO: Flag to use SFCSNO snow depth observations. Valid values: "YES"|"NO".
OBS_SMAP: Flag to use SMAP soil moisture observation data. Valid values: "YES"|"NO".
OBS_SMOPS: Flag to use SMOPS soil moisture observation data. Valid values: "YES"|"NO".
OUTPUT_FH: Forecast history file output frequency (when second number is -1, e.g., "1 -1") or hours at which to write output history files (e.g., "6 9 12").
partition_default: Default partition; default set based on MACHINE.
PY_LOG_LEVEL: Python logging level. Valid values: "INFO"|"DEBUG"|"WARN"|"ERROR"|"CRITICAL"
queue_default: Default queue; default set based on MACHINE.
RES: Resolution of FV3 grid. Currently, only C96 resolution is supported.
RESTART_INTERVAL: Determines how often the model creates restart files, which are used to continue simulations from a specific point in time. When the second number is -1, the first number refers to the frequency of restart file output (e.g., "1 -1"). Otherwise, the list of numbers indicates specific hours at which to output restart files (e.g., "6 9 12").
RUN: Name of model run (third level of com directory structure). In general, same as ${NET}.
res_p1: Resolution plus 1 (${RES} + 1). Must be an integer value.
SCHED: The job scheduler to use (e.g., Slurm) on the specified MACHINE. Valid values: "slurm". Other options may work with a container but have not been tested: "pbspro"|"lsf"|"lsfcray"|"none"
SMAP_RAW_WINDOW_SPAN_HALF: The SMAP satellite is designed to create a global map every 2-3 days. Each SMAP data file covers a narrow and long area of 1000 km width, and there can be overlap. To avoid duplication and cover as wide an area as possible, the data files between ${PDY}${cyc} +/- ${SMAP_RAW_WINDOW_SPAN_HALF} hours are combined after the raw data files are converted into the IODA format in the prep_data task. Its default value is 5. This means that 11-hour data sets are combined by default. For example, combined data for 2025011800 would contain the raw data files from 2025011719 to 2025011805. To use a single data set, set the configuration parameter to 0.
WARMSTART_DIR: The path to restart files for a warmstart experiment. The actual value set is machine-dependent.
WE2E_TEST: Flag to turn on the workflow end-to-end (WE2E) test. When WE2E_TEST="YES", the results files from the experiment are compared to the test baseline files, located by default in ${BASEDIR}/land-DA_workflow/fix/test_base/we2e_com. If the results are within the tolerance set (via WE2E_ATOL) at the end of the three main tasks (analysis, forecast, and post_anal), then the experiment passes. Valid values: "YES"|"NO"
WE2E_ATOL: Tolerance of the WE2E test. (Set in template.land_analysis.yaml.)
WE2E_LOG_FN: Name of the WE2E test log file. (Set in template.land_analysis.yaml.)
WRITE_GROUPS: The number of write groups (i.e., groups of MPI tasks) to use.
WRITE_TASKS_PER_GROUP: The number of MPI tasks to allocate for each of the ${WRITE_GROUPS}.
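Two of the entries above involve small calculations: the nprocs_forecast total and the SMAP_RAW_WINDOW_SPAN_HALF time window. The Python sketch below works through both using the values from the Hercules example. The function names are hypothetical, and the window helper assumes one time stamp per hour, which is how the 11-hour example reads.

```python
from datetime import datetime, timedelta


def nprocs_forecast(nprocs_lnd: int, nprocs_atm: int,
                    lnd_layout_x: int, lnd_layout_y: int) -> int:
    """nprocs_forecast_lnd + nprocs_forecast_atm + (lnd_layout_x * lnd_layout_y)."""
    return nprocs_lnd + nprocs_atm + lnd_layout_x * lnd_layout_y


def smap_window(pdy_cyc: str, span_half: int) -> list:
    """List the hourly time stamps from ${PDY}${cyc} - span_half to + span_half."""
    center = datetime.strptime(pdy_cyc, "%Y%m%d%H")
    return [
        (center + timedelta(hours=offset)).strftime("%Y%m%d%H")
        for offset in range(-span_half, span_half + 1)
    ]


print(nprocs_forecast(12, 12, 1, 2))   # 26, matching nprocs_forecast above
window = smap_window("2025011800", 5)
print(window[0], window[-1], len(window))  # 2025011719 2025011805 11
```

Setting the half span to 0 yields a single time stamp, which corresponds to using a single data set.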
3.1.3.1. NCO Directory Structure Entities
Standard environment variables are defined in the NCEP Central Operations WCOSS Implementation Standards document (pp. 4-5). These variables are used in forming the path to various directories containing input, output, and workflow files. For a visual aid, see the Land DA Directory Structure Diagram.
HOMElandda: (Default: "&exp_basedir;/land-DA_workflow") The location of the land-DA_workflow clone.
PTMP: (Default: "&exp_basedir;/ptmp") Product temporary (PTMP) experiment output space. This directory is used to mimic the operational file structure and contains all of the files and subdirectories used by or generated by the experiment. By default, it is a sibling to the land-DA_workflow directory.
COMROOT: (Default: "&PTMP;/&envir;/com") com root directory, which contains input/output data on current system.
DATAROOT: (Default: "&PTMP;/&envir;/tmp") Directory location for the temporary working directories for running jobs. By default, this is a sibling to the ${COMROOT} directory and is located at ptmp/<envir>/tmp.
LOGDIR: (Default: "&COMROOT;/output/logs") Path to the directory containing log files for each workflow task.
LOGFN_SUFFIX: (Default: "<cyclestr>_@Y@m@d@H.log</cyclestr>") The cycle suffix appended to each task’s log file. It will be rendered in the form _YYYYMMDDHH.log. For example, the prep_obs task log file for the Jan. 20, 2025 00z cycle would be named: prep_obs_2025012000.log.
PDY: (Default: "<cyclestr>@Y@m@d</cyclestr>") Date in YYYYMMDD format.
cyc: (Default: "<cyclestr>@H</cyclestr>") Cycle time in GMT hours, formatted HH.
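The <cyclestr> defaults above are filled in by Rocoto for each cycle. As a rough illustration of the result, the Python sketch below substitutes the four flags used in this section (@Y, @m, @d, @H) for a given cycle time. It handles only those four flags and is not a reimplementation of Rocoto; render_cyclestr is a hypothetical helper.

```python
import re
from datetime import datetime

CYCLESTR = re.compile(r"<cyclestr>(.*?)</cyclestr>")
# Only the flags that appear in this section are handled here.
FLAGS = {"@Y": "%Y", "@m": "%m", "@d": "%d", "@H": "%H"}


def render_cyclestr(text: str, cycle: datetime) -> str:
    """Replace each <cyclestr>...</cyclestr> span with its value for `cycle`."""

    def substitute(match):
        body = match.group(1)
        for flag, directive in FLAGS.items():
            body = body.replace(flag, cycle.strftime(directive))
        return body

    return CYCLESTR.sub(substitute, text)


cycle = datetime(2025, 1, 20, 0)
print("prep_obs" + render_cyclestr("<cyclestr>_@Y@m@d@H.log</cyclestr>", cycle))
# prep_obs_2025012000.log
```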
3.1.3.2. Data Location Entities
DATADEP_LRST1: (Default: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/ufs_land_restart.@Y-@m-@d_@H-00-00.tile6.nc</cyclestr>") Land model (Noah-MP) restart files for the next cycle.
DATADEP_LRST2: (Default: "<cyclestr>&WARMSTART_DIR;/ufs_land_restart.@Y-@m-@d_@H-00-00.tile6.nc</cyclestr>") Land model (Noah-MP) restart files used to initialize a warmstart experiment.
DATADEP_COLDSTART: (Default: "<cyclestr>&exp_basedir;/exp_case/&EXP_CASE_NAME;/task_skip_coldstart_@Y@m@d@H.txt</cyclestr>") File to skip the cold-start tasks.
DATADEP_DATM1: (Default: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/ufs.cpld.datm.r.@Y-@m-@d-00000.nc</cyclestr>") DATM restart files for the next cycle.
DATADEP_DATM2: (Default: "<cyclestr>&WARMSTART_DIR;/ufs.cpld.datm.r.@Y-@m-@d-00000.nc</cyclestr>") DATM restart files used to initialize a warmstart experiment.
DATADEP_SFC1: (Default: "<cyclestr>&DATAROOT;/DATA_SHARE/RESTART/@Y@m@d.@H0000.sfc_data.tile6.nc</cyclestr>") Surface data (sfc_data) restart files for the next cycle.
DATADEP_SFC2: (Default: "<cyclestr>&WARMSTART_DIR;/@Y@m@d.@H0000.sfc_data.tile6.nc</cyclestr>") Surface data (sfc_data) files used to initialize a warmstart experiment.
DATADEP_FREEFCST: (Default: "<cyclestr>&exp_basedir;/exp_case/&EXP_CASE_NAME;/task_analysis_done_@Y@m@d@H.txt</cyclestr>") Data file(s) required to trigger the forecast task in a free-forecast experiment.
3.1.4. Workflow Log
Information related to overall workflow progress is defined in the log: section under workflow:
workflow:
  log: "&LOGDIR;/workflow.log"
log: (Default: "&LOGDIR;/workflow.log") Path and name of Rocoto log file(s).
3.1.5. Workflow Tasks
The workflow is divided into discrete tasks, and details of each task are defined within the tasks: section under workflow:.
workflow:
  tasks:
    task_jcb:
    task_prep_data:
    task_fcst_ic:
    task_pre_anal:
    task_analysis:
    task_post_anal:
    task_forecast:
    task_plot_stats:
Each task may contain attributes (attrs:), just as in the overarching workflow: section. Instead of entities, each task contains an envars: section to define environment variables that must be passed to the task when it is executed. Any task dependencies are listed under the dependency: section. Additional details, such as jobname:, walltime:, and queue:, may also be set within a specific task.
The following subsections explain any variables that have not already been explained/defined above.
3.1.5.1. Sample Task: Analysis Task (task_analysis)
This section walks users through the structure of the analysis task (task_analysis) to explain how configuration information is provided to the land_analysis.yaml file for each task. Since each task has a similar structure, common information is explained in this section. Variables unique to a particular task are defined in their respective task_* sections based on the structure laid out in template.land_analysis.yaml.
Parameters for a particular task are set in the workflow.tasks.task_<name>: section of the template.land_analysis.yaml file. For example, settings for the analysis task are provided in the task_analysis: section of template.land_analysis.yaml. The following is an excerpt of the task_analysis: section of template.land_analysis.yaml:
workflow:
  tasks:
    task_analysis:
      attrs:
{%- if COLDSTART == "YES" %}
        cycledefs: cycled_from_second
{%- else %}
        cycledefs: cycled
{%- endif %}
        maxtries: 2
      envars:
        ACCOUNT: "&ACCOUNT;"
        BKG_ANAL_EXT_SRC_OPT: "&BKG_ANAL_EXT_SRC_OPT;"
        COMINgfs: "&COMINgfs;"
        COMROOT: "&COMROOT;"
        COUPLER_CALENDAR: "&COUPLER_CALENDAR;"
        CUSTOM_JEDI_CONFIG_FLAG: "&CUSTOM_JEDI_CONFIG_FLAG;"
        CUSTOM_JEDI_CONFIG_PATH: "&CUSTOM_JEDI_CONFIG_PATH;"
        CUSTOM_JEDI_CONFIG_PREFIX: "&CUSTOM_JEDI_CONFIG_PREFIX;"
        cyc: "&cyc;"
        DATAROOT: "&DATAROOT;"
        DATE_CYCLE_FREQ_HR: "&DATE_CYCLE_FREQ_HR;"
        DATE_FIRST_CYCLE: "&DATE_FIRST_CYCLE;"
        DCOMINera5land: "&DCOMINera5land;"
        DO_BKG_ANAL_EXT_SRC: "&DO_BKG_ANAL_EXT_SRC;"
        DO_FREE_FORECAST: "&DO_FREE_FORECAST;"
        do_jedi_snow: "&do_jedi_snow;"
        do_jedi_soil_moisture: "&do_jedi_soil_moisture;"
        exp_basedir: "&exp_basedir;"
        EXP_CASE_NAME: "&EXP_CASE_NAME;"
        FRAC_GRID: "&FRAC_GRID;"
        HOMElandda: "&HOMElandda;"
        JEDI_ALGORITHM: "&JEDI_ALGORITHM;"
        JEDI_PATH: "&JEDI_PATH;"
        KEEPDATA: "&KEEPDATA;"
        LOGDIR: "&LOGDIR;"
        MACHINE: "&MACHINE;"
        model_ver: "&model_ver;"
        NPROCS_ANALYSIS: "&NPROCS_ANALYSIS;"
        NPZ: "&NPZ;"
        OBS_GHCN_SNOW: "&OBS_GHCN_SNOW;"
        OBS_IMS_SNOW: "&OBS_IMS_SNOW;"
        OBS_SFCSNO: "&OBS_SFCSNO;"
        OBS_SMAP: "&OBS_SMAP;"
        OBS_SMOPS: "&OBS_SMOPS;"
        PDY: "&PDY;"
        PY_LOG_LEVEL: "&PY_LOG_LEVEL;"
        RES: "&RES;"
        res_p1: "&res_p1;"
        SCHED: "&SCHED;"
        WARMSTART_DIR: "&WARMSTART_DIR;"
        WE2E_TEST: "&WE2E_TEST;"
        WE2E_ATOL: "&WE2E_ATOL;"
        WE2E_LOG_FN: "&WE2E_LOG_FN;"
      account: "&ACCOUNT;"
      command: '&HOMElandda;/parm/task_load_modules_run_jjob.sh "analysis" "&HOMElandda;" "&MACHINE;"'
      jobname: analysis
      nodes: "1:ppn=&NPROCS_ANALYSIS;"
{%- if native_default is not none %}
      native: "&native_default;"
{%- endif %}
      walltime: 00:15:00
      partition: "&partition_default;"
      queue: "&queue_default;"
      join: "&LOGDIR;/analysis&LOGFN_SUFFIX;"
{%- if MACHINE == "ursa" %}
      memory: 32G
{%- endif %}
      dependency:
        and:
          taskdep_prep_data:
            attrs:
              task: prep_data
{%- if CUSTOM_JEDI_CONFIG_FLAG == "NO" %}
          taskdep_jcb:
            attrs:
              task: jcb
{%- endif %}
{%- if APP == "LND" %}
          taskdep_pre_anal:
            attrs:
              task: pre_anal
{%- else %}
          or:
            datadep_sfc1:
              attrs:
                age: 5
              value: "&DATADEP_SFC1;"
            datadep_sfc2:
              attrs:
                age: 5
              value: "&DATADEP_SFC2;"
{%- endif %}
When running the config.LND.era5.3dvar.ims.DA-fcst.warmstart.yaml case on Hercules, the analysis task in the land_analysis.yaml file would render as follows:
task_analysis:
  attrs:
    cycledefs: cycled
    maxtries: 2
  envars:
    ACCOUNT: "&ACCOUNT;"
    BKG_ANAL_EXT_SRC_OPT: "&BKG_ANAL_EXT_SRC_OPT;"
    COMINgfs: "&COMINgfs;"
    COMROOT: "&COMROOT;"
    COUPLER_CALENDAR: "&COUPLER_CALENDAR;"
    CUSTOM_JEDI_CONFIG_FLAG: "&CUSTOM_JEDI_CONFIG_FLAG;"
    CUSTOM_JEDI_CONFIG_PATH: "&CUSTOM_JEDI_CONFIG_PATH;"
    CUSTOM_JEDI_CONFIG_PREFIX: "&CUSTOM_JEDI_CONFIG_PREFIX;"
    cyc: "&cyc;"
    DATAROOT: "&DATAROOT;"
    DATE_CYCLE_FREQ_HR: "&DATE_CYCLE_FREQ_HR;"
    DATE_FIRST_CYCLE: "&DATE_FIRST_CYCLE;"
    DCOMINera5land: "&DCOMINera5land;"
    DO_BKG_ANAL_EXT_SRC: "&DO_BKG_ANAL_EXT_SRC;"
    DO_FREE_FORECAST: "&DO_FREE_FORECAST;"
    do_jedi_snow: "&do_jedi_snow;"
    do_jedi_soil_moisture: "&do_jedi_soil_moisture;"
    exp_basedir: "&exp_basedir;"
    EXP_CASE_NAME: "&EXP_CASE_NAME;"
    FRAC_GRID: "&FRAC_GRID;"
    HOMElandda: "&HOMElandda;"
    JEDI_ALGORITHM: "&JEDI_ALGORITHM;"
    JEDI_PATH: "&JEDI_PATH;"
    KEEPDATA: "&KEEPDATA;"
    LOGDIR: "&LOGDIR;"
    MACHINE: "&MACHINE;"
    model_ver: "&model_ver;"
    NPROCS_ANALYSIS: "&NPROCS_ANALYSIS;"
    NPZ: "&NPZ;"
    OBS_GHCN_SNOW: "&OBS_GHCN_SNOW;"
    OBS_IMS_SNOW: "&OBS_IMS_SNOW;"
    OBS_SFCSNO: "&OBS_SFCSNO;"
    OBS_SMAP: "&OBS_SMAP;"
    OBS_SMOPS: "&OBS_SMOPS;"
    PDY: "&PDY;"
    PY_LOG_LEVEL: "&PY_LOG_LEVEL;"
    RES: "&RES;"
    res_p1: "&res_p1;"
    SCHED: "&SCHED;"
    WARMSTART_DIR: "&WARMSTART_DIR;"
    WE2E_TEST: "&WE2E_TEST;"
    WE2E_ATOL: "&WE2E_ATOL;"
    WE2E_LOG_FN: "&WE2E_LOG_FN;"
  account: "&ACCOUNT;"
  command: '&HOMElandda;/parm/task_load_modules_run_jjob.sh "analysis" "&HOMElandda;" "&MACHINE;"'
  jobname: analysis
  nodes: "1:ppn=&NPROCS_ANALYSIS;"
  walltime: 00:15:00
  partition: "&partition_default;"
  queue: "&queue_default;"
  join: "&LOGDIR;/analysis&LOGFN_SUFFIX;"
  dependency:
    and:
      taskdep_prep_data:
        attrs:
          task: prep_data
      taskdep_jcb:
        attrs:
          task: jcb
      taskdep_pre_anal:
        attrs:
          task: pre_anal
3.1.5.1.1. Task Attributes (attrs:)
The attrs: section for each task includes the cycledefs: attribute and the maxtries: attribute.
cycledefs: (Default: cycled) A comma-separated list of cycledef: group names. A task with a cycledefs: group ID will be run only if its group ID matches one of the workflow’s cycledef: group IDs. In this case, the cycledefs: attribute is part of a conditional statement. If the user is running a coldstart experiment, the cycledefs: group name will be cycled_from_second because the model needs time to “spin up” before cycling can begin; otherwise, the group name will be cycled.
maxtries: (Default: 2) The maximum number of times Rocoto can resubmit a failed task.
3.1.5.1.2. Task Environment Variables (envars)
The envars: section for each task reuses many of the same variables and values defined as entities: for the overall workflow. These values are needed for each task, but setting them individually is error-prone. Instead, a specific workflow task can reference workflow entities using the &VAR; syntax. For example, to set the ACCOUNT: value in task_analysis: to the value of the workflow ACCOUNT: entity, the following statement can be added to the task’s envars: section:
task_analysis:
  envars:
    ACCOUNT: "&ACCOUNT;"
For most workflow tasks, whatever value is set in the workflow.entities: section should be reused/referenced in other tasks. For example, the MACHINE variable must be defined for each task, and users cannot switch machines mid-workflow. Therefore, users should set the MACHINE variable in the workflow.entities: section and reference that definition in each workflow task. For example:
workflow:
  entities:
    MACHINE: "hercules"
  tasks:
    task_jcb:
      envars:
        MACHINE: "&MACHINE;"
    task_prep_data:
      envars:
        MACHINE: "&MACHINE;"
    ...
    task_forecast:
      envars:
        MACHINE: "&MACHINE;"
    task_plot_stats:
      envars:
        MACHINE: "&MACHINE;"
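The same pattern applies to every variable a task shares with the workflow entities: the task's envars: entry is simply a reference to the entity of the same name. The Python sketch below generates such references for a few tasks. It is illustrative only; envars_from_entities is a hypothetical helper, and the list of shared names is an arbitrary subset.

```python
def envars_from_entities(names):
    """Map each name to a reference to the workflow entity of the same name."""
    return {name: f"&{name};" for name in names}


# Arbitrary subset of entity names, chosen for illustration.
shared = ["ACCOUNT", "MACHINE", "HOMElandda"]

tasks = {
    f"task_{task}": {"envars": envars_from_entities(shared)}
    for task in ("jcb", "prep_data", "forecast", "plot_stats")
}

print(tasks["task_forecast"]["envars"]["MACHINE"])  # &MACHINE;
```

Because every task points at the same entity, changing MACHINE in the entities: section changes it for all tasks at once.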
3.1.5.1.3. Miscellaneous Task Values
The authoritative Rocoto documentation discusses a number of miscellaneous task attributes in detail. A brief overview is provided in this section.
workflow:
  tasks:
    task_analysis:
      account: "&ACCOUNT;"
      command: '&HOMElandda;/parm/task_load_modules_run_jjob.sh "analysis" "&HOMElandda;" "&MACHINE;"'
      jobname: analysis
      nodes: "1:ppn=&NPROCS_ANALYSIS;"
      walltime: 00:15:00
      partition: "&partition_default;"
      queue: "&queue_default;"
      join: "&LOGDIR;/analysis&LOGFN_SUFFIX;"
account: (Default: "&ACCOUNT;") An account where users can charge their compute resources on the specified MACHINE. This value is typically the same for each task, so the default is to reuse the value set in the Workflow Entities section.
Note
The account variable (lowercase) is used by the job scheduler (Slurm), whereas the ACCOUNT variable (uppercase) is an envar referenced by the workflow scripts (e.g., scripts, jjobs, ush).
command: (Default: '&HOMElandda;/parm/task_load_modules_run_jjob.sh "analysis" "&HOMElandda;" "&MACHINE;"') The command that Rocoto will submit to the batch system to carry out the task’s work.
jobname: Name of the task/job (default will vary based on the task).
nodes: Number of nodes required for the task (default will vary based on the task).
walltime: Time allotted for the task (default will vary based on the task).
partition: (Default: "&partition_default;") The HPC system partition to run on.
queue: (Default: "&queue_default;") The batch system queue or “quality of service” (QOS) that Rocoto will submit the task to for execution.
join: (Default: "&LOGDIR;/analysis&LOGFN_SUFFIX;") The full path to the task’s log file, which records output from stdout and stderr.
Some tasks include a cores: value instead of a nodes: value. For example:
cores: (Default: 1) The number of cores required for the task.
Some tasks include a native: value, usually set to "&native_default;"; whether this value is listed is machine-dependent.
Some tasks include a memory: tag, with a default value of 128G.
3.1.5.1.4. Dependencies
The dependency: section of a task defines what prerequisites (task or data-related) must be met for the task to run. In the case of task_analysis:, it must be run after the prep_data task and, unless a custom JEDI configuration file is used (CUSTOM_JEDI_CONFIG_FLAG: "YES"), after the jcb task. Additionally, when running the LND configuration, it must be run after the pre_anal task. Therefore, the dependency: section lists these task dependencies (taskdep_*:). When running in the ATML configuration, the pre_anal task is not required, but one of the two surface data files (datadep_sfc[1/2]) is required as a restart file for the next cycle.
workflow:
  tasks:
    task_analysis:
      dependency:
        and:
          taskdep_prep_data:
            attrs:
              task: prep_data
{%- if CUSTOM_JEDI_CONFIG_FLAG == "NO" %}
          taskdep_jcb:
            attrs:
              task: jcb
{%- endif %}
{%- if APP == "LND" %}
          taskdep_pre_anal:
            attrs:
              task: pre_anal
{%- else %}
          or:
            datadep_sfc1:
              attrs:
                age: 5
              value: "&DATADEP_SFC1;"
            datadep_sfc2:
              attrs:
                age: 5
              value: "&DATADEP_SFC2;"
{%- endif %}
For details on dependencies (e.g., attrs:, age:, value: tags), view the authoritative Rocoto documentation.
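To make the effect of the two conditionals in the dependency template easier to follow, the Python sketch below builds the resulting dependency structure for a given APP and CUSTOM_JEDI_CONFIG_FLAG. It mirrors the template's logic in plain Python rather than rendering the actual template; analysis_dependency is a hypothetical function, and the placement of value: alongside attrs: is an assumption about the nesting.

```python
def analysis_dependency(app: str, custom_jedi_config_flag: str) -> dict:
    """Return the dependency tree of task_analysis for the given settings."""
    deps = {"taskdep_prep_data": {"attrs": {"task": "prep_data"}}}
    if custom_jedi_config_flag == "NO":
        # The jcb task is only a prerequisite when no custom JEDI config is used.
        deps["taskdep_jcb"] = {"attrs": {"task": "jcb"}}
    if app == "LND":
        deps["taskdep_pre_anal"] = {"attrs": {"task": "pre_anal"}}
    else:
        # ATML: either surface data file satisfies the dependency.
        deps["or"] = {
            "datadep_sfc1": {"attrs": {"age": 5}, "value": "&DATADEP_SFC1;"},
            "datadep_sfc2": {"attrs": {"age": 5}, "value": "&DATADEP_SFC2;"},
        }
    return {"and": deps}


lnd = analysis_dependency("LND", "NO")
atml = analysis_dependency("ATML", "YES")
print(list(lnd["and"]))   # ['taskdep_prep_data', 'taskdep_jcb', 'taskdep_pre_anal']
print(list(atml["and"]))  # ['taskdep_prep_data', 'or']
```

The LND result with CUSTOM_JEDI_CONFIG_FLAG set to "NO" corresponds to the three task dependencies shown in the rendered Hercules example of the analysis task.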