.. _Container:
**********************************
Containerized Land DA Workflow
**********************************
These instructions will help users run a basic case for the Unified Forecast System (:term:`UFS`) Land Data Assimilation (DA) System using a `Singularity/Apptainer `_ container. The Land DA :term:`container` packages the Land DA System together with its dependencies (e.g., :term:`spack-stack`, :term:`JEDI`) and prebuilt Land DA binaries, and it provides a uniform environment in which to run the Land DA System. Normally, the details of running Earth system models vary by computing platform because there are many possible combinations of operating systems, compilers, :term:`MPIs `, and package versions. Installing Land DA via a Singularity/Apptainer container reduces this variability and allows for a smoother experience running Land DA. This approach is recommended for users not running Land DA on a supported :ref:`Level 1 ` system (e.g., Ursa, Hercules).
This chapter provides instructions for running the Unified Forecast System (:term:`UFS`) Land DA System in a container using a Jan. 19-20, 2025 00z sample case. This case is a :term:`LND` :term:`warmstart` configuration that uses :term:`ERA5` atmospheric forcing data, :term:`IMS` snow depth observation data, and the 3D-Var DA algorithm.
.. include:: ../doc-snippets/gcblizzard-desc.rst
.. attention::
This chapter of the User's Guide should **only** be used for container builds. For non-container builds, see :numref:`Chapter %s `, which describes the steps for building and running Land DA on a :ref:`Level 1 System ` **without** a container.
.. _Prereqs:
Prerequisites
**************
The containerized version of Land DA requires:
* `Installation of Apptainer `_ (or its predecessor, Singularity)
* At least 26 CPU cores (may be possible to run with 13, but this has not been tested)
* An **Intel** compiler and :term:`MPI` (available for `free here `_)
* The `Rocoto workflow manager `_
* The `Slurm `_ job scheduler
.. note::
As of November 2021, the Linux-supported version of Singularity has been `renamed `_ to *Apptainer*. Apptainer has maintained compatibility with Singularity, so ``singularity`` commands should work with either Singularity or Apptainer (see `compatibility details here `_).
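Before downloading anything, the prerequisites listed above can be checked in one pass. The following is a minimal sketch, not part of the Land DA System: the function name ``check_landda_prereqs`` is hypothetical, and it assumes a Linux host where ``nproc`` is available and where Rocoto and Slurm provide the ``rocotorun`` and ``sbatch`` commands on the ``PATH``. Compiler and MPI checks are omitted because module names vary by system.

.. code-block:: bash

   # Hypothetical preflight helper (not part of Land DA): reports whether
   # this host appears to meet the container prerequisites.
   check_landda_prereqs() {
       local min_cores="${1:-26}"   # 26 cores per the requirements above
       local status=0 cores cmd

       # Accept either command name for the container runtime.
       if command -v apptainer >/dev/null 2>&1 || command -v singularity >/dev/null 2>&1; then
           echo "OK: container runtime found"
       else
           echo "MISSING: apptainer or singularity"
           status=1
       fi

       cores="$(nproc)"
       if [ "$cores" -ge "$min_cores" ]; then
           echo "OK: ${cores} CPU cores"
       else
           echo "WARNING: only ${cores} CPU cores (wanted ${min_cores})"
           status=1
       fi

       # Rocoto (rocotorun) and Slurm (sbatch) client commands
       for cmd in rocotorun sbatch; do
           if command -v "$cmd" >/dev/null 2>&1; then
               echo "OK: ${cmd} found"
           else
               echo "MISSING: ${cmd}"
               status=1
           fi
       done
       return "$status"
   }

   # Example: check_landda_prereqs || echo "Some prerequisites are missing"

Passing a smaller number (e.g., ``check_landda_prereqs 13``) relaxes the core check, although running with fewer than 26 cores has not been tested.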
.. _create-dir-c:
Create a Working Directory
*****************************
.. include:: ../doc-snippets/create-work-dir.rst
.. _GetDataC:
Get Data
***********
To run the Land DA System, users will need input data in the form of fix files, model forcing files, restart files, and snow depth observations for data assimilation. These files are already present on Level 1 systems (see :numref:`Section %s ` for details).
Users on any system may download and untar the data from the `Land DA Data Bucket `_ into their ``${BASEDIR}`` directory. In the working directory, run:
.. code-block:: console
cd ${BASEDIR}
wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/current_land_da_release_data/v3.0.0/LandDAInputDatav3.0.0.tar.gz
tar xvfz LandDAInputDatav3.0.0.tar.gz
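If the download is interrupted or these commands are rerun, it can help to skip files that are already present. A minimal sketch, assuming ``wget`` and ``tar`` are available; ``fetch_and_untar`` is a hypothetical helper, not part of the Land DA System:

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): download a file into a
   # directory only if it is not already there, then unpack .tar.gz files.
   # The body runs in a subshell "( ... )", so the cd does not leak out.
   fetch_and_untar() (
       url="$1"
       dest="${2:-.}"
       file="${url##*/}"          # last path component of the URL

       cd "$dest" || exit 1
       if [ -s "$file" ]; then
           echo "Skipping download: ${file} already exists"
       else
           wget "$url" || exit 1
       fi
       case "$file" in
           *.tar.gz) tar xvfz "$file" ;;
       esac
   )

   # Example:
   # fetch_and_untar https://noaa-ufs-land-da-pds.s3.amazonaws.com/current_land_da_release_data/v3.0.0/LandDAInputDatav3.0.0.tar.gz "${BASEDIR}"

Any non-empty file with the same name counts as already downloaded, including a partial download; remove such a file (or resume it with ``wget -c``) before rerunning.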
.. _DownloadContainer:
Download or Build the Container
*********************************
Users can download the ``ubuntu22.04-intel-landda-release-public-v3.0.0.img`` container from the `Land DA Data Bucket `_ or build the Singularity container from a public Docker :term:`container` image. Downloading is often faster because building may take several hours, although this depends on the download speed on the user's system.
Download the Container
========================
To download from the data bucket, users can run:
.. code-block:: console
wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/current_land_da_release_data/v3.0.0/ubuntu22.04-intel-landda-release-public-v3.0.0.img
This will download a container image named ``ubuntu22.04-intel-landda-release-public-v3.0.0.img``. Users may continue to :ref:`set up the container `.
.. _BuildC:
Build the Container
=====================
Alternatively, users can build the container from a Docker image. (Users who have already downloaded the container may skip to the :ref:`next section `.) Users working on systems with limited disk space in their ``/home`` directory will need to set the ``SINGULARITY_CACHEDIR`` and ``SINGULARITY_TMPDIR`` environment variables to point to a location with adequate disk space. For example:
.. code-block:: console
export SINGULARITY_CACHEDIR=/absolute/path/to/writable/directory/cache
export SINGULARITY_TMPDIR=/absolute/path/to/writable/directory/tmp
See :numref:`Section %s ` for detailed instructions. Then, run:
.. code-block:: console
singularity build --force ubuntu22.04-intel-landda-release-public-v3.0.0.img docker://noaaepic/ubuntu22.04-intel2024.2.0-1-devel-landda:ue192-v3.0.0
This process may take several hours depending on the system.
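Because the build can take hours, users working over a remote connection may prefer to run it in the background so that a dropped session does not stop it. The sketch below is one way to do this with the standard ``nohup`` utility; ``run_in_background`` and ``build.log`` are hypothetical names, and the sketch assumes the build does not need to prompt for a ``sudo`` password.

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): start a long-running command
   # with nohup so it survives a disconnect; all output goes to a log file.
   run_in_background() {
       local log="$1"
       shift
       nohup "$@" > "$log" 2>&1 &
       echo "Started PID $! (output in ${log})"
   }

   # Example:
   # run_in_background build.log singularity build --force \
   #     ubuntu22.04-intel-landda-release-public-v3.0.0.img \
   #     docker://noaaepic/ubuntu22.04-intel2024.2.0-1-devel-landda:ue192-v3.0.0
   # tail -f build.log
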
.. note::
Some users may need to issue the ``singularity build`` command with ``sudo`` (i.e., ``sudo singularity build...``). Whether ``sudo`` is required is system-dependent. If ``sudo`` is required (or desired) for building the container, users should set the ``SINGULARITY_CACHEDIR`` and ``SINGULARITY_TMPDIR`` environment variables with ``sudo su``, as in the NOAA Cloud example from :numref:`Section %s ` below.
.. _SetUpContainer:
Set Up the Container
*********************
.. attention::
Because :term:`LND` and :term:`ATML` experiments use different executables, LND and ATML experiment configurations cannot be run simultaneously from the same working directory. It is therefore recommended that users create separate working directories for each, for example, an ``lnd`` directory for LND experiments and an ``atml`` directory for ATML experiments, and then perform the container setup instructions in each directory.
Create environment variables that point to the container image (``$img``) and, if necessary, the location of the data (``$LANDDA_INPUTS``). Users only need to set ``LANDDA_INPUTS`` if they placed the data somewhere other than ``${BASEDIR}``:
.. code-block:: console
# Set path to container
export img=/path/to/ubuntu22.04-intel-landda-release-public-v3.0.0.img
# Set path to data (if necessary)
export LANDDA_INPUTS=/path/to/inputs
where each ``/path/to`` is replaced by the absolute path to the container image or the Land DA input data, respectively.
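A quick sanity check of these variables can catch typos before the container setup steps. This is a minimal sketch, not part of the Land DA System; ``check_landda_paths`` is a hypothetical name:

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): check that $img is an
   # absolute path to an existing file and that $LANDDA_INPUTS, if set,
   # is an existing directory.
   check_landda_paths() {
       case "${img:-}" in
           /*) ;;
           *) echo "ERROR: img must be set to an absolute path" >&2; return 1 ;;
       esac
       if [ ! -f "$img" ]; then
           echo "ERROR: container image not found: ${img}" >&2
           return 1
       fi
       # LANDDA_INPUTS is optional; check it only when it is set.
       if [ -n "${LANDDA_INPUTS:-}" ] && [ ! -d "$LANDDA_INPUTS" ]; then
           echo "ERROR: LANDDA_INPUTS is not a directory: ${LANDDA_INPUTS}" >&2
           return 1
       fi
       echo "OK: img=${img} LANDDA_INPUTS=${LANDDA_INPUTS:-<unset>}"
   }
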
From the ``${BASEDIR}`` directory, copy the ``setup_container.sh`` script out of the container:
.. code-block:: console
singularity exec -H $PWD $img cp -r /opt/land-DA_workflow/setup_container.sh .
The ``setup_container.sh`` script should now be in the ``${BASEDIR}`` directory. If previous steps used ``sudo``, this command may also need to be prefixed with ``sudo``. If, for some reason, the previous command was unsuccessful, users may try a version of the following command instead:
.. code-block:: console
singularity exec -B /:/ $img cp -r /opt/land-DA_workflow/setup_container.sh .
where ```` and ```` are replaced with a top-level directory on the local system and in the container, respectively. Additional directories can be bound by adding another ``-B /:/`` argument before the container location (``$img``).
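When several directories need to be bound, the repeated ``-B`` options can be generated. The following minimal sketch binds each host directory to the same path inside the container; ``bind_args`` and the ``/scratch`` directory in the example are hypothetical, and paths containing spaces are not supported:

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): print "-B dir:dir" options
   # that bind each absolute host path to the same path in the container.
   bind_args() {
       local dir out=""
       for dir in "$@"; do
           case "$dir" in
               /*) out="${out} -B ${dir}:${dir}" ;;
               *)  echo "bind_args: not an absolute path: ${dir}" >&2; return 1 ;;
           esac
       done
       printf '%s\n' "${out# }"
   }

   # Example (left unquoted so each option becomes a separate argument):
   # singularity exec $(bind_args /user1234 /scratch) $img cp -r /opt/land-DA_workflow/setup_container.sh .
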
.. note::
Users may convert a container ``.img`` file to a writable sandbox. This step is optional on most systems but allows users to make changes to the container if desired:
.. code-block:: console
singularity build --sandbox ubuntu22.04-intel-landda-release-public-v3.0.0 $img
Binding a host directory to a container directory with a different name can sometimes cause problems, so it is generally recommended that the local base directory and the container directory have the same name. For example, if the host system's top-level directory is ``/user1234``, the user may want to convert the ``.img`` file to a writable sandbox and create a ``user1234`` directory in the sandbox to bind to.
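Creating the matching directory inside a sandbox can be scripted as follows. This minimal sketch assumes the sandbox directory already exists from the ``singularity build --sandbox`` step and is writable by the current user (a sandbox built with ``sudo`` may be owned by root, in which case ``sudo`` is needed); ``add_bind_target`` is a hypothetical name:

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): create an empty directory
   # inside a sandbox at the same absolute path as a host directory, so the
   # host directory can be bound to it (e.g., -B /user1234:/user1234).
   add_bind_target() {
       local sandbox="$1" host_dir="$2"
       case "$host_dir" in
           /*) ;;
           *) echo "add_bind_target: host directory must be absolute" >&2; return 1 ;;
       esac
       mkdir -p "${sandbox%/}${host_dir}"
   }

   # Example: add_bind_target ubuntu22.04-intel-landda-release-public-v3.0.0 /user1234
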
Next, run the ``setup_container.sh`` script with the proper arguments:
.. code-block:: console
./setup_container.sh -c= -m= -i=$img
where:
* ``-c`` is the compiler on the user's local machine (e.g., ``intel/2024.2.1``, ``intel-oneapi-compilers/2024.1.0``, ``intel-oneapi-compilers/2024.2.1``)
* ``-m`` is the :term:`MPI` on the user's local machine (e.g., ``impi/2024.2.1``, ``intel-oneapi-mpi/2021.12.0``, ``intel-oneapi-mpi/2021.13.1``)
* ``-i`` is the full path to the container image (e.g., ``${BASEDIR}/ubuntu22.04-intel-landda-release-public-v3.0.0.img``)
For example:
.. code-block:: console
./setup_container.sh -c=intel/2024.2.1 -m=impi/2024.2.1 -i=$img
Running this script prints messages similar to the following to the console:
.. code-block:: console
Copying out land-DA_workflow from container
/usr/bin/cp: cannot open '/opt/land-DA_workflow/sorc/conda/pkgs/pyshp-3.0.3-pyhd8ed1ab_0/info/test/build_env_setup.sh' for reading: Permission denied
/usr/bin/cp: cannot open '/opt/land-DA_workflow/sorc/conda/pkgs/pyshp-3.0.3-pyhd8ed1ab_0/info/test/conda_build.sh' for reading: Permission denied
Checking if LANDDA_INPUTS variable exists and linking to land-DA_workflow
Land DA data exists, creating links
Updating scripts files
Updating singularity modulefiles
Updating run related scripts
Setup conda
Getting the jedi test data from container
Creating links for exe
Done
The user should now see the ``land-DA_workflow`` and ``jedi-bundle`` directories in the ``${BASEDIR}`` directory.
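To confirm that setup finished, the script's output can be saved to a log and checked for the final ``Done`` message, together with the two directories. A minimal sketch; ``setup_looks_complete`` and ``setup.log`` are hypothetical names, and the check only looks for the message and directories shown above:

.. code-block:: bash

   # Hypothetical check (not part of Land DA). Assumes the console output
   # was saved to a log, for example:
   #   ./setup_container.sh -c=intel/2024.2.1 -m=impi/2024.2.1 -i=$img 2>&1 | tee setup.log
   setup_looks_complete() {
       local log="$1" base="${2:-.}" d
       if ! grep -q '^[[:space:]]*Done[[:space:]]*$' "$log"; then
           echo "No 'Done' line found in ${log}" >&2
           return 1
       fi
       for d in land-DA_workflow jedi-bundle; do
           if [ ! -d "${base}/${d}" ]; then
               echo "Missing directory: ${base}/${d}" >&2
               return 1
           fi
       done
       echo "setup_container.sh appears to have completed"
   }

   # Example: setup_looks_complete setup.log "${BASEDIR}"
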
The container includes prebuilt executables, so no build step is needed; users may continue to the next section to configure the experiment.
.. _ConfigureExptC:
Configure the Experiment
===========================
To configure an experiment, first load the workflow modulefiles for the container:
.. code-block:: console
cd land-DA_workflow
module use modulefiles
module load wflow_singularity
Then navigate to the ``parm`` directory and copy the desired case (e.g., ``config.LND.era5.3dvar.ims.DA-fcst.warmstart.yaml``) into ``config.yaml``:
.. code-block:: console
cd parm
cp config_samples/config..yaml config.yaml
where ```` is the name of one of the sample case files in the `config_samples `_ directory.
For example, when running the ``LND.era5.3dvar.ims.DA-fcst.warmstart`` case, run:
.. code-block:: console
cd parm
cp config_samples/config.LND.era5.3dvar.ims.DA-fcst.warmstart.yaml config.yaml
Users may configure elements of the experiment in ``config.yaml`` if desired. For example, users may wish to alter ``DATE_FIRST_CYCLE``, ``DATE_LAST_CYCLE``, and/or ``DATE_CYCLE_FREQ_HR`` to indicate a different start cycle, end cycle, and increment. Users may also wish to change the DA algorithm from ``3dvar`` to ``letkf`` via the ``JEDI_ALGORITHM`` variable. Users who wish to run a more complex experiment may change the values in ``config.yaml`` using information from Sections :numref:`%s: Workflow Configuration Parameters `, :numref:`%s: I/O for the Land DA System `, and :numref:`%s: JEDI DA System `.
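Such edits can also be made from the command line. The following minimal sketch assumes each setting appears in ``config.yaml`` on its own ``KEY: value`` line, which may not hold for every key; ``set_config_value`` is a hypothetical helper, and a backup copy is written to ``config.yaml.bak``:

.. code-block:: bash

   # Hypothetical helper (not part of Land DA). Replaces the value on every
   # "KEY: value" line for the given key, keeping its indentation; a trailing
   # comment on that line is also replaced. The -i.bak form of sed works with
   # both GNU and BSD sed and keeps a backup of the original file.
   set_config_value() {
       local file="$1" key="$2" value="$3"
       if ! grep -q "^[[:space:]]*${key}:" "$file"; then
           echo "set_config_value: ${key} not found in ${file}" >&2
           return 1
       fi
       sed -i.bak "s|^\([[:space:]]*${key}:\).*|\1 ${value}|" "$file"
   }

   # Example: switch the DA algorithm from 3dvar to LETKF
   # set_config_value config.yaml JEDI_ALGORITHM letkf
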
.. attention::
When regenerating an experiment from the same or similar ``config.yaml`` file, if the ``EXP_CASE_NAME`` remains the same, the old experiment directory with that name will be renamed with the ``*_old`` suffix, and the new experiment directory will use ``EXP_CASE_NAME``. However, the ``envir`` directory will **NOT** be regenerated unless the ``envir`` parameter is given a new name. If it keeps the same name, the previous ``ptmp/`` directory and everything in it will remain (rather than being renamed), and the experiment will continue from where it left off using the files from the previous directory. This can be helpful in certain cases but detrimental in others, so users need to make a conscious choice based on their use case.
Generate the experiment directory by running:
.. code-block:: console
./setup_wflow_env.py -p=singularity
If the command runs without issue, this script will print override messages, experiment details, and "Schema validation succeeded for Rocoto config/XML" messages to the console, similar to the following excerpts:
.. code-block:: console
ubuntu@ip-10-29-93-226:~/land-DA_workflow/parm$ ./setup_wflow_env.py -p=singularity
Python Log Level= str: INFO, attr: 20
INFO::/contrib/${USER}/landda/land-DA_workflow/parm/./setup_wflow_env.py::L34:: Current directory (PARMdir): /contrib/Gillian.Petro/landda/land-DA_workflow/parm
INFO::/contrib/${USER}/landda/land-DA_workflow/parm/./setup_wflow_env.py::L36:: Home directory (HOMEdir): /contrib/Gillian.Petro/landda/land-DA_workflow
INFO::/contrib/${USER}/landda/land-DA_workflow/parm/./setup_wflow_env.py::L38:: Experimental base directory (exp_basedir): /contrib/Gillian.Petro/landda
INFO::/contrib/${USER}/landda/land-DA_workflow/parm/./setup_wflow_env.py::L168:: Experimental case directory /contrib/Gillian.Petro/landda/exp_case/lnd_era5_warmstart_00 has been created.
INFO::/contrib/${USER}/landda/land-DA_workflow/parm/./setup_wflow_env.py::L175:: Rocoto YAML template: /contrib/Gillian.Petro/landda/land-DA_workflow/parm/templates/template.land_analysis.yaml
**************************************************
Overriding ACCOUNT = epic
Overriding APP = LND
Overriding ATMOS_FORC = era5
...
Overriding queue_default = batch
Overriding res_p1 = 97
**************************************************
DATE_FIRST_CYCLE: 2025011900
nprocs_forecast: 26
LND_INITIAL_ALBEDO: 0.25
WRITE_GROUPS: 1
JEDI_PATH: /contrib/${USER}/landda
MED_COUPLING_MODE: ufs.nfrac.aoflux
COUPLER_CALENDAR: 2
WE2E_TEST: NO
nnodes_forecast: 1
CCPP_SUITE: FV3_GFS_v17_p8_ugwpv1
MACHINE: singularity
OBS_IMS_SNOW: YES
...
NPROCS_ANALYSIS: 6
FHROT: 0
envir: test_lnd_era5_warm
WRITE_TASKS_PER_GROUP: 6
OUTPUT_FH: 1 -1
COMINgdas:
COLDSTART: NO
INFO::/contrib/${USER}/landda/land-DA_workflow/sorc/conda/envs/land_da/lib/python3.12/site-packages/uwtools/config/validator.py::L81::Schema validation succeeded for Rocoto config
INFO::/contrib/${USER}/landda/land-DA_workflow/sorc/conda/envs/land_da/lib/python3.12/site-packages/uwtools/rocoto.py::L81::Schema validation succeeded for Rocoto XML
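Because this output is long, it can be easier to check a saved copy for the two schema validation messages. A minimal sketch; ``wflow_setup_ok`` and ``setup_wflow.log`` are hypothetical names:

.. code-block:: bash

   # Hypothetical check (not part of Land DA). Assumes the output was saved,
   # for example: ./setup_wflow_env.py -p=singularity 2>&1 | tee setup_wflow.log
   wflow_setup_ok() {
       local log="$1" what
       for what in 'Rocoto config' 'Rocoto XML'; do
           if ! grep -q "Schema validation succeeded for ${what}" "$log"; then
               echo "Missing: Schema validation succeeded for ${what}" >&2
               return 1
           fi
       done
       echo "Workflow setup validated"
   }
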
.. _RunExptC:
Run the Experiment
********************
To run the experiment, users may submit tasks manually via ``rocotorun`` or use the ``automate_launch_script.py`` script to automate the task submission.
.. _WflowOverviewC:
Workflow Overview
==================
.. include:: ../doc-snippets/wflow-task-table.rst
.. _automated-run-c:
Automated Run
==================
To automate task submission using ``automate_launch_script.py``, simply run the script:
.. code-block:: console
./automate_launch_script.py
The console will output progress messages every 10 seconds by default:
.. code-block:: console
Running ./launch_rocoto_wflow.sh ...
Cycles: 0 out of 2 completed.
Detected wflow_status = IN PROGRESS
Waiting 10 seconds before next run ...
...
Running ./launch_rocoto_wflow.sh ...
Cycles: 1 out of 2 completed.
Detected wflow_status = IN PROGRESS
Waiting 10 seconds before next run ...
Running ./launch_rocoto_wflow.sh ...
Cycles: 2 out of 2 completed.
Detected wflow_status = SUCCESS
!!! ===== Workflow completed successfully. Stopping ===== !!!
Users can change how often the script relaunches by adding the ``-i`` argument. For example, to run the workflow launch script every 15 seconds, users would run:
.. code-block:: console
./automate_launch_script.py -i=15
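For long experiments, the launcher can be left running in the background and its progress read from a log. The sketch below assumes the progress lines match the ``Cycles: X out of Y completed.`` format shown above; ``last_cycle_progress`` and ``launch.log`` are hypothetical names. ``PYTHONUNBUFFERED=1`` is a standard Python environment variable that keeps messages from being held in an output buffer when writing to a file, assuming the script runs under Python 3.

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): print "completed total" from
   # the last "Cycles: X out of Y completed." line in a saved log.
   last_cycle_progress() {
       sed -n 's/^[[:space:]]*Cycles: \([0-9][0-9]*\) out of \([0-9][0-9]*\) completed\.[[:space:]]*$/\1 \2/p' "$1" |
           tail -n 1
   }

   # Example:
   #   PYTHONUNBUFFERED=1 nohup ./automate_launch_script.py -i=15 > launch.log 2>&1 &
   #   last_cycle_progress launch.log
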
To check the status of the experiment, see :numref:`Section %s ` on tracking experiment progress.
.. _manual-run-c:
Manual Submission
==================
Depending on the user's platform, it may be necessary to load Rocoto:
.. code-block:: console
module load rocoto/1.3.7
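On systems where Rocoto may already be on the ``PATH``, the module load can be made conditional. A minimal sketch, assuming an Lmod or Environment Modules ``module`` command and the ``rocoto/1.3.7`` module name used above; ``ensure_rocoto`` is a hypothetical name:

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): load the Rocoto module only
   # if rocotorun is not already available and a "module" command exists.
   ensure_rocoto() {
       if command -v rocotorun >/dev/null 2>&1; then
           return 0
       fi
       if type module >/dev/null 2>&1; then
           module load rocoto/1.3.7
       fi
       if ! command -v rocotorun >/dev/null 2>&1; then
           echo "rocotorun not found; install or load Rocoto first" >&2
           return 1
       fi
   }
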
.. include:: ../doc-snippets/manual-run.rst
See the :ref:`Workflow Overview ` section to learn more about the steps in the workflow process.
.. _TrackProgressC:
Track Progress
================
.. include:: ../doc-snippets/track-progress.rst
.. _check-output-c:
Check Experiment Output
=========================
.. include:: ../doc-snippets/check-output.rst
.. _plotting-c:
Plotting Results
------------------
.. include:: ../doc-snippets/plotting.rst
Appendix
**********
.. _CloudHPC:
Working in the Cloud or on HPC Systems
=========================================
Users working on systems with limited disk space in their ``/home`` directory may need to set the ``SINGULARITY_CACHEDIR`` and ``SINGULARITY_TMPDIR`` environment variables to point to a location with adequate disk space. For example:
.. code-block:: console
export SINGULARITY_CACHEDIR=/absolute/path/to/writable/directory/cache
export SINGULARITY_TMPDIR=/absolute/path/to/writable/directory/tmp
where ``/absolute/path/to/writable/directory/`` refers to the absolute path to a writable directory with sufficient disk space. If the ``cache`` and ``tmp`` directories do not already exist, they must be created first (e.g., with ``mkdir -p``).
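Creating the directories and setting the variables can be combined into one step. This minimal sketch also sets ``APPTAINER_CACHEDIR`` and ``APPTAINER_TMPDIR``, the names Apptainer uses for the same settings; ``set_singularity_dirs`` is a hypothetical name, and the function must be defined and called in the current shell (for example, pasted into it or loaded with ``source``) for the exported variables to persist:

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): create <base>/cache and
   # <base>/tmp and export the cache/tmp variables in the current shell.
   set_singularity_dirs() {
       local base="${1%/}"
       case "$base" in
           /*) ;;
           *) echo "set_singularity_dirs: use an absolute path" >&2; return 1 ;;
       esac
       mkdir -p "${base}/cache" "${base}/tmp" || return 1
       export SINGULARITY_CACHEDIR="${base}/cache"
       export SINGULARITY_TMPDIR="${base}/tmp"
       # Apptainer's own names for the same settings
       export APPTAINER_CACHEDIR="$SINGULARITY_CACHEDIR"
       export APPTAINER_TMPDIR="$SINGULARITY_TMPDIR"
   }

   # Example: set_singularity_dirs /absolute/path/to/writable/directory
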
On NOAA Cloud systems, the ``sudo su``/``exit`` commands may also be required; users on other systems may be able to omit these. For example:
.. code-block:: console
mkdir -p /lustre/cache
mkdir -p /lustre/tmp
sudo su
export SINGULARITY_CACHEDIR=/lustre/cache
export SINGULARITY_TMPDIR=/lustre/tmp
exit
.. note::
``/lustre`` is a fast but non-persistent file system used on NOAA Cloud systems. To retain work completed in this directory, `tar the files `_ and move them to the ``/contrib`` directory, which is much slower but persistent.
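Variables exported inside the ``sudo su`` shell exist only in that shell, so they no longer apply after ``exit``; a build that relies on them needs to run before ``exit``. Alternatively, because ``sudo`` commonly resets the environment, the variables can be passed explicitly through the standard ``env`` utility on systems where the sudo policy allows it. A minimal sketch; ``sudo_with_singularity_dirs`` and its ``DRY_RUN`` switch are hypothetical:

.. code-block:: bash

   # Hypothetical helper (not part of Land DA): run a command under sudo with
   # the Singularity cache/tmp variables passed explicitly via "env".
   # Set DRY_RUN=1 to print the command instead of running it.
   sudo_with_singularity_dirs() {
       if [ -z "${SINGULARITY_CACHEDIR:-}" ] || [ -z "${SINGULARITY_TMPDIR:-}" ]; then
           echo "Set SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR first" >&2
           return 1
       fi
       local cmd=(sudo env
           "SINGULARITY_CACHEDIR=${SINGULARITY_CACHEDIR}"
           "SINGULARITY_TMPDIR=${SINGULARITY_TMPDIR}"
           "$@")
       if [ "${DRY_RUN:-0}" = 1 ]; then
           printf '%s\n' "${cmd[*]}"
       else
           "${cmd[@]}"
       fi
   }

   # Example:
   # sudo_with_singularity_dirs singularity build --force \
   #     ubuntu22.04-intel-landda-release-public-v3.0.0.img \
   #     docker://noaaepic/ubuntu22.04-intel2024.2.0-1-devel-landda:ue192-v3.0.0
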
After setting the ``SINGULARITY_CACHEDIR`` and ``SINGULARITY_TMPDIR`` environment variables, users may continue to :ref:`build the container `.