2.2. Containerized Land DA Workflow

These instructions will help users build and run a basic case for the Unified Forecast System (UFS) Land Data Assimilation (DA) System using a Singularity/Apptainer container. The Land DA container packages together the Land DA System with its dependencies (e.g., spack-stack, JEDI) and provides a uniform environment in which to build and run the Land DA System. Normally, the details of building and running Earth system models will vary based on the computing platform because there are many possible combinations of operating systems, compilers, MPIs, and package versions available. Installation via Singularity/Apptainer container reduces this variability and allows for a smoother experience building and running Land DA. This approach is recommended for users not running Land DA on a supported Level 1 system (e.g., Hera, Orion).

This chapter provides instructions for building and running the Unified Forecast System (UFS) Land DA System sample case using a container. The sample case runs for Jan. 3-4, 2000 00z and uses GSWP3 data with the UFS Noah-MP land component and data atmosphere (DATM) component.

Attention

This chapter of the User’s Guide should only be used for container builds. For non-container builds, see Chapter 2.1, which describes the steps for building and running Land DA on a Level 1 System without a container.

2.2.1. Prerequisites

The containerized version of Land DA requires:

  • Installation of Apptainer (or its predecessor, Singularity)

  • At least 26 CPU cores (may be possible to run with 13, but this has not been tested)

  • An Intel compiler and MPI (available free of charge from Intel)

  • The Slurm job scheduler

2.2.1.1. Install Apptainer

Note

As of November 2021, the Linux-supported version of Singularity has been renamed to Apptainer. Apptainer has maintained compatibility with Singularity, so singularity commands should work with either Singularity or Apptainer (see the Apptainer documentation for compatibility details).

To build and run Land DA using an Apptainer container, first install the software according to the Apptainer Installation Guide. This will include the installation of all dependencies.

Attention

Docker containers can only be run with root privileges, and users generally do not have root privileges on HPCs. However, an Apptainer image may be built directly from a Docker image for use on the system.

2.2.2. Build the Container

2.2.2.1. Set Environment Variables

For users working on systems with limited disk space in their /home directory, it is important to set the SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR environment variables to point to a location with adequate disk space. For example:

export SINGULARITY_CACHEDIR=/absolute/path/to/writable/directory/cache
export SINGULARITY_TMPDIR=/absolute/path/to/writable/directory/tmp

where /absolute/path/to/writable/directory/ refers to a writable directory (usually a project or user directory within /lustre, /work, /scratch, or /glade on NOAA RDHPCS systems). If the cache and tmp directories do not exist already, they must be created with a mkdir command.
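For example, using the same placeholder path as above, the cache and tmp directories could be created with:

mkdir -p /absolute/path/to/writable/directory/cache
mkdir -p /absolute/path/to/writable/directory/tmp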

On NOAA Cloud systems, the sudo su command may also be required. For example, users would run:

mkdir /lustre/cache
mkdir /lustre/tmp
sudo su
export SINGULARITY_CACHEDIR=/lustre/cache
export SINGULARITY_TMPDIR=/lustre/tmp
exit

Note

/lustre is a fast but non-persistent file system used on NOAA Cloud systems. To retain work completed in this directory, tar the files and move them to the /contrib directory, which is much slower but persistent.

2.2.2.2. Build the Container

Set a top-level directory location for Land DA work, and navigate to it. For example:

mkdir /path/to/landda
cd /path/to/landda
export LANDDAROOT=`pwd`

where /path/to/landda is the path to this top-level directory (e.g., /Users/Joe.Schmoe/landda).

Hint

If a singularity: command not found error message appears in any of the following steps, try running: module load singularity or (on Derecho) module load apptainer.

2.2.2.2.1. NOAA RDHPCS Systems

On many NOAA RDHPCS systems, a container named ubuntu22.04-intel-landda-release-public-v2.0.0.img has already been built, and users may access it at the locations listed in Table 2.4.

Table 2.4 Locations of Pre-Built Containers

Machine          File location
Gaea             /gpfs/f5/epic/world-shared/containers
Hera             /scratch1/NCEPDEV/nems/role.epic/containers
Jet              /mnt/lfs5/HFIP/hfv3gfs/role.epic/containers
NOAA Cloud       /contrib/EPIC/containers
Orion/Hercules   /work/noaa/epic/role-epic/contrib/containers

Users can simply set an environment variable to point to the container:

export img=/path/to/ubuntu22.04-intel-landda-release-public-v2.0.0.img
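For example, on Hera, using the location from Table 2.4:

export img=/scratch1/NCEPDEV/nems/role.epic/containers/ubuntu22.04-intel-landda-release-public-v2.0.0.img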

If users prefer, they may copy the container to their local working directory. For example, on Jet:

cp /mnt/lfs5/HFIP/hfv3gfs/role.epic/containers/ubuntu22.04-intel-landda-release-public-v2.0.0.img .

2.2.2.2.2. Other Systems

On other systems, users can build the Singularity container from a public Docker container image or download the ubuntu22.04-intel-landda-release-public-v2.0.0.img container from the Land DA Data Bucket. Downloading the prebuilt image is often faster than building the container, depending on the user's network speed.

To download from the data bucket, users can run:

wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/current_land_da_release_data/v2.0.0/ubuntu22.04-intel-landda-release-public-v2.0.0.img

To build the container from a Docker image, users can run:

singularity build --force ubuntu22.04-intel-landda-release-public-v2.0.0.img docker://noaaepic/ubuntu22.04-intel21.10-landda:ue160-fms2024.01-release

This process may take several hours depending on the system.

Note

Some users may need to issue the singularity build command with sudo (i.e., sudo singularity build...). Whether sudo is required is system-dependent. If sudo is required (or desired) for building the container, users should set the SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR environment variables with sudo su, as in the NOAA Cloud example from Section 2.2.2.1 above.

2.2.3. Get Data

In order to run the Land DA System, users will need input data in the form of fix files, model forcing files, restart files, and observations for data assimilation. These files are already present on Level 1 systems (see Section 2.1 for details).

Users on any system may download and untar the data from the Land DA Data Bucket into their $LANDDAROOT directory.

cd $LANDDAROOT
wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/current_land_da_release_data/v2.0.0/LandDAInputDatav2.0.0.tar.gz
tar xvfz LandDAInputDatav2.0.0.tar.gz

If users choose to add data in a location other than $LANDDAROOT, they can set the input data directory by running:

export LANDDA_INPUTS=/path/to/inputs

where /path/to is replaced by the absolute path to the location of their Land DA input data.
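For example, if the input data were staged under a shared project directory rather than $LANDDAROOT (the path below is hypothetical), users would run:

export LANDDA_INPUTS=/work/my_project/landda/inputs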

2.2.4. Run the Container

To run the container, users must set up the container, configure the experiment, and run the experiment, as described in the subsections below.

2.2.4.1. Set Up the Container

Save the location of the container in an environment variable.

export img=/path/to/ubuntu22.04-intel-landda-release-public-v2.0.0.img

Users may convert a container .img file to a writable sandbox. This step is optional on most systems:

singularity build --sandbox ubuntu22.04-intel-landda-release-public-v2.0.0 $img

When making a writable sandbox on NOAA RDHPCS systems, the following warnings commonly appear and can be ignored:

INFO:    Starting build...
INFO:    Verifying bootstrap image ubuntu22.04-intel-landda-release-public-v2.0.0.img
WARNING: integrity: signature not found for object group 1
WARNING: Bootstrap image could not be verified, but build will continue.

From within the $LANDDAROOT directory, copy the setup_container.sh script out of the container.

singularity exec -H $PWD $img cp -r /opt/land-DA_workflow/setup_container.sh .

The setup_container.sh script should now be in the $LANDDAROOT directory. If, for some reason, the previous command was unsuccessful, users may try a version of the following command instead:

singularity exec -B /<local_base_dir>:/<container_dir> $img cp -r /opt/land-DA_workflow/setup_container.sh .

where <local_base_dir> and <container_dir> are replaced with a top-level directory on the local system and in the container, respectively. Additional directories can be bound by adding another -B /<local_base_dir>:/<container_dir> argument before the container location ($img). Note that if previous steps included a sudo command, sudo may be required in front of this command.

Note

Sometimes binding directories with different names can cause problems. In general, it is recommended that the local base directory and the container directory have the same name. For example, if the host system’s top-level directory is /user1234, the user may want to convert the .img file to a writable sandbox and create a user1234 directory in the sandbox to bind to.

Run the setup_container.sh script with the proper arguments; an example invocation follows the argument list below. Ensure that the LANDDA_INPUTS variable is set before running this script.

./setup_container.sh -c=<compiler> -m=<mpi_implementation> -i=$img

where:

  • -c is the compiler on the user’s local machine (e.g., intel/2022.1.2)

  • -m is the MPI on the user’s local machine (e.g., impi/2022.1.2)

  • -i is the full path to the container image (e.g., $LANDDAROOT/ubuntu22.04-intel-landda-release-public-v2.0.0.img)
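For example, using the module names shown above (actual compiler and MPI names vary by system):

./setup_container.sh -c=intel/2022.1.2 -m=impi/2022.1.2 -i=$LANDDAROOT/ubuntu22.04-intel-landda-release-public-v2.0.0.img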

When using a Singularity container, Intel compilers and Intel MPI (preferably 2020 versions or newer) need to be available on the host system to properly launch MPI jobs. Generally, this is accomplished by loading a module with a recent Intel compiler and then loading the corresponding Intel MPI.
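For example, on a system that provides Intel modules, users might load (module names here are hypothetical and vary by platform):

module load intel/2022.1.2
module load impi/2022.1.2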

2.2.4.2. Configure the Experiment

The user should now see the land-DA_workflow and jedi-bundle directories in the $LANDDAROOT directory.

Because of a conda conflict between the container and the host system, it is best to load the rocoto module separately instead of using the workflow modulefiles found in the modulefiles directory.

module load rocoto

The setup_container.sh script creates the parm_xml.yaml from the parm_xml_singularity.yaml file. Update any relevant variables in this file (e.g., account or exp_basedir) before creating the Rocoto XML file.

cd $LANDDAROOT/land-DA_workflow/parm
vi parm_xml.yaml
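For example, users might update entries such as the following (values shown here are hypothetical; exp_basedir is typically the directory that contains land-DA_workflow, i.e., $LANDDAROOT):

account: my_hpc_account
exp_basedir: /path/to/landda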

Save and close the file.

Once everything looks good, run the uwtools scripts to create the Rocoto XML file:

../sorc/conda/envs/land_da/bin/uw template render --input-file templates/template.land_analysis.yaml --values-file parm_xml.yaml --output-file land_analysis.yaml
../sorc/conda/envs/land_da/bin/uw rocoto realize --input-file land_analysis.yaml --output-file land_analysis.xml

A successful run of these commands will output a “0 errors found” message.

2.2.4.3. Run the Experiment

To start the experiment, run:

rocotorun -w land_analysis.xml -d land_analysis.db

Users will need to issue the rocotorun command multiple times. The tasks must be run in order, and rocotorun initiates the next task once its dependencies have completed successfully.
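Because the command must be reissued until all tasks complete, one convenience is to rerun it periodically in a simple shell loop (a sketch only; users may also simply rerun the command by hand):

while true; do
  rocotorun -w land_analysis.xml -d land_analysis.db
  sleep 60
done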

See the Workflow Overview section to learn more about the steps in the workflow process.

2.2.4.3.1. Track Progress

To check on the job status, users on a system with a Slurm job scheduler may run:

squeue -u $USER

To view the experiment status, run:

rocotostat -w land_analysis.xml -d land_analysis.db

See the Track Experiment Status section to learn more about the rocotostat output.

2.2.4.3.2. Check Experiment Output

Since the containerized sample case is the same experiment described in the previous chapter, users can refer to the experiment output structure and plotting results sections there to learn more about the expected experiment output.