2.2. Containerized Land DA Workflow

These instructions will help users build and run a basic case for the Unified Forecast System (UFS) Land Data Assimilation (DA) System using a Singularity/Apptainer container. The Land DA container packages the Land DA System together with its dependencies (e.g., spack-stack, JEDI) and provides a uniform environment in which to build and run the Land DA System. Normally, the details of building and running Earth system models vary based on the computing platform because there are many possible combinations of operating systems, compilers, MPIs, and package versions available. Installation via Singularity/Apptainer container reduces this variability and allows for a smoother experience building and running Land DA. This approach is recommended for users not running Land DA on a supported Level 1 system (i.e., Hera, Orion).

This chapter provides instructions for building and running basic cases for the UFS Land DA System. Users can choose between two options:

  • A Dec. 21, 2019 00z sample case using ERA5 data with the UFS Land Driver (settings_DA_cycle_era5)

  • A Jan. 3, 2000 00z sample case using GSWP3 data with the UFS Noah-MP land component (settings_DA_cycle_gswp3).

Attention

This chapter of the User’s Guide should only be used for container builds. For non-container builds, see Chapter 2.1, which describes the steps for building and running Land DA on a Level 1 System without a container.

2.2.1. Prerequisites

The containerized version of Land DA requires:

  • Installation of Singularity/Apptainer (see Section 2.2.1.1)

  • Intel compilers and Intel MPI (preferably 2020 versions or newer) available on the host system to launch MPI jobs (see Section 2.2.4.1)

2.2.1.1. Install Singularity/Apptainer

Note

As of November 2021, the Linux-supported version of Singularity has been renamed to Apptainer. Apptainer has maintained compatibility with Singularity, so singularity commands should work with either Singularity or Apptainer (see the Apptainer documentation for compatibility details).

To build and run Land DA using a Singularity/Apptainer container, first install the software according to the Apptainer Installation Guide. This will include the installation of all dependencies.

Attention

Docker containers can only be run with root privileges, and users generally do not have root privileges on HPCs. However, a Singularity image may be built directly from a Docker image for use on the system.

2.2.2. Build the Container

2.2.2.1. Set Environment Variables

For users working on systems with limited disk space in their /home directory, it is important to set the SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR environment variables to point to a location with adequate disk space. For example:

export SINGULARITY_CACHEDIR=/absolute/path/to/writable/directory/cache
export SINGULARITY_TMPDIR=/absolute/path/to/writable/directory/tmp

where /absolute/path/to/writable/directory/ refers to a writable directory (usually a project or user directory within /lustre, /work, /scratch, or /glade on NOAA RDHPCS systems). If the cache and tmp directories do not exist already, they must be created with a mkdir command.
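
For example, assuming the same placeholder path as above, both directories can be created as follows (the -p flag also creates any missing parent directories):

mkdir -p /absolute/path/to/writable/directory/cache
mkdir -p /absolute/path/to/writable/directory/tmp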

On NOAA Cloud systems, the sudo su command may also be required:

mkdir /lustre/cache
mkdir /lustre/tmp
sudo su
export SINGULARITY_CACHEDIR=/lustre/cache
export SINGULARITY_TMPDIR=/lustre/tmp
exit

Note

/lustre is a fast but non-persistent file system used on NOAA Cloud systems. To retain work completed in this directory, tar the files and move them to the /contrib directory, which is much slower but persistent.
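
For example, a hypothetical archive-and-copy step might look like the following, where the /contrib subdirectory and the work path are placeholders that will vary by user and project:

tar -czvf /contrib/$USER/landda_work.tar.gz /lustre/path/to/work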

2.2.2.2. Build the Container

Set a top-level directory location for Land DA work, and navigate to it. For example:

mkdir /path/to/landda
cd /path/to/landda
export LANDDAROOT=`pwd`

where /path/to/landda is the path to this top-level directory (e.g., /Users/Joe.Schmoe/landda).

Hint

If a singularity: command not found error message appears in any of the following steps, try running: module load singularity or (on Derecho) module load apptainer.

2.2.2.2.1. NOAA RDHPCS Systems

On many NOAA RDHPCS systems, a container named ubuntu20.04-intel-landda-release-public-v1.2.0.img has already been built, and users may access the container at the locations in Table 2.3.

Table 2.3 Locations of Pre-Built Containers

Machine          File location
Derecho          /glade/work/epicufsrt/contrib/containers
Gaea             /gpfs/f5/epic/world-shared/containers
Hera             /scratch1/NCEPDEV/nems/role.epic/containers
Jet              /mnt/lfs4/HFIP/hfv3gfs/role.epic/containers
NOAA Cloud       /contrib/EPIC/containers
Orion/Hercules   /work/noaa/epic/role-epic/contrib/containers

Users can simply set an environment variable to point to the container:

export img=/path/to/ubuntu20.04-intel-landda-release-public-v1.2.0.img
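
For example, on Hera, using the location listed in Table 2.3:

export img=/scratch1/NCEPDEV/nems/role.epic/containers/ubuntu20.04-intel-landda-release-public-v1.2.0.img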

If users prefer, they may copy the container to their local working directory. For example, on Jet:

cp /mnt/lfs4/HFIP/hfv3gfs/role.epic/containers/ubuntu20.04-intel-landda-release-public-v1.2.0.img .

2.2.2.2.2. Other Systems

On other systems, users can either build the Singularity container from a public Docker container image or download the pre-built ubuntu20.04-intel-landda-release-public-v1.2.0.img container from the Land DA Data Bucket. Downloading the pre-built container may be faster, depending on the download speed on the user’s system. However, the container in the data bucket corresponds to the release/v1.2.0 code rather than to the updated develop branch.

To download from the data bucket, users can run:

wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/current_land_da_release_data/v1.2.0/ubuntu20.04-intel-landda-release-public-v1.2.0.img

To build the container from a Docker image, users can run:

singularity build --force ubuntu20.04-intel-landda-release-public-v1.2.0.img docker://noaaepic/ubuntu20.04-intel-landda:release-public-v1.2.0

This process may take several hours depending on the system.

Note

Some users may need to issue the singularity build command with sudo (i.e., sudo singularity build...). Whether sudo is required is system-dependent. If sudo is required (or desired) for building the container, users should set the SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR environment variables with sudo su, as in the NOAA Cloud example from Section 2.2.2.1 above.

2.2.3. Get Data

To run the Land DA System, users need input data in the form of fix files, model forcing files, restart files, and observations for data assimilation. These files are already present on Level 1 systems (see Section 2.1 for details).

Users on any system may download and untar the data from the Land DA Data Bucket into their $LANDDAROOT directory.

cd $LANDDAROOT
wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/current_land_da_release_data/v1.2.0/Landdav1.2.0_input_data.tar.gz
tar xvfz Landdav1.2.0_input_data.tar.gz

If users choose to add data in a location other than $LANDDAROOT, they can set the input data directory by running:

export LANDDA_INPUTS=/path/to/input/data

where /path/to/input/data is replaced by the absolute path to the location of their Land DA input data.
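
For example, if the data were downloaded and untarred under a shared project directory instead, the command might look like the following (the path shown is hypothetical):

export LANDDA_INPUTS=/work/myproject/landda/inputs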

2.2.4. Run the Container

To run the container, users must complete the following steps:

  • Set Up the Container (Section 2.2.4.1)

  • Configure the Experiment (Section 2.2.4.2)

  • Run the Experiment (Section 2.2.4.3)

2.2.4.1. Set Up the Container

Save the location of the container in an environment variable.

export img=/path/to/ubuntu20.04-intel-landda-release-public-v1.2.0.img

Set the USE_SINGULARITY environment variable to “yes”.

export USE_SINGULARITY=yes

This variable tells the workflow to use the containerized version of all the executables (including python) when running a cycle.

Users may convert a container .img file to a writable sandbox. This step is optional on most systems:

singularity build --sandbox ubuntu20.04-intel-landda-release-public-v1.2.0 $img

When making a writable sandbox on NOAA RDHPCS systems, the following warnings commonly appear and can be ignored:

INFO:    Starting build...
INFO:    Verifying bootstrap image ubuntu20.04-intel-landda-release-public-v1.2.0.img
WARNING: integrity: signature not found for object group 1
WARNING: Bootstrap image could not be verified, but build will continue.

From within the $LANDDAROOT directory, copy the land-DA_workflow directory out of the container.

singularity exec -H $PWD $img cp -r /opt/land-DA_workflow .

There should now be a land-DA_workflow directory in the $LANDDAROOT directory. If the copy is unsuccessful for some reason, users may try a version of the following command instead:

singularity exec -B /<local_base_dir>:/<container_dir> $img cp -r /opt/land-DA_workflow .

where <local_base_dir> and <container_dir> are replaced with a top-level directory on the local system and in the container, respectively. Additional directories can be bound by adding another -B /<local_base_dir>:/<container_dir> argument before the container location ($img).
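
A hypothetical example, binding the host directory /work/user1234 to the same path inside the container:

singularity exec -B /work/user1234:/work/user1234 $img cp -r /opt/land-DA_workflow .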

Attention

Be sure to bind the directory that contains the experiment data!

Note

Sometimes binding directories with different names can cause problems. In general, it is recommended that the local base directory and the container directory have the same name. For example, if the host system’s top-level directory is /user1234, the user may want to convert the .img file to a writable sandbox and create a user1234 directory in the sandbox to bind to.

Navigate to the land-DA_workflow directory after it has been successfully copied into $LANDDAROOT.

cd land-DA_workflow

When using a Singularity container, Intel compilers and Intel MPI (preferably 2020 versions or newer) need to be available on the host system to properly launch MPI jobs. The Level 1 systems that have Intel compilers and Intel MPI available are: Hera, Jet, NOAA Cloud, and Orion. Generally, this is accomplished by loading a module with a recent Intel compiler and then loading the corresponding Intel MPI. For example, users can modify the following commands to load their system’s compiler/MPI combination:

module load intel/2022.1.2 impi/2022.1.2

Note

Spack-stack uses Lua modules, which require Lmod to be initialized for the module load command to work. If, for some reason, Lmod is not initialized, users can source the init/bash file on their system before running the command above. For example, users can modify and run the following command:

source /path/to/init/bash

Then they should be able to load the appropriate modules.

The remaining Level 1 systems, which do not have Intel MPI available, need to load a different Intel compiler and MPI combination. Refer to Table 2.4 for the combination to load on each of these systems.

Table 2.4 Intel compilers and MPIs for non-Intel MPI Level 1 systems

Machine    Intel compiler and MPI combinations
Derecho    module load intel-oneapi/2023.2.1 cray-mpich/8.1.25
Gaea       module load intel-classic/2023.1.0 cray-mpich/8.1.25
Hercules   module load intel-oneapi-compilers/2022.2.1 intel-oneapi-mpi/2021.7.1

For Derecho and Gaea, an additional script is needed to help set up the land-DA workflow scripts so that the container can run there.

./setup_container.sh -p=<derecho|gaea>
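
For example, on Derecho:

./setup_container.sh -p=derecho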

2.2.4.2. Configure the Experiment

2.2.4.2.1. Modify Machine Settings

Users on a system with a Slurm job scheduler will need to make some minor changes to the submit_cycle.sh file. Open the file and change the account and queue (qos) to match the desired account and qos on the system. Users may also need to add the following line to the script to specify the partition. For example, on Jet, users should set:

#SBATCH --partition=xjet
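
As an illustration only, the relevant Slurm directives might look like the following after editing; the account and qos values are placeholders, and the exact directive lines in the released submit_cycle.sh may differ:

# Illustrative sketch: replace the placeholders with values valid on your system
#SBATCH --account=<your_account>
#SBATCH --qos=<your_qos>
#SBATCH --partition=xjet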

When using the GSWP3 forcing option, users will need to update line 7 to say #SBATCH --cpus-per-task=4. Users can perform this change manually in a code editor or run:

sed -i 's/--cpus-per-task=1/--cpus-per-task=4/g' submit_cycle.sh
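
To confirm the change, users can print the updated line with, for example:

grep -- '--cpus-per-task' submit_cycle.sh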

Save and close the file.

2.2.4.2.2. Modify Experiment Settings

The Land DA System uses a script-based workflow that is launched with the do_submit_cycle.sh script. That script requires an input file that details the specifics of a given experiment. EPIC provides two sample settings_* files: settings_DA_cycle_era5 and settings_DA_cycle_gswp3.

Attention

Note that the GSWP3 option will only run as-is on Hera and Orion. Users on other systems may need to make significant changes to configuration files, which is not a supported option for the v1.2.0 release. It is recommended that users on these systems use the UFS land driver ERA5 sample experiment set in settings_DA_cycle_era5.

First, update the $BASELINE environment variable in the selected settings_DA_* file to say singularity.internal instead of hera.internal:

export BASELINE=singularity.internal
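
Users can make this change in a code editor or with a sed command similar to the one below (shown for the ERA5 settings file; adjust the filename when using GSWP3):

sed -i 's/hera.internal/singularity.internal/g' settings_DA_cycle_era5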

When using the GSWP3 forcing option, users must also update the MACHINE_ID to orion in settings_DA_cycle_gswp3 if running on Orion.

2.2.4.3. Run the Experiment

To start the experiment, run:

./do_submit_cycle.sh settings_DA_cycle_era5

The do_submit_cycle.sh script reads the settings_DA_cycle_* file and the release.environment file, which contain sensible experiment default values to simplify the process of running the workflow for the first time. Advanced users may wish to modify the parameters in do_submit_cycle.sh to fit their particular needs. After reading the defaults and other variables from the settings files, do_submit_cycle.sh creates a working directory (named workdir by default) and an output directory called landda_expts in the parent directory of land-DA_workflow and then submits a job (submit_cycle.sh) to the queue that runs through the workflow.

If all succeeds, users will see log and err files created in land-DA_workflow along with a cycle.log file, which shows where the cycle has ended. The landda_expts directory will also be populated with data in the following directories:

landda_expts/DA_GHCN_test/DA/
landda_expts/DA_GHCN_test/mem000/restarts/vector/
landda_expts/DA_GHCN_test/mem000/restarts/tile/

Depending on the experiment, either the vector or the tile directory will have data, but not both.

Users can check experiment progress/success according to the instructions in Section 2.1.5.2.1, which apply to both containerized and non-containerized versions of the Land DA System.
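
For a quick spot check from within the land-DA_workflow directory on a Slurm-based system, users can, for example, monitor the job queue and inspect the cycle log:

squeue -u $USER
tail cycle.log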