2.2. Containerized Land DA Workflow
These instructions will help users build and run a basic case for the Unified Forecast System (UFS) Land Data Assimilation (DA) System using a Singularity/Apptainer container. The Land DA container packages together the Land DA System with its dependencies (e.g., spack-stack, JEDI) and provides a uniform environment in which to build and run the Land DA System. Normally, the details of building and running Earth system models will vary based on the computing platform because there are many possible combinations of operating systems, compilers, MPIs, and package versions available. Installation via Singularity/Apptainer container reduces this variability and allows for a smoother experience building and running Land DA. This approach is recommended for users not running Land DA on a supported Level 1 system (e.g., Hera, Orion).
This chapter provides instructions for building and running the Unified Forecast System (UFS) Land DA System sample case using a container. The sample case runs for Jan. 3-4, 2000 00z and uses GSWP3 data with the UFS Noah-MP land component and data atmosphere (DATM) component.
Attention
This chapter of the User’s Guide should only be used for container builds. For non-container builds, see Chapter 2.1, which describes the steps for building and running Land DA on a Level 1 System without a container.
2.2.1. Prerequisites
The containerized version of Land DA requires:
Installation of Apptainer (or its predecessor, Singularity)
At least 26 CPU cores (may be possible to run with 13, but this has not been tested)
The Slurm job scheduler
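A quick way to confirm that these prerequisites are met is to query each tool directly (a minimal sketch, assuming apptainer/singularity and Slurm are already on the user's PATH; on an HPC, the core count that matters is the one available through the job scheduler, not necessarily the login node):
apptainer --version   # or: singularity --version
sinfo --version       # confirms that the Slurm job scheduler is available
nproc                 # CPU cores visible on the current node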
2.2.1.1. Install Apptainer
Note
As of November 2021, the Linux-supported version of Singularity has been renamed to Apptainer. Apptainer has maintained compatibility with Singularity, so singularity commands should work with either Singularity or Apptainer (see the Apptainer documentation for compatibility details).
To build and run Land DA using an Apptainer container, first install the software according to the Apptainer Installation Guide. This will include the installation of all dependencies.
Attention
Docker containers can only be run with root privileges, and users generally do not have root privileges on HPCs. However, an Apptainer image may be built directly from a Docker image for use on the system.
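For reference, the general pattern for building an Apptainer image file from a public Docker image is shown below (an illustrative example using a generic Ubuntu image; the command for the specific Land DA image appears in Section 2.2.2.2.2):
apptainer build ubuntu22.04.img docker://ubuntu:22.04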
2.2.2. Build the Container
2.2.2.1. Set Environment Variables
For users working on systems with limited disk space in their /home directory, it is important to set the SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR environment variables to point to a location with adequate disk space. For example:
export SINGULARITY_CACHEDIR=/absolute/path/to/writable/directory/cache
export SINGULARITY_TMPDIR=/absolute/path/to/writable/directory/tmp
where /absolute/path/to/writable/directory/ refers to a writable directory (usually a project or user directory within /lustre, /work, /scratch, or /glade on NOAA RDHPCS systems). If the cache and tmp directories do not exist already, they must be created with a mkdir command.
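For example, using the same placeholder path as above:
mkdir -p /absolute/path/to/writable/directory/cache
mkdir -p /absolute/path/to/writable/directory/tmp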
On NOAA Cloud systems, the sudo su command may also be required. For example, users would run:
mkdir /lustre/cache
mkdir /lustre/tmp
sudo su
export SINGULARITY_CACHEDIR=/lustre/cache
export SINGULARITY_TMPDIR=/lustre/tmp
exit
Note
/lustre is a fast but non-persistent file system used on NOAA Cloud systems. To retain work completed in this directory, tar the files and move them to the /contrib directory, which is much slower but persistent.
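For example, a work directory on /lustre could be archived and copied to /contrib as follows (paths are placeholders and should be adjusted to the user's own directories):
tar -czf landda_work.tar.gz /lustre/path/to/work
cp landda_work.tar.gz /contrib/$USER/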
2.2.2.2. Build the Container
Set a top-level directory location for Land DA work, and navigate to it. For example:
mkdir /path/to/landda
cd /path/to/landda
export LANDDAROOT=`pwd`
where /path/to/landda is the path to this top-level directory (e.g., /Users/Joe.Schmoe/landda).
Hint
If a singularity: command not found error message appears in any of the following steps, try running module load singularity or (on Derecho) module load apptainer.
2.2.2.2.1. NOAA RDHPCS Systems
On many NOAA RDHPCS, a container named ubuntu22.04-intel-landda-release-public-v2.0.0.img has already been built, and users may access the container at the locations in Table 2.4.
Machine | File location
--- | ---
Gaea | /gpfs/f5/epic/world-shared/containers
Hera | /scratch1/NCEPDEV/nems/role.epic/containers
Jet | /mnt/lfs5/HFIP/hfv3gfs/role.epic/containers
NOAA Cloud | /contrib/EPIC/containers
Orion/Hercules | /work/noaa/epic/role-epic/contrib/containers
Users can simply set an environment variable to point to the container:
export img=/path/to/ubuntu22.04-intel-landda-release-public-v2.0.0.img
If users prefer, they may copy the container to their local working directory. For example, on Jet:
cp /mnt/lfs5/HFIP/hfv3gfs/role.epic/containers/ubuntu22.04-intel-landda-release-public-v2.0.0.img .
2.2.2.2.2. Other Systems
On other systems, users can build the Singularity container from a public Docker container image or download the ubuntu22.04-intel-landda-release-public-v2.0.0.img container from the Land DA Data Bucket. Downloading may be faster depending on the download speed on the user's system.
To download from the data bucket, users can run:
wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/current_land_da_release_data/v2.0.0/ubuntu22.04-intel-landda-release-public-v2.0.0.img
To build the container from a Docker image, users can run:
singularity build --force ubuntu22.04-intel-landda-release-public-v2.0.0.img docker://noaaepic/ubuntu22.04-intel21.10-landda:ue160-fms2024.01-release
This process may take several hours depending on the system.
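After the build (or download) completes, users can optionally verify that the file is a valid Singularity/Apptainer image by inspecting its metadata (output varies by image):
singularity inspect ubuntu22.04-intel-landda-release-public-v2.0.0.img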
Note
Some users may need to issue the singularity build command with sudo (i.e., sudo singularity build...). Whether sudo is required is system-dependent. If sudo is required (or desired) for building the container, users should set the SINGULARITY_CACHEDIR and SINGULARITY_TMPDIR environment variables with sudo su, as in the NOAA Cloud example from Section 2.2.2.1 above.
2.2.3. Get Data
In order to run the Land DA System, users will need input data in the form of fix files, model forcing files, restart files, and observations for data assimilation. These files are already present on Level 1 systems (see Section 2.1 for details).
Users on any system may download and untar the data from the Land DA Data Bucket into their $LANDDAROOT directory.
cd $LANDDAROOT
wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/current_land_da_release_data/v2.0.0/LandDAInputDatav2.0.0.tar.gz
tar xvfz LandDAInputDatav2.0.0.tar.gz
If users choose to add data in a location other than $LANDDAROOT, they can set the input data directory by running:
export LANDDA_INPUTS=/path/to/inputs
where /path/to/inputs is replaced by the absolute path to the location of the Land DA input data.
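For example, if the tarball above was unpacked directly under $LANDDAROOT, the resulting directory can be used (the directory name is assumed here and may differ by release; adjust the path to wherever the data actually reside):
export LANDDA_INPUTS=$LANDDAROOT/inputs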
2.2.4. Run the Container
To run the container, users must complete the steps described in the following subsections: set up the container, configure the experiment, and run the experiment.
2.2.4.1. Set Up the Container
Save the location of the container in an environment variable.
export img=/path/to/ubuntu22.04-intel-landda-release-public-v2.0.0.img
Users may convert a container .img file to a writable sandbox. This step is optional on most systems:
singularity build --sandbox ubuntu22.04-intel-landda-release-public-v2.0.0 $img
When making a writable sandbox on NOAA RDHPCS, the following warnings commonly appear and can be ignored:
INFO: Starting build...
INFO: Verifying bootstrap image ubuntu22.04-intel-landda-release-public-v2.0.0.img
WARNING: integrity: signature not found for object group 1
WARNING: Bootstrap image could not be verified, but build will continue.
From within the $LANDDAROOT directory, copy the setup_container.sh script out of the container.
singularity exec -H $PWD $img cp -r /opt/land-DA_workflow/setup_container.sh .
The setup_container.sh script should now be in the $LANDDAROOT directory. If, for some reason, the previous command was unsuccessful, users may try a version of the following command instead:
singularity exec -B /<local_base_dir>:/<container_dir> $img cp -r /opt/land-DA_workflow/setup_container.sh .
where <local_base_dir> and <container_dir> are replaced with a top-level directory on the local system and in the container, respectively. Additional directories can be bound by adding another -B /<local_base_dir>:/<container_dir> argument before the container location ($img). Note that if previous steps included a sudo command, sudo may be required in front of this command.
Note
Sometimes binding directories with different names can cause problems. In general, it is recommended that the local base directory and the container directory have the same name. For example, if the host system's top-level directory is /user1234, the user may want to convert the .img file to a writable sandbox and create a user1234 directory in the sandbox to bind to.
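A hypothetical sequence following this recommendation, assuming the host system's top-level directory is /user1234, might look like:
singularity build --sandbox ubuntu22.04-intel-landda-release-public-v2.0.0 $img
mkdir -p ubuntu22.04-intel-landda-release-public-v2.0.0/user1234
singularity exec -B /user1234:/user1234 ubuntu22.04-intel-landda-release-public-v2.0.0 cp -r /opt/land-DA_workflow/setup_container.sh .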
Run the setup_container.sh script with the proper arguments. Ensure that the LANDDA_INPUTS variable is set before running this script.
./setup_container.sh -c=<compiler> -m=<mpi_implementation> -i=$img
where:
-c is the compiler on the user's local machine (e.g., intel/2022.1.2)
-m is the MPI on the user's local machine (e.g., impi/2022.1.2)
-i is the full path to the container image (e.g., $LANDDAROOT/ubuntu22.04-intel-landda-release-public-v2.0.0.img).
When using a Singularity container, Intel compilers and Intel MPI (preferably 2020 versions or newer) need to be available on the host system to properly launch MPI jobs. Generally, this is accomplished by loading a module with a recent Intel compiler and then loading the corresponding Intel MPI.
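As a concrete (hypothetical) example, on a system that provides Intel 2022.1.2 and the matching Intel MPI as modules, the sequence might look like the following (module names vary by system):
module load intel/2022.1.2 impi/2022.1.2
./setup_container.sh -c=intel/2022.1.2 -m=impi/2022.1.2 -i=$LANDDAROOT/ubuntu22.04-intel-landda-release-public-v2.0.0.img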
2.2.4.2. Configure the Experiment
The user should now see the land-DA_workflow and jedi-bundle directories in the $LANDDAROOT directory.
Because of a conda conflict between the container and the host system, it is best to load Rocoto separately instead of using the workflow module files found in the modulefiles directory.
module load rocoto
The setup_container.sh script creates parm_xml.yaml from the parm_xml_singularity.yaml file. Update any relevant variables in this file (e.g., account or exp_basedir) before creating the Rocoto XML file.
cd $LANDDAROOT/land-DA_workflow/parm
vi parm_xml.yaml
Save and close the file.
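As an illustration only, an excerpt of parm_xml.yaml might contain entries along these lines (hypothetical values; the actual file contains additional settings whose names and layout depend on the release):
account: my_project_account
exp_basedir: /path/to/landda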
Once everything looks good, run the uwtools scripts to create the Rocoto XML file:
../sorc/conda/envs/land_da/bin/uw template render --input-file templates/template.land_analysis.yaml --values-file parm_xml.yaml --output-file land_analysis.yaml
../sorc/conda/envs/land_da/bin/uw rocoto realize --input-file land_analysis.yaml --output-file land_analysis.xml
A successful run of these commands will output a “0 errors found” message.
2.2.4.3. Run the Experiment
To start the experiment, run:
rocotorun -w land_analysis.xml -d land_analysis.db
Users will need to issue the rocotorun command multiple times. The tasks must be run in order, and rocotorun initiates the next task once its dependencies have completed successfully.
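Rather than issuing rocotorun by hand each time, users may prefer a simple loop that advances the workflow and reports its status until interrupted (a sketch only; the 60-second interval is arbitrary):
while true; do
  rocotorun -w land_analysis.xml -d land_analysis.db
  rocotostat -w land_analysis.xml -d land_analysis.db
  sleep 60
done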
See the Workflow Overview section to learn more about the steps in the workflow process.
2.2.4.3.1. Track Progress
To check on the job status, users on a system with a Slurm job scheduler may run:
squeue -u $USER
To view the experiment status, run:
rocotostat -w land_analysis.xml -d land_analysis.db
See the Track Experiment Status section to learn more about the rocotostat output.
2.2.4.3.2. Check Experiment Output
Since this experiment in the container is the same as the experiment described in the previous section, it is suggested that users view the experiment output structure and plotting results sections there to learn more about the expected experiment output.