2.2. Containerized Land DA Workflow
These instructions will help users build and run a basic case for the Unified Forecast System (UFS) Land Data Assimilation (DA) System using a Singularity/Apptainer container. The Land DA container packages together the Land DA System with its dependencies (e.g., spack-stack, JEDI) and provides a uniform environment in which to build and run the Land DA System. Normally, the details of building and running Earth system models will vary based on the computing platform because there are many possible combinations of operating systems, compilers, MPIs, and package versions available. Installation via Singularity/Apptainer container reduces this variability and allows for a smoother experience building and running Land DA. This approach is recommended for users not running Land DA on a supported Level 1 system (i.e., Hera, Orion).
This chapter provides instructions for building and running basic Land DA cases with the UFS Land DA System. Users can choose between two sample experiment configurations: one using ERA5 forcing data (`settings_DA_cycle_era5`) and one using GSWP3 forcing data (`settings_DA_cycle_gswp3`).
Attention
This chapter of the User’s Guide should only be used for container builds. For non-container builds, see Chapter 2.1, which describes the steps for building and running Land DA on a Level 1 System without a container.
2.2.1. Prerequisites
The containerized version of Land DA requires:
- At least 6 CPU cores
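Before building, users can confirm that the host meets this requirement. The following is a minimal sketch that assumes GNU coreutils' `nproc` is available (as it is on most Linux systems):

```shell
# Report the number of available CPU cores and warn if below the 6-core minimum.
# Assumes GNU coreutils' nproc; on other systems, use a platform-specific query.
cores=$(nproc)
if [ "$cores" -lt 6 ]; then
    echo "Warning: only $cores CPU cores detected; Land DA requires at least 6" >&2
else
    echo "OK: $cores CPU cores available"
fi
```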
2.2.1.1. Install Singularity/Apptainer
Note
As of November 2021, the Linux-supported version of Singularity has been renamed to Apptainer. Apptainer has maintained compatibility with Singularity, so `singularity` commands should work with either Singularity or Apptainer.
To build and run Land DA using a Singularity/Apptainer container, first install the software according to the Apptainer Installation Guide. This will include the installation of all dependencies.
Attention
Docker containers can only be run with root privileges, and users generally do not have root privileges on HPCs. However, a Singularity image may be built directly from a Docker image for use on the system.
2.2.2. Build the Container
2.2.2.1. Set Environment Variables
For users working on systems with limited disk space in their `/home` directory, it is important to set the `SINGULARITY_CACHEDIR` and `SINGULARITY_TMPDIR` environment variables to point to a location with adequate disk space. For example:
export SINGULARITY_CACHEDIR=/absolute/path/to/writable/directory/cache
export SINGULARITY_TMPDIR=/absolute/path/to/writable/directory/tmp
where `/absolute/path/to/writable/directory/` refers to a writable directory (usually a project or user directory within `/lustre`, `/work`, `/scratch`, or `/glade` on NOAA RDHPCS systems). If the `cache` and `tmp` directories do not exist already, they must be created with a `mkdir` command.
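For example, the directories can be created and the variables set together. In the sketch below, `/tmp/landda-demo` is only a placeholder; replace it with an actual high-capacity writable location:

```shell
# base is a placeholder; point it at a writable directory with adequate disk space
base=/tmp/landda-demo
# -p creates parent directories as needed and does not error if they already exist
mkdir -p "$base/cache" "$base/tmp"
export SINGULARITY_CACHEDIR="$base/cache"
export SINGULARITY_TMPDIR="$base/tmp"
```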
On NOAA Cloud systems, the `sudo su` command may also be required:
mkdir /lustre/cache
mkdir /lustre/tmp
sudo su
export SINGULARITY_CACHEDIR=/lustre/cache
export SINGULARITY_TMPDIR=/lustre/tmp
exit
Note
`/lustre` is a fast but non-persistent file system used on NOAA Cloud systems. To retain work completed in this directory, `tar` the files and move them to the `/contrib` directory, which is much slower but persistent.
2.2.2.2. Build the Container
Set a top-level directory location for Land DA work, and navigate to it. For example:
mkdir /path/to/landda
cd /path/to/landda
export LANDDAROOT=`pwd`
where `/path/to/landda` is the path to this top-level directory (e.g., `/Users/Joe.Schmoe/landda`).
Hint
If a `singularity: command not found` error message appears in any of the following steps, try running `module load singularity` or (on Derecho) `module load apptainer`.
2.2.2.2.1. NOAA RDHPCS Systems
On many NOAA RDHPCS systems, a container named `ubuntu20.04-intel-landda-release-public-v1.2.0.img` has already been built, and users may access the container at the locations in Table 2.3.
| Machine | File location |
|---|---|
| Derecho | `/glade/work/epicufsrt/contrib/containers` |
| Gaea | `/gpfs/f5/epic/world-shared/containers` |
| Hera | `/scratch1/NCEPDEV/nems/role.epic/containers` |
| Jet | `/mnt/lfs4/HFIP/hfv3gfs/role.epic/containers` |
| NOAA Cloud | `/contrib/EPIC/containers` |
| Orion/Hercules | `/work/noaa/epic/role-epic/contrib/containers` |
Users can simply set an environment variable to point to the container:
export img=path/to/ubuntu20.04-intel-landda-release-public-v1.2.0.img
If users prefer, they may copy the container to their local working directory. For example, on Jet:
cp /mnt/lfs4/HFIP/hfv3gfs/role.epic/containers/ubuntu20.04-intel-landda-release-public-v1.2.0.img .
2.2.2.2.2. Other Systems
On other systems, users can build the Singularity container from a public Docker container image or download the `ubuntu20.04-intel-landda-release-public-v1.2.0.img` container from the Land DA Data Bucket. Downloading may be faster depending on the download speed on the user's system. However, the container in the data bucket is the `release/v1.2.0` container rather than the updated `develop` branch container.
To download from the data bucket, users can run:
wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/current_land_da_release_data/v1.2.0/ubuntu20.04-intel-landda-release-public-v1.2.0.img
To build the container from a Docker image, users can run:
singularity build --force ubuntu20.04-intel-landda-release-public-v1.2.0.img docker://noaaepic/ubuntu20.04-intel-landda:release-public-v1.2.0
This process may take several hours depending on the system.
Note
Some users may need to issue the `singularity build` command with `sudo` (i.e., `sudo singularity build...`). Whether `sudo` is required is system-dependent. If `sudo` is required (or desired) for building the container, users should set the `SINGULARITY_CACHEDIR` and `SINGULARITY_TMPDIR` environment variables with `sudo su`, as in the NOAA Cloud example from Section 2.2.2.1 above.
2.2.3. Get Data
In order to run the Land DA System, users will need input data in the form of fix files, model forcing files, restart files, and observations for data assimilation. These files are already present on Level 1 systems (see Section 2.1 for details).
Users on any system may download and untar the data from the Land DA Data Bucket into their `$LANDDAROOT` directory.
cd $LANDDAROOT
wget https://noaa-ufs-land-da-pds.s3.amazonaws.com/current_land_da_release_data/v1.2.0/Landdav1.2.0_input_data.tar.gz
tar xvfz Landdav1.2.0_input_data.tar.gz
If users choose to add data in a location other than `$LANDDAROOT`, they can set the input data directory by running:
export LANDDA_INPUTS=/path/to/input/data
where `/path/to/input/data` is replaced by the absolute path to the location of their Land DA input data.
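Since a missing or mistyped input path is a common cause of failed runs, it can help to verify the directory before launching the workflow. A small sketch (the `/tmp` fallback below is purely a demonstration value):

```shell
# Confirm that LANDDA_INPUTS points at an existing directory before running.
# The /tmp fallback is only for demonstration; use the real input data path.
export LANDDA_INPUTS=${LANDDA_INPUTS:-/tmp}
if [ -d "$LANDDA_INPUTS" ]; then
    echo "Found Land DA input directory: $LANDDA_INPUTS"
else
    echo "Error: LANDDA_INPUTS ($LANDDA_INPUTS) is not a directory" >&2
fi
```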
2.2.4. Run the Container
To run the container, users must set up the container (Section 2.2.4.1), configure the experiment (Section 2.2.4.2), and run the experiment (Section 2.2.4.3).
2.2.4.1. Set Up the Container
Save the location of the container in an environment variable.
export img=path/to/ubuntu20.04-intel-landda-release-public-v1.2.0.img
Set the `USE_SINGULARITY` environment variable to "yes".
export USE_SINGULARITY=yes
This variable tells the workflow to use the containerized version of all the executables (including Python) when running a cycle.
Users may convert a container `.img` file to a writable sandbox. This step is optional on most systems:
singularity build --sandbox ubuntu20.04-intel-landda-release-public-v1.2.0 $img
When making a writable sandbox on NOAA RDHPCS systems, the following warnings commonly appear and can be ignored:
INFO: Starting build...
INFO: Verifying bootstrap image ubuntu20.04-intel-landda-release-public-v1.2.0.img
WARNING: integrity: signature not found for object group 1
WARNING: Bootstrap image could not be verified, but build will continue.
From within the `$LANDDAROOT` directory, copy the `land-DA_workflow` directory out of the container.
singularity exec -H $PWD $img cp -r /opt/land-DA_workflow .
There should now be a `land-DA_workflow` directory in the `$LANDDAROOT` directory. Navigate into the `land-DA_workflow` directory. If for some reason this is unsuccessful, users may try a version of the following command instead:
singularity exec -B /<local_base_dir>:/<container_dir> $img cp -r /opt/land-DA_workflow .
where `<local_base_dir>` and `<container_dir>` are replaced with a top-level directory on the local system and in the container, respectively. Additional directories can be bound by adding another `-B /<local_base_dir>:/<container_dir>` argument before the container location (`$img`).
Attention
Be sure to bind the directory that contains the experiment data!
Note
Sometimes binding directories with different names can cause problems. In general, it is recommended that the local base directory and the container directory have the same name. For example, if the host system's top-level directory is `/user1234`, the user may want to convert the `.img` file to a writable sandbox and create a `user1234` directory in the sandbox to bind to.
Navigate to the `land-DA_workflow` directory after it has been successfully copied into `$LANDDAROOT`.
cd land-DA_workflow
When using a Singularity container, Intel compilers and Intel MPI (preferably 2020 versions or newer) need to be available on the host system to properly launch MPI jobs. The Level 1 systems that have Intel compilers and Intel MPI available are: Hera, Jet, NOAA Cloud, and Orion. Generally, this is accomplished by loading a module with a recent Intel compiler and then loading the corresponding Intel MPI. For example, users can modify the following commands to load their system’s compiler/MPI combination:
module load intel/2022.1.2 impi/2022.1.2
Note
Spack-stack uses Lua modules, which require Lmod to be initialized for the `module load` command to work. If for some reason Lmod is not initialized, users can source the `init/bash` file on their system before running the command above. For example, users can modify and run the following command:
source /path/to/init/bash
Then they should be able to load the appropriate modules.
The remaining Level 1 systems that do not have Intel MPI available will need to load a different Intel compiler and MPI combination. Refer to Table 2.4 for which Intel compiler and MPI to load for these systems.
| Machine | Intel compiler and MPI combinations |
|---|---|
| Derecho | `module load intel-oneapi/2023.2.1 cray-mpich/8.1.25` |
| Gaea | `module load intel-classic/2023.1.0 cray-mpich/8.1.25` |
| Hercules | `module load intel-oneapi-compilers/2022.2.1 intel-oneapi-mpi/2021.7.1` |
For Derecho and Gaea, an additional script is needed to help set up the land-DA workflow scripts so that the container can run there.
./setup_container.sh -p=<derecho|gaea>
2.2.4.2. Configure the Experiment
2.2.4.2.1. Modify Machine Settings
Users on a system with a Slurm job scheduler will need to make some minor changes to the `submit_cycle.sh` file. Open the file and change the account and queue (qos) to match the desired account and qos on the system. Users may also need to add the following line to the script to specify the partition. For example, on Jet, users should set:
#SBATCH --partition=xjet
When using the GSWP3 forcing option, users will need to update line 7 to say `#SBATCH --cpus-per-task=4`. Users can perform this change manually in a code editor or run:
sed -i 's/--cpus-per-task=1/--cpus-per-task=4/g' submit_cycle.sh
Save and close the file.
2.2.4.2.2. Modify Experiment Settings
The Land DA System uses a script-based workflow that is launched using the `do_submit_cycle.sh` script. That script requires an input file that details all the specifics of a given experiment. EPIC has provided two sample `settings_*` files as examples: `settings_DA_cycle_era5` and `settings_DA_cycle_gswp3`.
Attention
Note that the GSWP3 option will only run as-is on Hera and Orion. Users on other systems may need to make significant changes to configuration files, which is not a supported option for the v1.2.0 release. It is recommended that users on these systems use the UFS land driver ERA5 sample experiment set in `settings_DA_cycle_era5`.
First, update the `$BASELINE` environment variable in the selected `settings_DA_*` file to say `singularity.internal` instead of `hera.internal`:
export BASELINE=singularity.internal
When using the GSWP3 forcing option, users must also update the `MACHINE_ID` to `orion` in `settings_DA_cycle_gswp3` if running on Orion.
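Both edits can also be scripted with `sed`, in the same spirit as the `submit_cycle.sh` change above. Because the exact contents of the real `settings_DA_cycle_*` files may differ, the sketch below operates on a miniature stand-in file with hypothetical contents; on a real system, run the `sed` commands against the actual settings file:

```shell
# Create a miniature stand-in settings file (hypothetical contents) to
# demonstrate the edits; on a real system, edit settings_DA_cycle_* instead.
cat > settings_demo <<'EOF'
export BASELINE=hera.internal
export MACHINE_ID=hera
EOF

# Switch BASELINE to the containerized baseline
sed -i 's/hera\.internal/singularity.internal/' settings_demo
# For GSWP3 runs on Orion only: set MACHINE_ID to orion
sed -i 's/^export MACHINE_ID=.*/export MACHINE_ID=orion/' settings_demo
cat settings_demo
```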
2.2.4.3. Run the Experiment
To start the experiment, run:
./do_submit_cycle.sh settings_DA_cycle_era5
The `do_submit_cycle.sh` script will read the `settings_DA_cycle_*` file and the `release.environment` file, which contain sensible experiment default values to simplify the process of running the workflow for the first time. Advanced users may wish to modify the parameters in `do_submit_cycle.sh` to fit their particular needs. After reading the defaults and other variables from the settings files, `do_submit_cycle.sh` creates a working directory (named `workdir` by default) and an output directory called `landda_expts` in the parent directory of `land-DA_workflow`, and then submits a job (`submit_cycle.sh`) to the queue that will run through the workflow. If all succeeds, users will see `log` and `err` files created in `land-DA_workflow`, along with a `cycle.log` file, which shows where the cycle has ended. The `landda_expts` directory will also be populated with data in the following directories:
landda_expts/DA_GHCN_test/DA/
landda_expts/DA_GHCN_test/mem000/restarts/vector/
landda_expts/DA_GHCN_test/mem000/restarts/tile/
Depending on the experiment, either the `vector` or the `tile` directory will have data, but not both.
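A quick way to see which restart directory was populated is to list both and report the non-empty one. The sketch below first creates a mock layout (directory names taken from the listing above; the restart filename is hypothetical) so that it can run standalone; on a real system, only the final loop is needed:

```shell
# Mock setup so the sketch runs standalone: recreate the expected layout and
# place a file in the vector directory (the filename here is hypothetical).
base=landda_expts/DA_GHCN_test/mem000/restarts
mkdir -p "$base/vector" "$base/tile"
touch "$base/vector/ufs_land_restart.demo.nc"

# Report which restart directory actually holds data
for d in vector tile; do
    if [ -n "$(ls -A "$base/$d" 2>/dev/null)" ]; then
        echo "Restart files found in: $base/$d"
    fi
done
```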
Users can check experiment progress/success according to the instructions in Section 2.1.5.2.1, which apply to both containerized and non-containerized versions of the Land DA System.