Cades Container Policy

Cades provides a platform for container workloads both in the Condo and on specialized multi-GPU hardware called DGXs. In both cases, the container runtime recommended by Cades is Singularity. Please refer to the Singularity user guide for details on the use of Singularity.

While Cades provides the platform for container workloads, it offers only limited support for some aspects of container workflows, as described later in this document.

All container workloads are processed through the Slurm workload manager, so users must have an approved project account to run jobs.

Here are some general container policy guidelines:

Containers in DGX cluster

The DGXs are primarily configured to run container workloads that require GPUs. Support for other workflows is very limited, and workloads that do not use containers are discouraged. By default, Singularity in the DGX cluster is configured to use GPUs, so no special option or switch is needed to access an available GPU in a Slurm allocation.

Slurm allocations must request GPUs for those GPUs to be available to user jobs. In addition, users cannot submit jobs from the DGXs to the Condos.
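As an illustration of the two points above, here is a minimal sketch that renders a Slurm batch script for a DGX container job. It assumes GPUs are requested with Slurm's generic `--gres=gpu:N` option; the account name, image file, and time limit are hypothetical placeholders, and the site may require additional options such as a partition. Note that the generated Singularity command carries no GPU switch, following the default DGX configuration described above.

```python
import shlex


def dgx_batch_script(account, image, command, gpus=1, time_limit="01:00:00"):
    """Render a Slurm batch script that runs `command` inside a Singularity image.

    All argument values are illustrative; substitute your own project account,
    image path, and resource requests.
    """
    if gpus < 1:
        raise ValueError("a DGX container job should request at least one GPU")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --account={account}",   # approved project account
        f"#SBATCH --gres=gpu:{gpus}",     # GPUs must be requested in the allocation
        f"#SBATCH --time={time_limit}",
        "",
        # No GPU switch here: the policy states Singularity on the DGXs
        # is configured to use GPUs by default.
        "singularity exec " + shlex.quote(image) + " " + shlex.join(command),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical account and image names, for illustration only.
script = dgx_batch_script("my-project", "tensorflow.sif", ["python", "train.py"], gpus=2)
print(script)
```

The printed text can be saved to a file and submitted with `sbatch` from a DGX login session.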

Containers in the Condo cluster

The Condos can support container workloads both with and without GPUs. Condo users access Singularity by loading the appropriate software module environment, and running jobs requires an appropriate Slurm allocation. Condo users needing GPUs must request them in their Slurm allocation and also instruct Singularity to use the GPU with the --nv switch. This is demonstrated in the container section here.
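The Condo steps can be sketched as follows: load the Singularity module, then run the container, adding Singularity's `--nv` option only when the job uses a GPU. This is a rough sketch, not the site's official recipe. The module name `singularity` and the image and script names are assumptions; check `module avail` for the actual module name on the Condo.

```python
import shlex


def singularity_exec(image, command, use_gpu=False):
    """Build the argument list for `singularity exec`, with --nv for GPU jobs."""
    argv = ["singularity", "exec"]
    if use_gpu:
        argv.append("--nv")  # expose the host NVIDIA GPU inside the container
    argv.append(image)
    argv.extend(command)
    return argv


def condo_job_body(image, command, use_gpu=False, module="singularity"):
    """Render the shell lines for the body of a Condo batch script.

    `module` is a hypothetical module name; the real one may differ.
    """
    return "\n".join([
        f"module load {module}",
        shlex.join(singularity_exec(image, command, use_gpu)),
    ])


# Hypothetical image and script names, for illustration only.
gpu_body = condo_job_body("pytorch.sif", ["python", "train.py"], use_gpu=True)
cpu_body = condo_job_body("pytorch.sif", ["python", "prep.py"])
print(gpu_body)
```

The GPU variant still needs a Slurm allocation that requests a GPU; the `--nv` switch alone does not make one available.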

Users cannot submit jobs to the DGXs from the Condos.

Building containers

Cades does not provide support for building user containers; this is the user's responsibility. However, we can point to a few ways users can build their own containers. Please see the Container QuickStart Guide for examples on building your own container.

Cades Container Software Support on the DGXs

The DGXs are configured to support a wide range of container workloads only. As such, we refrain from providing version-specific software such as conda, pip, python, and cuda. We expect users to install their own supplemental software in their home areas, or in their project areas on NFS or Lustre. This is echoed in the DGX Software Policy.

Cades Container Software Support on the Condo

This is the same policy as described here.