Cades Container Policy
Cades provides a platform for container workloads both in the Condo and on specialized multi-GPU hardware called DGXs. In both cases, the container runtime recommended by Cades is Singularity. Please refer to the Singularity user guide for details on using Singularity.
While Cades provides the platform for container workloads, support for some aspects of container workflows is limited, as described in the sections below.
All container workloads are processed through the Slurm workload manager, so users must have an approved project account to run jobs.
Here are some general container policy guidelines:
- Users are responsible for providing and maintaining their own containers.
- Cades admins are not responsible for building user containers.
- Cades admins are not responsible for troubleshooting user containers.
Containers in the DGX cluster
The DGXs are primarily configured to run container workloads that require GPUs. Support for other workflows is very limited, and workloads that do not use containers are discouraged. By default, Singularity on the DGX cluster is configured to use GPUs, so no special option or switch is needed to access an available GPU in a Slurm allocation.
Slurm allocations must request GPUs before GPUs are available to user jobs. In addition, users cannot submit jobs from the DGXs to the Condos.
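As a rough illustration of the two rules above, the following Python sketch renders a Slurm batch script for a DGX job. The account name, image path, and command are hypothetical placeholders, and the GPU request form used here (`--gres=gpu:N`, a standard Slurm option) is an assumption about what the DGX partition accepts; check the cluster documentation for the actual values.

```python
import shlex


def dgx_batch_script(account, gpus, image, command):
    """Render a Slurm batch script that runs a command in a Singularity container.

    Follows the policy described above: Singularity on the DGXs uses GPUs
    by default, so no GPU switch is passed to singularity itself.
    """
    if gpus < 1:
        raise ValueError("DGX jobs must request at least one GPU")
    quoted_command = " ".join(shlex.quote(part) for part in command)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --account={account}",  # approved project account (hypothetical name)
        f"#SBATCH --gres=gpu:{gpus}",    # GPUs must be requested in the allocation
        "#SBATCH --nodes=1",
        f"singularity exec {shlex.quote(image)} {quoted_command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical account, image, and command, for illustration only.
script = dgx_batch_script("my-project", 2, "my_image.sif", ["python3", "train.py"])
print(script)
```

The generated text could be saved to a file and submitted with `sbatch` from a DGX login session.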
Containers in the Condo cluster
The Condos support container workloads both with and without GPUs. Condo users access Singularity by loading the appropriate software module, and running jobs requires an appropriate Slurm allocation.
Condo users needing GPUs must request them in their Slurm allocation and must also instruct Singularity to use the GPU by passing the --nv switch to singularity. This is demonstrated in the container section here.
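A minimal Python sketch of the Condo rules follows: it builds the Singularity command and adds the GPU switch (spelled `--nv` on the Singularity command line) only when GPUs are requested. The module name `singularity`, the account name, the image path, and the `--gres=gpu:N` request form are assumptions for illustration, not confirmed Condo settings.

```python
import shlex


def condo_singularity_command(image, command, use_gpu=False):
    """Build the argument list for running a command in a container on the Condo."""
    args = ["singularity", "exec"]
    if use_gpu:
        args.append("--nv")  # expose the allocated NVIDIA GPU inside the container
    args.append(image)
    args.extend(command)
    return args


def condo_batch_script(account, image, command, gpus=0, module="singularity"):
    """Render a Slurm batch script; the module name is an assumed placeholder."""
    lines = ["#!/bin/bash", f"#SBATCH --account={account}"]
    if gpus > 0:
        lines.append(f"#SBATCH --gres=gpu:{gpus}")  # GPU must be in the allocation
    lines.append(f"module load {module}")
    args = condo_singularity_command(image, command, use_gpu=gpus > 0)
    lines.append(" ".join(shlex.quote(arg) for arg in args))
    return "\n".join(lines) + "\n"


# Hypothetical account, image, and command, for illustration only.
cpu_script = condo_batch_script("my-project", "my_image.sif", ["python3", "run.py"])
gpu_script = condo_batch_script("my-project", "my_image.sif", ["python3", "run.py"], gpus=1)
print(gpu_script)
```

Tying the `--nv` switch to the GPU request in one place avoids the common mistake of requesting a GPU in Slurm but forgetting to pass it through to the container.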
Users cannot submit jobs to the DGXs from the Condos.
Building containers
Cades does not provide support for building user containers; this is the user's responsibility. However, we can point to a few ways users can build their own containers. Please see the Container QuickStart Guide for examples on building your own container.
Cades Container Software Support on the DGXs
The DGXs are configured to support container workloads only, across a wide range of use cases. As such, we refrain from providing version-specific software such as conda, pip, python, and cuda. We expect users to install their own supplemental software in their home areas, in their project areas on NFS, or on Lustre. This is echoed in the DGX Software Policy.
Cades Container Software Support on the Condo
This is the same policy as described here.