Container QuickStart in Cades

Review Cades Policies

Container Provider

Cades supports container workloads using Singularity. Refer to the Singularity user guide.

Container Availability

Container workloads using Singularity can run on the Condos as well as the DGXs. The DGXs only support GPU containers, while the Condos support both GPU and non-GPU container workloads.
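
For example (the image names below are hypothetical placeholders), a GPU workload is typically run with the --nv flag, which binds the host NVIDIA driver libraries into the container, while a CPU-only workload simply omits it:

```bash
# GPU container workload (DGX or a GPU Condo node); --nv exposes the host GPUs.
# "my-cuda-image.sif" is a placeholder for your own image.
singularity exec --nv ./my-cuda-image.sif nvidia-smi -L

# Non-GPU container workload (Condo); no --nv required.
singularity exec ./my-image.sif cat /etc/os-release
```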

Building containers

Cades does not provide support for building user containers. This is the user's responsibility. However, we can point to a few ways users can build their own containers.

Container image builds vary greatly in complexity depending on the software being containerized. Images meant to run parallel workloads can be particularly challenging, with nuances in how the parallel libraries are incorporated into such workflows.
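
For example, one common approach for MPI workloads is the hybrid model, in which the container carries its own MPI build that must be compatible with the host MPI used to launch the job. A rough sketch, with hypothetical image and executable names:

```bash
# Hybrid MPI model (sketch only): the MPI inside the container must be
# ABI-compatible with the host MPI environment used by the launcher.
module load PE-gnu/3.0   # host compiler/MPI environment (module name from this guide)
srun -n 4 singularity exec ./mpi-image.sif /opt/app/bin/mpi_hello
```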

Building Singularity containers with root access

Container build methods requiring elevated or root access are not available to Cades users. Users can build containers this way on their personal desktops, laptops, or cloud VMs and then transfer the images to Cades to run.
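
A minimal sketch of that workflow, assuming a definition file named mycontainer.def and using placeholder user, host, and path names:

```bash
# On a personal machine or cloud VM where you have root access:
sudo singularity build mycontainer.sif mycontainer.def

# Copy the finished image to Cades (placeholders shown):
scp mycontainer.sif <user>@<cades-login-host>:~/containers/
```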

The following are recommended methods for building Singularity containers as a non-root user:

Building a container without root access - the Sandbox option

The Sandbox option builds a container image as a directory structure on a filesystem. While this method often suits users' needs, care must be taken with particularly complex containers, as file-permission issues can arise when trying to run a container built this way.

The method is illustrated below:

lsf@or-dgx-login01:~/gapps/cuda:10.1-base-ubuntu16.04$
lsf@or-dgx-login01:~/gapps/cuda:10.1-base-ubuntu16.04$ singularity build -s 10.1-base-ubuntu16.04-lsf docker://nvidia/cuda:10.1-base-ubuntu16.04
INFO:    Starting build...
Getting image source signatures
Copying blob sha256:be8ec4e48d7f24a9a1c01063e5dfabb092c2c1ec73e125113848553c9b07eb8c
 43.71 MiB / 43.71 MiB [====================================================] 1s
Copying blob sha256:33b8b485aff0509bb0fa67dff6a2aa82e9b7b17e5ef28c1673467ec83edb945d
 849 B / 849 B [============================================================] 0s
Copying blob sha256:d887158cc58cbfc3d03cefd5c0b15175fae66ffbf6f28a56180c51cbb5062b8a
 533 B / 533 B [============================================================] 0s
Copying blob sha256:05895bb28c18264f614acd13e401b3c5594e12d9fe90d7e52929d3e810e11e97
 167 B / 167 B [============================================================] 0s
Copying blob sha256:3d2964768f6061ddd36cf10b5c2580aeaf5f0344adde21b31c623bdb9fbe10e4
 6.53 MiB / 6.53 MiB [======================================================] 0s
Copying blob sha256:50013f7936a6a48c04fc4ada924539c107ac999b64e21ef4c77953a2d6e3e261
 8.08 MiB / 8.08 MiB [======================================================] 0s
Copying blob sha256:dd93bb00d132979bc61fae31a5ce680308d313b81c3222f5abda41f91f5785a6
 186 B / 186 B [============================================================] 0s
Copying config sha256:462851e597ff18c06250eb651e54c4800c6777c060a28a196ed1fda955916cd3
 6.33 KiB / 6.33 KiB [======================================================] 0s
Writing manifest to image destination
Storing signatures
INFO:    Creating sandbox directory...
INFO:    Build complete: 10.1-base-ubuntu16.04-lsf
lsf@or-dgx-login01:~/gapps/cuda:10.1-base-ubuntu16.04$ ls -l
total 19
drwxr-xr-x. 22 lsf users 28 Jan 19 16:54 10.1-base-ubuntu16.04-lsf
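
The sandbox can be exercised interactively before wrapping it in a batch job. For example, using the sandbox directory built above (a sketch, run from the same directory):

```bash
# Run a quick command inside the sandbox to confirm it works.
singularity exec ./10.1-base-ubuntu16.04-lsf cat /etc/os-release

# Open a shell in the sandbox; --writable allows in-place changes, subject to
# the file-permission caveats noted above.
singularity shell --writable ./10.1-base-ubuntu16.04-lsf
```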

Prepare a Slurm submission script. The script below requests two GPUs on the dgx2b partition and runs nvidia-smi inside the sandbox image built above:

#!/bin/bash
#SBATCH -A staff
#SBATCH -p dgx2b
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -J singularity-job
#SBATCH --mem=10G
#SBATCH -t 10:00
#SBATCH --gres=gpu:2
#SBATCH -o "o%u-%N-%j.out"
#SBATCH -e "o%u-%N-%j.err"

srun singularity exec ./10.1-base-ubuntu16.04-lsf nvidia-smi -L

Submit the script and view the results

lsf@or-dgx-login01:~/gapps/cuda:10.1-base-ubuntu16.04$ sbatch DGXsingularity.sh
Submitted batch job 24627
lsf@or-dgx-login01:~/gapps/cuda:10.1-base-ubuntu16.04$ ls -la
total 68
drwxr-xr-x.  3 lsf users   6 Jan 19 18:06 .
drwxr-xr-x. 11 lsf users  13 Jan 19 17:16 ..
drwxr-xr-x. 22 lsf users  28 Jan 19 16:54 10.1-base-ubuntu16.04-lsf
-rw-r--r--.  1 lsf users 287 Jan 19 18:06 DGXsingularity.sh
-rw-r--r--.  1 lsf users   0 Jan 19 18:06 olsf-dgx2-b-24627.err
-rw-r--r--.  1 lsf users 154 Jan 19 18:06 olsf-dgx2-b-24627.out
lsf@or-dgx-login01:~/gapps/cuda:10.1-base-ubuntu16.04$ cat olsf-dgx2-b-24627.out
GPU 0: Tesla V100-SXM3-32GB (UUID: GPU-906c709b-5974-95b0-0364-50482fdb391a)
GPU 1: Tesla V100-SXM3-32GB (UUID: GPU-cfb0a094-3cda-c5c1-a8be-958312785424)
lsf@or-dgx-login01:~/gapps/cuda:10.1-base-ubuntu16.04$

Build a Singularity image file from a Singularity definition file as a non-root user using the Remote Builder

This method first requires creating an account at Remote Builder and then creating an access token, which should be copied into $HOME/.singularity/remote.yaml. After you have created your account, the upper left-hand corner will have a tab displaying your username, with a drop-down menu containing the Access Token option. From there you can create an access token.

Note: one major disadvantage of this method is that it cannot simply copy software from the build host into the container image.
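
As an illustration (file names are hypothetical), a %files section that copies from the local build host will not work with a remote build, because the remote build service never sees those local files:

```bash
# Hypothetical definition file illustrating the limitation.
cat > myapp.def <<'EOF'
bootstrap: library
from: alpine:3.7

%files
    ./myapp /opt/myapp
EOF

# Expected to fail: ./myapp exists only on the local build host, not on the remote builder.
singularity build --remote myapp.sif myapp.def
```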

Generating a token for use on the Condos or the DGXs might look like the session below; note where you have to paste your token:

lsf@or-slurm-login08:~//apps/alpine-deffile$ module load PE-gnu/3.0
lsf@or-slurm-login08:~//apps/alpine-deffile$ module load go
lsf@or-slurm-login08:~//apps/alpine-deffile$ module load singularity/3.6.3
lsf@or-slurm-login08:~//apps/alpine-deffile$ singularity build --remote alpine.sif alpine.def
FATAL:   Unable to submit build job: no authentication token, log in with `singularity remote login`
lsf@or-slurm-login08:~//apps/alpine-deffile$ 
lsf@or-slurm-login08:~//apps/alpine-deffile$ singularity remote login
INFO:    Authenticating with default remote.
Generate an API Key at https://cloud.sylabs.io/auth/tokens, and paste here:
API Key:
INFO:    API Key Verified!
lsf@or-slurm-login08:~//apps/alpine-deffile$

```bash
lsf@dgx2-b:~/gapps/apline-deffile$
lsf@dgx2-b:~/gapps/apline-deffile$
lsf@dgx2-b:~/gapps/apline-deffile$ cat ~/.singularity/remote.yaml
Active: SylabsCloud
Remotes:
  SylabsCloud:
    URI: cloud.sylabs.io
    Token: xxxxxxxxxxxxx
    System: true
    Exclusive: false
lsf@dgx2-b:~/gapps/apline-deffile$
```

Now authentication at the Remote Builder will happen automatically and we can proceed with the image build. Here is a simple image definition file from which we can build using the Singularity Remote Builder:

lsf@or-slurm-login08:~/gapps/alpine-deffile$ cat alpine.def
bootstrap: library
from: alpine:3.7

%runscript
echo "hello from the alpine container"
lsf@or-slurm-login08:~/gapps/alpine-deffile$
lsf@dgx2-b:~/gapps/alpine-deffile$
lsf@dgx2-b:~/gapps/alpine-deffile$ singularity build --remote alpine.sif alpine.def
INFO:    Remote "default" added.
INFO:    Authenticating with remote: default
INFO:    API Key Verified!
INFO:    Remote "default" now in use.
INFO:    Starting build...
INFO:    Downloading library image
INFO:    Adding runscript
INFO:    Creating SIF file...
INFO:    Build complete: /tmp/image-478312655
WARNING: Skipping container verifying
 1.98 MiB / 1.98 MiB  100.00% 9.25 MiB/s 0s
INFO:    Build complete: alpine.sif
lsf@dgx2-b:~/gapps/alpine-deffile$
lsf@dgx2-b:~/gapps/alpine-deffile$
lsf@dgx2-b:~/gapps/alpine-deffile$ ls
alpine.def  alpine.sif
lsf@dgx2-b:~/gapps/alpine-deffile$
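
The resulting image can then be run directly; singularity run executes the runscript defined above, so it should print the greeting:

```bash
singularity run alpine.sif
# expected output, from the %runscript above:
# hello from the alpine container
```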

Running containers in the Condos

The SHPC Condos User Guide highlights several useful examples of running containers in your Slurm allocation in the Condos.
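
As a minimal sketch, a CPU-only container job on a Condo might look like the script below, reusing the alpine.sif image built above. The account and partition names are placeholders; consult the SHPC Condos User Guide for the values that apply to your allocation.

```bash
#!/bin/bash
# Placeholder account and partition: substitute the values for your Condo allocation.
#SBATCH -A <your-account>
#SBATCH -p <your-partition>
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -J singularity-condo-job
#SBATCH -t 10:00
#SBATCH -o "o%u-%N-%j.out"
#SBATCH -e "o%u-%N-%j.err"

# Module names taken from the Condo examples earlier in this guide.
module load PE-gnu/3.0
module load singularity/3.6.3

# CPU-only workload, so no --nv flag is needed.
srun singularity exec ./alpine.sif cat /etc/os-release
```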