CADES OR DGXs
Check eligibility to access the CADES Open Research (OR) DGX machines
The DGXs are available for users in UNIX groups corresponding to birthright and CCSD slurm accounts. These UNIX groups are:
- cades-ccsd
- cades-birthright
To check whether you will be able to access the DGXs, confirm that you are a member of at least one of these UNIX groups. Please note that while both UNIX groups authorize access to the DGXs, they do so with different restrictions.
For example, for a fictional user 'abc' on any CADES machine:
* [abc@or-slurm-login07 ~]# groups abc
abc : users cades-ccsd cades-birthright ucams
* [abc@or-slurm-login07 ~]#
Here, membership in cades-ccsd will give user 'abc' access corresponding to the CCSD slurm account.
Likewise, membership in cades-birthright will give user 'abc' access corresponding to the Birthright slurm account.
Please check your eligibility to access the DGX resources in Open Research.
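The membership check above can be scripted. The following is a minimal sketch, not an official CADES tool: the function name dgx_access_groups is hypothetical, and it relies only on the two group names listed above and on the standard `id -nG` command.

```shell
#!/bin/bash
# Report which DGX-authorizing UNIX groups appear in a group list.
# Usage: dgx_access_groups [group ...]
# With no arguments, the current user's groups are read from `id -nG`.
# dgx_access_groups is a hypothetical helper name, not a CADES-provided command.
dgx_access_groups() {
    local groups_list found=1 g
    if [ "$#" -gt 0 ]; then
        groups_list="$*"
    else
        groups_list="$(id -nG)"
    fi
    for g in $groups_list; do
        case "$g" in
            cades-ccsd)
                echo "cades-ccsd: access under the CCSD slurm account"
                found=0 ;;
            cades-birthright)
                echo "cades-birthright: access under the Birthright slurm account"
                found=0 ;;
        esac
    done
    # Exit status 0 if at least one authorizing group was found, 1 otherwise.
    return "$found"
}
```

Calling dgx_access_groups with no arguments on a CADES login node checks the current user. Passing group names explicitly (for example, the output of `groups abc` without the leading "abc :") checks an arbitrary list.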
Access to CADES OR DGX
As mentioned earlier, the DGXs are open to users with birthright or CCSD access for submitting slurm jobs.
Login access to these systems is gated by the slurm submit node. Users can access with:
* ssh or-dgx-login01.ornl.gov
or
* ssh or-dgx-login02.cades.ornl.gov
Users may also log in directly to the two compute nodes, although this access might be restricted to users who already have running jobs.
* ssh ucamsID@dgx2-a.ornl.gov
or
* ssh ucamsID@dgx2-b.ornl.gov
The two nodes are in separate queues, so you can choose one or the other by queue name:
For CCSD users:
* #SBATCH -p dgx2a
or
* #SBATCH -p dgx2b
The max wall time for the dgx2a and dgx2b queues is unlimited.
For birthright users:
* #SBATCH -p dgx2a-birthright
or
* #SBATCH -p dgx2b-birthright
The max wall time for the birthright queues is 72 hours, and CCSD jobs can preempt birthright jobs.
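The queue choice described above follows directly from group membership and the node you want. As a rough sketch (the function name dgx_partition is hypothetical, and preferring the CCSD queue when a user belongs to both groups is an assumption based on its unlimited wall time and preemption priority), the mapping could be written as:

```shell
#!/bin/bash
# Print the DGX queue name for a node letter and a group list.
# Usage: dgx_partition <a|b> [group ...]
# With no groups given, the current user's groups are read from `id -nG`.
# dgx_partition is a hypothetical helper name, not a CADES-provided command.
dgx_partition() {
    local node="$1" groups_list
    case "$node" in
        a|b) ;;
        *) echo "node must be 'a' or 'b'" >&2; return 2 ;;
    esac
    shift
    if [ "$#" -gt 0 ]; then
        groups_list="$*"
    else
        groups_list="$(id -nG)"
    fi
    # Assumption: when both groups are present, prefer the CCSD queue.
    case " $groups_list " in
        *" cades-ccsd "*)       echo "dgx2${node}" ;;
        *" cades-birthright "*) echo "dgx2${node}-birthright" ;;
        *) echo "no DGX-authorizing group found" >&2; return 1 ;;
    esac
}
```

For instance, `sbatch -p "$(dgx_partition a)" job.sh` would submit a hypothetical job.sh to the queue for the first DGX that matches the current user's groups. Birthright users should also keep the requested wall time at or below 72 hours.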
Software on DGX
Software on the DGXs is delivered by container. The DGXs have a directory called /containers that holds several containers built for common software such as TensorFlow. Beyond that, users are expected to build their own software for now.
Storage on DGX
The DGXs mount NFS and Lustre in the standard way for the CADES OR condo. In addition, there is very fast local storage called /localscratch/data/ that is meant for fast I/O during job execution.
Please note that data on /localscratch/data/ should be copied off to Lustre or NFS for long-term storage.
To summarize:
* NFS: /home/nfs/* (persistent long after job completion)
* Lustre: /lustre/or-scratch/* (persistent long after job completion)
* Fast local scratch: /localscratch/data/* (self-serve; persistence not guaranteed for extended periods beyond a job's lifecycle)
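Because local scratch is not guaranteed to persist, a job would typically copy its results to Lustre or NFS before it exits. The following is a minimal sketch of such a stage-out step. The function name stage_out is hypothetical, and it uses plain `cp -a` rather than any CADES-specific tool.

```shell
#!/bin/bash
# Copy everything under a local-scratch directory to persistent storage.
# Usage: stage_out <source_dir> <dest_dir>
# stage_out is a hypothetical helper name, not a CADES-provided command.
stage_out() {
    local src="$1" dest="$2"
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # "$src"/. copies the directory's contents (dotfiles included),
    # not the directory itself; -a preserves permissions and timestamps.
    cp -a "$src"/. "$dest"/
}
```

At the end of a job script this might be called as `stage_out "/localscratch/data/$USER/run1" "/lustre/or-scratch/$USER/run1"`. The per-user and run1 subdirectories there are assumptions for illustration; use whatever layout you actually created under those mount points.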