Overview of SHPC Condos
All researchers in the ORNL Science and Technology Directorates have access to CADES resources at no initial cost. The CADES Service Suite includes four core services: Cloud Computing, Scalable HPC, Dedicated Storage, and High-Speed Data Transfer.
One set of SHPC condos, intended for open publishable research, sits in the ORNL Open protection zone (CADES Open); another, intended for sensitive codes and data, sits in the ORNL Moderate protection zone (CADES Mod). The protection zones contain and control both the software base and the data produced on those systems. Most users will join a condo in CADES Open.
All ORNL staff members may have access at no initial cost to 10 nodes of the open SHPC condo, called the birthright condo. Many more nodes in both the Open and Moderate condos have been purchased by specific ORNL divisions or research groups; access to those is described below.
Condo | What is it | Who May Join | Cost |
---|---|---|---|
Birthright | Access to 36 nodes, 10 of which have 2 GPUs each, sits in CADES Open Enclave | All ORNL Staff | No initial cost |
CADES Open SHPC research condos | Access to additional nodes (the node count depends on the condo) that have been purchased by one of ORNL's research divisions; sits in the CADES Open protection zone | Researchers doing open research who are collaborating with the division or group that purchased the condo's nodes (post-docs, students, ORNL staff members, visiting researchers with cyber access PAS) | No initial cost to join, but access is subject to approval by the condo owner |
CADES Moderate SHPC condos | Exclusive access to nodes in the Moderate protection zone | Researchers who are working with sensitive codes or data and are collaborating with a current Moderate condo owner. If you have sensitive codes or data and are not working with one of the current owners, contact cades-help@ornl.gov | No initial cost to join, but access is granted solely through approval by a current condo owner |
Purchase a new condo | Access to your own set of resources as defined by you and CADES | Any ORNL research or technical staff member | Please contact cades-help@ornl.gov for purchase information |
If you are not sure which condo to join, email cades-help@ornl.gov or simply start with the birthright condo.
SHPC Condo Resources in Brief
- Hardware
- CPUs: Cray
- GPUs: NVIDIA
- Storage
- NFS
- Lustre
- Software
- Slurm scheduler
- Torque/Moab scheduler (being phased out)
- Modules for package management
- Workflow tools
More detail about the specific types of processors and a list of current SHPC Condos can be found here.
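As a quick illustration of the module system listed above, the commands below show the typical pattern for discovering and loading pre-installed software. This is only a sketch; the module name used here is a hypothetical example, not a statement of what is installed on your condo.

```bash
# List the modules available on the current login node
module avail

# Narrow the search to modules matching a keyword
module avail gcc

# Load a module into your environment ("gcc" is a hypothetical example name)
module load gcc

# Show what is currently loaded, and clear everything when finished
module list
module purge
```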
SHPC Condo Hardware Configuration
The SHPC Condos are commodity x86_64 clusters that contain a set of MPPs (Massively Parallel Processors). The hardware differs for each of the 12 condo groups, but there are some basic similarities. A processor in this cluster is commonly referred to as a node and has its own CPU, memory, and I/O subsystems. Each node has 2 CPUs with a combined 32 to 128 cores. Nodes with GPUs have either Tesla K80, P100, or V100 GPUs. Each node has between 128 and 512 GB of RAM and is connected to a condo-wide FDR InfiniBand network.
Node Types
The SHPC Condos have two types of nodes: Login and Compute. While these are similar in terms of hardware, they differ considerably in their intended use.
Node Type | Description |
---|---|
Login | When you connect to either the Moderate or Open SHPC condos, you are placed on a login node. This is the place to write/edit your code, compile small programs, manage data, submit jobs, etc. You should never run large parallel compilations or jobs on the login nodes; login nodes are shared resources that are in use by many users simultaneously. |
Compute | Most of the SHPC condo nodes are compute nodes. These are where your parallel jobs execute, and where you should compile large programs. They are accessed via the sbatch or qsub command depending on whether your condo is using Slurm or Moab. |
For guidelines on compiling your code on compute nodes, see the software section.
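As an illustration of the submission commands mentioned in the table above, the script below is a minimal sketch of a batch job on a condo that uses Slurm. The account name, partition, node counts, module name, and application name are all hypothetical placeholders; replace them with the values for your condo (see Resource Queues below).

```bash
#!/bin/bash
#SBATCH -A your_condo_account      # hypothetical account/allocation name
#SBATCH -p batch                   # hypothetical partition; check Resource Queues for yours
#SBATCH -N 2                       # number of nodes
#SBATCH --ntasks-per-node=32       # MPI ranks per node
#SBATCH -t 01:00:00                # walltime limit
#SBATCH -J example_job             # job name

# Load the software environment your application needs (hypothetical module name)
module load openmpi

# Launch the parallel application on the compute nodes
srun ./my_app input.deck
```

Submit the script from a login node with `sbatch myjob.sh`. On a condo that is still running Torque/Moab, jobs are submitted with `qsub` and equivalent PBS directives instead.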
SHPC Condo Storage Configuration
Lustre
Lustre is an on-premises, high-performance, parallel file system that stores data as objects, each with a key, a value, and a set of attributes. It is available in the following environments:
Open Lustre:
- 1.7 PB of temporary computational storage
Your temporary local storage is located at: /lustre/or-scratch/group/username
Replace group with your group name and username with your XCAMS/UCAMS ID.
Moderate Lustre:
- 400 TB of temporary computational storage
Your temporary local storage is located at: /lustre/hydra/group/username
Replace group with your group name and username with your XCAMS/UCAMS ID.
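One convenient, though optional, pattern is to capture your scratch path in a shell variable so interactive sessions and job scripts stay consistent. In this sketch the group name is a hypothetical placeholder, and $USER is assumed to match your UCAMS/XCAMS ID on these systems.

```bash
# Open enclave scratch; use /lustre/hydra/... instead on the Moderate side.
# "my_group" is a hypothetical group name; substitute your own.
export SCRATCH=/lustre/or-scratch/my_group/$USER

# Create a working directory for a run and move into it
mkdir -p "$SCRATCH/my_run"
cd "$SCRATCH/my_run"
```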
Lustre Best Practices
Lustre is for files that you are actively working on, such as input decks and job output; that data should be moved or deleted within days of being generated. Lustre is not for persistent storage of data or software, and not for building software or applications (though you may keep executables on Lustre if they are also backed up elsewhere). If your application generates many small files or log files, compress them when you are not actively using them, and do not keep them on Lustre much longer than they are in active use. Lustre is not end-storage of any sort.
Lustre is best suited for large files:
- Files 6 MB and larger
- Optimal size is 8 MB or larger
Please use Lustre as a fast scratch for compute jobs:
- Tar or move data to home_dirs or project_dirs (see the sketch at the end of these best practices)
- Avoid writing your logs to Lustre
- Avoid using symlinks on Lustre
- When possible, avoid writing files smaller than 6 MB (8 MB+ preferred) to Lustre
Delete or move files that are no longer used:
- Move to home_dirs or project_dirs on NFS
Avoid building software in Lustre, but it’s fine to run executables from Lustre
Avoid running/building Conda or containers on Lustre:
- Conda, pip, etc. should be installed in your home_dirs
- Containers should not be stored or run on/from Lustre
Avoid ‘stat’ commands at all costs:
- ls -l, df, du, stat, file, etc.
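To follow the guidance above, the sketch below shows one way to bundle a finished run’s many small log files into a single compressed archive and move it off Lustre to persistent NFS storage. The group, condo, and directory names are hypothetical placeholders.

```bash
# Bundle the many small output/log files from a run into one compressed archive
# (large single files are what Lustre handles best); names are hypothetical.
cd /lustre/or-scratch/my_group/$USER
tar -czf my_run_logs.tar.gz my_run/logs/

# Move the archive off Lustre to NFS, then clean up the scratch copies
mv my_run_logs.tar.gz /nfs/data/my_condo/
rm -rf my_run/logs/
```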
Lustre Purge Policy
Files that have not been used in 90 days are continuously purged from Lustre. Delete files that are no longer used and move important files to NFS.
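To get a rough idea of which of your scratch files are approaching that window, something like the sketch below can help. The path is a hypothetical placeholder, “used” is interpreted here as access time (which may not be exactly how the purge is measured), and this kind of scan walks file metadata, so run it sparingly given the best practices above.

```bash
# List files under your scratch area not accessed in roughly 90 days
# (lower the number to catch files earlier); path is a hypothetical placeholder.
find /lustre/or-scratch/my_group/$USER -type f -atime +90 -print
```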
For well-justified cases, CADES will temporarily exempt a Lustre directory from the purge. The exemption is approved by the CADES RUC.
Use this form to request a purge exemption: https://cades.ornl.gov/special-request-forms/
Lustre is not backed up.
NFS
NFS (Network File System) is a service that allows directories and files to be shared with others over a network. Home, software, and project directories have been set up on NFS and are permanently available through the network.
Open NFS:
- Each user is automatically given 20 GB of permanent NFS storage. This area is designed for publishable open research data and codes.
Moderate NFS:
- Each user is automatically given 20 GB of permanent NFS storage. This area is for sensitive data and requires access to one of the Moderate SHPC condos.
📝 Note: If your needs differ from what is listed here, fill out the NFS Quota form (https://cades.ornl.gov/special-request-forms) or contact us to discuss options.
Your persistent NFS storage location(s):
Who | Location | Access |
---|---|---|
All Users | Home directory (home_dirs) | This is where you land when you log in |
Condo owners and special projects | /nfs/data/project-or-condo-name | cd /nfs/data/condo_name |
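If you want a rough check of how much of your 20 GB NFS allocation is in use, a simple disk-usage scan of your home directory, as sketched below, is usually enough. Run this on NFS rather than Lustre, where metadata-heavy commands are discouraged.

```bash
# Summarize how much of your NFS home directory is in use
du -sh "$HOME"

# Per-directory breakdown to find what is taking the space
du -h --max-depth=1 "$HOME" | sort -h
```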
Getting Started
Other pages in this guide that help you get started are:
- Getting Started to get credentials for an allocation and join a condo
- Connecting to learn how to log in
- Execute a Job to learn how to run an application on a condo (If you're on a system that is using Moab see this page)
- Resource Queues to see which PBS/sbatch options are needed to schedule jobs
SHPC Condo Training
Below are links to tutorials and recordings of SHPC Condo training.
- How to use modules to load the pre-installed software
- How to submit a job using Moab
- CADES Slurm training
- OpenACC Tutorial