Execute a PBS Job on Your SHPC Condo Allocation
The tutorial below shows you how to run Wes Kendall's basic "hello world" program, written in C, using the message passing interface (MPI) to scale across the SHPC Condo compute nodes [1]. This tutorial is intended for users who are new to the SHPC Condo environment and leverages a portable batch system (PBS) script and C source code.
Additional examples can be found in the C++, Fortran, and Python sections.
📝 Note: Do not execute jobs on the login nodes; only use the login nodes to access your compute nodes. Processor-intensive, memory-intensive, or otherwise disruptive processes running on login nodes will be killed without warning.
Step 1: Access Your Allocation
Open and Moderate protection zones each have their own login node. Choose the login node for your protection zone. For this tutorial, we will be using the Open protection zone. If you need to request an allocation, see instructions here.
📝 Note: The Open protection zone can be accessed using either XCAMS or UCAMS credentials. However, the Moderate protection zone requires an ORNL UCAMS ID.
- Open a Bash terminal (or PuTTY for Windows users).
- Execute the following command, replacing "username" with your XCAMS or UCAMS ID:
ssh username@or-condo-login.ornl.gov
- When prompted, enter your XCAMS or UCAMS password.
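Optionally, if you connect often, you can add a host alias to your local ~/.ssh/config so that you do not have to retype the full host name each time. This is a standard OpenSSH convenience; the alias name "or-condo" below is just an example.
Host or-condo
    HostName or-condo-login.ornl.gov
    User username
With this entry saved, ssh or-condo is equivalent to the full command above.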
Once you have connected to the login node, you can proceed to Step 2 and begin assembling your PBS script.
Step 2: Create a PBS Script
Below is the PBS script we are using to run an MPI "hello world" program as a batch job. PBS scripts use variables to specify things like the number of nodes and cores used to execute your job, estimated walltime for your job, and which compute resources to use (e.g., GPU vs. CPU). The sections below feature an example PBS script for SHPC Condo resources, show you how to create and save your own PBS script, and show you how to store the PBS script on an SHPC Condo file system.
Consult the official Torque documentation for a complete list of PBS variables.
Example PBS Script
Here is an example PBS script for running a batch job on an SHPC Condo allocation. We break down each command in the section below.
#!/bin/bash
#PBS -N mpi_hello_world_c
#PBS -M your_email@ornl.gov
#PBS -l nodes=1:ppn=16
#PBS -l walltime=0:00:6:0
#PBS -W group_list=cades-arm
#PBS -A arm
#PBS -l qos=std
#PBS -q batch
#PBS -V
module purge
module load PE-gnu
module list
cd $PBS_O_WORKDIR
pwd
mpirun hello_world_c
PBS Script Breakdown
Here, we break down the essential elements of the above PBS script.
- #!/bin/bash : sets the script type
- #PBS -N mpi_hello_world_c : sets the job name; your output files will share this name
- #PBS -M your_email@ornl.gov : add your email address if you would like errors to be emailed to you
- #PBS -l nodes=1:ppn=16 : sets the number of nodes and processors per node used to run your job; in this case, one node and 16 cores per node
- #PBS -l walltime=0:00:6:0 : tells PBS the anticipated runtime for your job, where walltime=DD:HH:MM:SS; here, the job requests six minutes
- #PBS -W group_list=cades-arm : specifies your LDAP group
- #PBS -A arm : specifies your account type
- #PBS -l qos=std : sets the quality of service (QOS); options:
  - std : normal jobs that can run for up to 48 hours
  - long : long-running jobs that can run for up to 14 days, but at a lower priority to start (your group may not have this option available)
- #PBS -q batch : specifies the resource queue the job should be submitted to; see the resource queues page for more options
- #PBS -V : exports your current environment variables to the batch job
- module purge : clears any modules currently loaded that might result in a conflict
- module load PE-gnu : loads the PE-gnu module, which loads OpenMPI, GCC, and XALT
- module list : confirms the modules that were loaded
- cd $PBS_O_WORKDIR : moves to the directory the job was submitted from; in this example, our binary is launched from the same directory as our PBS script, and the results from the binary are placed there as well
- pwd : confirms the current working directory
- mpirun hello_world_c : calls MPI to run our hello_world_c binary
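To request different resources, you change only the relevant directives. For example, a hypothetical job that needs two nodes with 32 cores each and a two-hour walltime would swap in the lines below (illustrative values; your group's node sizes and QOS limits may differ):
#PBS -l nodes=2:ppn=32
#PBS -l walltime=0:2:0:0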
PBS Procedure
Now that we have covered the basics of a PBS script in the context of an SHPC Condo, let's actually create and use the script on your allocation.
When creating and editing your PBS script, we will be working on the login node (from Lustre storage) using the Vi text editor. If Lustre storage is not available, you may complete this tutorial from within your home directory on NFS.
- From the login node, change your working directory to the desired file system. We are going to use our Lustre allocation for this example.
cd /lustre/or-scratch/cades-arm/username
Replace "username" with your own UCAMS/XCAMS user ID.
- Use Vi to create and edit your PBS script.
vi hello_world_c.pbs
- Write your PBS script within Vi, or paste it in as follows:
  - Hit Esc on your keyboard to exit input mode.
  - Enter :set paste into Vi's command line, and press Return to enter paste mode.
  - Hit i to return to input mode, then paste the PBS code into Vi.
- When finished, hit Esc on your keyboard to exit input mode.
- Enter :x! into Vi's command line, and press Return to save your file and return to the Bash shell.
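As an alternative to pasting in Vi, you can write the same file in one step with a Bash heredoc. This is just a convenience sketch; the quoted 'EOF' delimiter keeps your login shell from expanding $PBS_O_WORKDIR while creating the file.
cat > hello_world_c.pbs << 'EOF'
#!/bin/bash
#PBS -N mpi_hello_world_c
#PBS -M your_email@ornl.gov
#PBS -l nodes=1:ppn=16
#PBS -l walltime=0:00:6:0
#PBS -W group_list=cades-arm
#PBS -A arm
#PBS -l qos=std
#PBS -q batch
#PBS -V
module purge
module load PE-gnu
module list
cd $PBS_O_WORKDIR
pwd
mpirun hello_world_c
EOF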
With the PBS script in place, you can now move on to compiling your hello world C code in Step 3.
Step 3: Compile the C Program from Source
Below is Wes Kendall's simple "hello world" C program that utilizes MPI to run the job in parallel [1]. We will compile this source code from the login node using the PE-gnu module.
MPI Hello World Source Code
#include <mpi.h>
#include <stdio.h>
int main(int argc, char** argv) {
    // Initialize the MPI environment.
    MPI_Init(NULL, NULL);

    // Get the number of processes.
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process.
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor.
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message.
    printf("Hello world from processor %s, rank %d"
           " out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}
C Procedure
When creating and editing your hello_world.c source code, we will be working on the login node (from Lustre storage) using the Vi text editor. If Lustre storage is not available, you may complete this tutorial from within your home directory on NFS.
- Ensure that you are still in your working directory (/lustre/or-scratch/cades-arm/username) using pwd.
- Use Vi to create your C source file within your working directory.
vi hello_world.c
- Paste the hello world C code into Vi:
  - Hit Esc on your keyboard to exit input mode.
  - Enter :set paste into Vi's command line, and press Return to enter paste mode.
  - Hit i to return to input mode, then paste the C code into Vi.
- When finished, hit Esc on your keyboard to exit paste/input mode.
- Enter :x! into Vi's command line, and press Return to save your file and return to the Bash shell.
You now have a C source file that you can compile.
- Load the MPI compiler using the PE-gnu module.
module load PE-gnu
- Compile the C source into a binary executable file.
mpicc -o hello_world_c hello_world.c
- Use ls -al to verify the presence of the hello_world_c binary in your working directory.
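Note that mpicc is a thin wrapper around GCC that adds the MPI include and link flags. With Open MPI (which PE-gnu provides), you can print the underlying compiler command line, which is useful when diagnosing build problems:
mpicc --showme
Open MPI also accepts --showme:compile and --showme:link if you only want the compile or link flags.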
With the C code compiled into a binary (hello_world_c), we can now schedule and run the job on our compute nodes.
Step 4: Run the Job
- Before proceeding, ensure that you are still in your working directory (using pwd) and that you still have the PE-gnu module loaded (using module list).
  - We need to be in the same path/directory as our PBS script and our C binary. Use ls -al to confirm their presence.
  - PE-gnu also loads OpenMPI, GCC, and XALT. Use module list to confirm their presence. If necessary, use module load PE-gnu to reload the module(s).
- Use qsub to schedule your batch job in the queue.
qsub hello_world_c.pbs
This command will automatically queue your job using Torque and produce a six-digit job number (shown below).
143295.or-condo-pbs01
You can check the status of your job at any time with the checkjob command.
checkjob 143295
You can also stop your job at any time with the qdel command.
qdel 143295
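In addition to checkjob, Torque's qstat command lists all of your jobs and their states; replace "username" with your own ID.
qstat -u username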
- View your results.
Once your job completes, Torque will produce two output files. Unless otherwise specified in the PBS script, they are placed in the same path as your binary.
One file (myscript.o<jobnumber>) contains the output of the binary you just executed, and the other (myscript.e<jobnumber>) contains any errors that occurred during execution. Here, "myscript" is the job name set with #PBS -N, and "<jobnumber>" is your job number.
You can view the contents of these files using the more command followed by the file name.
more mpi_hello_world_c.o143295
Your output should look something like this, with one line per processor core (16 in this case):
Hello world from processor or-condo-c136.ornl.gov, rank 3 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 4 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 6 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 11 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 7 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 14 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 2 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 5 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 8 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 9 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 10 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 12 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 13 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 15 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 0 out of 16 processors
Hello world from processor or-condo-c136.ornl.gov, rank 1 out of 16 processors
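The ranks report in whatever order they happen to reach the printf call, so the lines are unordered. If you prefer the output sorted by rank for easier reading, a numeric sort on the rank field (the seventh whitespace-separated field in this format) works:
sort -nk7 mpi_hello_world_c.o143295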
- Download your results (using the scp command or an SFTP client) or move them to persistent storage. See our moving data section for help.
Run an Interactive Job
You may want to run on a compute node interactively, for example, when debugging code.
To do this, use qsub with the -I option and pass in the scheduling arguments as shown in the example below.
qsub -I -W group_list=cades-arm -A arm -l qos=std,walltime=24:00:00 -q batch -l nodes=1:ppn=32
When the job starts, you will be put on a compute node and you can run your executable interactively.
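For example, once the interactive session starts on the compute node, you could recreate this tutorial's batch steps by hand. This sketch assumes the Lustre working directory and hello_world_c binary from the earlier steps:
cd /lustre/or-scratch/cades-arm/username
module load PE-gnu
mpirun hello_world_c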
Works Cited
- Wes Kendall, "MPI Hello World," MPI Tutorial, accessed June 14, 2017, http://mpitutorial.com/tutorials/mpi-hello-world/.