SLURM vs. PBS
The new ISAAC-NG computing cluster is housed at the University of Tennessee’s Kingston Pike Building (KPB) in Knoxville, Tennessee and utilizes SLURM (Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management) to manage and schedule jobs submitted to the cluster.
ISAAC-NG users should be aware that Slurm functions somewhat differently than Torque/Moab, and the commands used to accomplish the same tasks differ depending on which scheduler is in use on a particular cluster.
Command Differences
Command Description | Torque | SLURM |
Batch job submission | qsub <Job File Name> | sbatch <Job File Name> |
Interactive job submission | qsub -I | salloc or srun --pty /bin/bash |
Job list | qstat | squeue -l |
Job list by users | qstat -u <User Name> | squeue -l -u <User Name> |
Job deletion | qdel <Job ID> | scancel <Job ID> |
Job hold | qhold <Job ID> | scontrol hold <Job ID> |
Job release | qrls <Job ID> | scontrol release <Job ID> |
Job update | qalter <Job ID> | scontrol update job <Job ID> |
Job details | qstat -f <Job ID> | scontrol show job <Job ID> |
Node list | pbsnodes -l | sinfo -N |
Node details | pbsnodes | scontrol show nodes |
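For illustration, the following shell session walks through a typical job life cycle using the SLURM commands above; the script name (myjob.sh) and job ID (123456) are placeholders.

```bash
# Submit a batch job script (myjob.sh is a placeholder name)
sbatch myjob.sh                      # Torque equivalent: qsub myjob.sh

# List all jobs, or only your own jobs
squeue -l                            # Torque equivalent: qstat
squeue -l -u $USER                   # Torque equivalent: qstat -u $USER

# Inspect, hold, release, and cancel a job (123456 is a placeholder job ID)
scontrol show job 123456             # Torque equivalent: qstat -f 123456
scontrol hold 123456                 # Torque equivalent: qhold 123456
scontrol release 123456              # Torque equivalent: qrls 123456
scancel 123456                       # Torque equivalent: qdel 123456

# List nodes and show node details
sinfo -N                             # Torque equivalent: pbsnodes -l
scontrol show nodes                  # Torque equivalent: pbsnodes
```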
Command Description | Moab | SLURM |
Job start time | showstart <Job ID> | squeue --start -j <Job ID> |
Status of nodes | mdiag -n | sinfo -N -l |
User’s account | mdiag -u <User Name> | sacctmgr show association user=<User Name> |
Account members | mdiag -a <Account Name> | sacctmgr show assoc account=<Account Name> |
Nodes of accounts | mdiag -s | sinfo -a |
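For example, the following commands check when a pending job is expected to start, review node status, and list the accounts associated with the current user; the job ID shown is a placeholder.

```bash
# Estimated start time of a pending job (123456 is a placeholder job ID)
squeue --start -j 123456              # Moab equivalent: showstart 123456

# Detailed node status
sinfo -N -l                           # Moab equivalent: mdiag -n

# Accounts (associations) for the current user
sacctmgr show association user=$USER  # Moab equivalent: mdiag -u $USER
```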
Command Description | OpenMPI | SLURM |
Parallel wrapper | mpirun | srun |
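In a SLURM batch script, srun typically takes the place of mpirun as the parallel launcher. Below is a minimal sketch, assuming an MPI program named ./my_mpi_app and an MPI module named openmpi (both placeholders; actual module names on ISAAC-NG may differ).

```bash
#!/bin/bash
#SBATCH --ntasks=8            # 8 MPI processes
#SBATCH --time=00:10:00

# Load an MPI implementation (module name is a placeholder)
module load openmpi

# Launch the MPI program; srun replaces "mpirun -np 8 ./my_mpi_app"
srun ./my_mpi_app
```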
Directive Differences (in Job Script or Command Line)
Notice: There are important differences between SLURM and PBS. Be careful when using the SLURM options --ntasks= (-n) and --cpus-per-task= (-c), because they have no direct PBS equivalents; in particular, SLURM has no "CPUs per node" or ppn option. The number of tasks (-n) specifies the number of parallel processes in a distributed-memory model such as MPI, as illustrated in the sketch below.
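A minimal sketch of the distinction: the hypothetical script below requests 4 tasks (parallel processes) with 2 CPUs each, suitable for a hybrid MPI/OpenMP program (./hybrid_app is a placeholder).

```bash
#!/bin/bash
#SBATCH --ntasks=4            # 4 parallel (e.g., MPI) processes
#SBATCH --cpus-per-task=2     # 2 CPUs per process, e.g., for OpenMP threads

# Make the per-task CPU count visible to a threaded runtime such as OpenMP
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Launch the hybrid program, passing the per-task CPU count explicitly
srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./hybrid_app
```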
Torque Directive | SLURM Directive | Description | Torque Example | SLURM Example |
#PBS | #SBATCH | Prefix that begins each directive line in the job script | N/A | N/A |
-A | -A, --account=<account> | Specifies the account (not the user name) to be charged for the job. | #PBS -A <Account Name> | #SBATCH -A <Account Name> or #SBATCH --account=<Account Name> |
-a | --begin=<time> | Tells the scheduler to run the job at (or after) the given time. | #PBS -a 0001 | #SBATCH --begin=00:01 |
-e -o -j | -e, --error=<filename pattern> -o, --output=<filename pattern> *NOTE: these options require a file name pattern, and the pattern cannot be a directory. For details on valid filename patterns, run "man sbatch" in a terminal and see the filename pattern section of the manual. | Sets the location of the error_path and output_path attributes. Users only need one of these two options when the -j option is also in use. The -j option with the argument "eo" or "oe" joins STDOUT and STDERR into a single file: "eo" places both in the error_path file, and "oe" places both in the output_path file. | #PBS -e ~/ErrorFile #PBS -j oe #PBS -j eo | #SBATCH -e ~/ErrorFile_%j_%u *NOTE: Both standard output and standard error information for the job are directed into the same file. |
qsub -I | salloc or srun --pty /bin/bash | Declares that the job is to be run interactively. | qsub -I (add -X for X11 forwarding: qsub -I -X) | srun --pty /bin/bash (for X11 forwarding: salloc --x11) |
-l | -N, --nodes=<minnodes[-maxnodes]> -n, --ntasks=<number> --ntasks-per-node=<ntasks> -c, --cpus-per-task=<ncpus> --gres=<list> -t, --time=<time> --mem=<size[units]> -C, --constraint=<list> --tmp=<size[units]> | Remember to separate options with a comma ( , ). nodes=# : gives the number and/or type of nodes desired. ppn=# : gives the number of processors per node desired. gpus=# : gives the number of GPUs desired. walltime= : total runtime desired in the format DD:HH:MM:SS or HH:MM:SS. mem= : maximum amount of memory required by the job. feature= : names the type of compute node required. file= : maximum amount of local disk space required by the job. | #PBS -l nodes=5:ppn=2:gpus=3 #PBS -l walltime=01:30:00 #PBS -l mem=5gb #PBS -l feature=intel14|intel16 #PBS -l file=50GB | #SBATCH -n 5 -c 2 --gres=gpu:3 #SBATCH --time=01:30:00 #SBATCH --mem=5G #SBATCH -C NOAUTO:intel14|intel16 #SBATCH --tmp=50G |
-M | --mail-user=<User Name> | Sends email to the listed address(es) to notify the user when the job changes state. | #PBS -M <username>@utk.edu | #SBATCH --mail-user=<username>@utk.edu |
-m | --mail-type=<type> | a: sends a mail notification when the job is aborted. b: sends a mail notification when job execution begins. e: sends a mail notification when the job ends. n: does not send mail. | #PBS -m abe | #SBATCH --mail-type=BEGIN,END,FAIL (if NONE is used, no mail will be sent) |
-N | -J, --job-name=<jobname> | Names a job | #PBS -N <Desired Name of Job> | #SBATCH -J <Desired Name of Job> |
-t | -a, --array=<indexes> | Submits an array job with "n" identical tasks. Remember: each job that is part of an array job will have the same JOBID, but a different ARRAYID. | #PBS -t 7 #PBS -t 2-13 | #SBATCH -a 7 #SBATCH --array=2-13 |
-V | --export=<ALL, NONE, or list of variables> | Passes all current environment variables to the job. | #PBS -V | #SBATCH --export=ALL |
-v | --export=<ALL, NONE, or list of variables> | Defines additional environment variables for the job. | #PBS -v ev1=ph5,ev2=43 | #SBATCH --export='ev1=ph5,ev2=43' |
-W | -L, --licenses=<license> | Special generic resources (e.g., software licenses) can be requested by using this option. | #PBS -W gres:<Name of Software> | #SBATCH -L <Name of Software>[:<count>] |
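Putting the directives above together, here is an illustrative translation of a simple Torque job script into its SLURM equivalent; the account name, email address, resource values, and program name are placeholders, not recommendations for any particular cluster.

```bash
#!/bin/bash
# --- Torque/PBS version (shown for comparison) ---
# #PBS -A <Account Name>
# #PBS -N example_job
# #PBS -l nodes=1:ppn=4,walltime=01:30:00,mem=4gb
# #PBS -j oe
# #PBS -M <username>@utk.edu
# #PBS -m abe

# --- SLURM version ---
#SBATCH --account=<Account Name>
#SBATCH --job-name=example_job
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=01:30:00
#SBATCH --mem=4G
#SBATCH --output=example_job_%j.out   # %j expands to the job ID
#SBATCH --mail-user=<username>@utk.edu
#SBATCH --mail-type=BEGIN,END,FAIL

# Run the program (./my_program is a placeholder)
srun ./my_program
```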
Environment Variable Differences
Description | Torque Variable | SLURM Variable |
The ID of the job | PBS_JOBID | SLURM_JOB_ID |
Job array ID (index) number | PBS_ARRAYID | SLURM_ARRAY_TASK_ID |
Directory where the submission command was executed | PBS_O_WORKDIR | SLURM_SUBMIT_DIR |
Name of the job | PBS_JOBNAME | SLURM_JOB_NAME |
List of nodes allocated to the job | PBS_NODEFILE | SLURM_JOB_NODELIST |
Number of Processors Per Node (ppn) requested | PBS_NUM_PPN | SLURM_JOB_CPUS_PER_NODE |
Total number of cores requested | PBS_NP | SLURM_NTASKS * SLURM_CPUS_PER_TASK |
Total number of nodes requested | PBS_NUM_NODES | SLURM_JOB_NUM_NODES |
Host from which the job was submitted | PBS_O_HOST | SLURM_SUBMIT_HOST |
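These variables can be used directly inside a SLURM job script. The sketch below records basic job information and keeps output from different runs separate; the program name and output directory are placeholders.

```bash
#!/bin/bash
#SBATCH --job-name=env_demo
#SBATCH --ntasks=2
#SBATCH --time=00:05:00

# Equivalent of "cd $PBS_O_WORKDIR" in a Torque script
cd "$SLURM_SUBMIT_DIR"

# Record job information for later reference
echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) running on: $SLURM_JOB_NODELIST"
echo "Tasks requested: $SLURM_NTASKS"

# Use the job ID to keep output files from different runs separate
OUTDIR="run_${SLURM_JOB_ID}"               # hypothetical output directory
mkdir -p "$OUTDIR"

srun ./my_program > "$OUTDIR/output.txt"   # ./my_program is a placeholder
```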