High Performance & Scientific Computing
SLURM vs. PBS on ISAAC-NG
The ISAAC Legacy (formerly ACF) and ISAAC-NG computing clusters both use SLURM for workload management to schedule the jobs submitted to them.
A variety of example SLURM jobs are available on ISAAC Legacy at /lustre/haven/examples/jobs and on ISAAC-NG at /lustre/isaac/examples/jobs.
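For orientation, the sketch below shows the general shape of a minimal SLURM batch script like those in the examples directories; the account name, partition, and output file name are placeholders, not values specific to ISAAC.

```bash
#!/bin/bash
#SBATCH --job-name=hello             # name shown by squeue
#SBATCH --account=<Account Name>     # placeholder: your project account
#SBATCH --partition=<Partition>      # placeholder: a partition available to you
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --output=hello_%j.out        # %j expands to the job ID

# Report where the job ran.
echo "Job $SLURM_JOB_ID running on $(hostname)"
```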
Command Differences
| Command Description | Torque | SLURM |
| --- | --- | --- |
| Batch job submission | qsub <Job File Name> | sbatch <Job File Name> |
| Interactive job submission | qsub -I | salloc or srun --pty /bin/bash |
| Job list | qstat | squeue -l |
| Job list by user | qstat -u <User Name> | squeue -l -u <User Name> |
| Job deletion | qdel <Job ID> | scancel <Job ID> |
| Job hold | qhold <Job ID> | scontrol hold <Job ID> |
| Job release | qrls <Job ID> | scontrol release <Job ID> |
| Job update | qalter <Job ID> | scontrol update job <Job ID> |
| Job details | qstat -f <Job ID> | scontrol show job <Job ID> |
| Node list | pbsnodes -l | sinfo -N |
| Node details | pbsnodes | scontrol show nodes |
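To illustrate the SLURM column above, a typical submit, monitor, and cancel sequence looks roughly like this; the script name and the job ID 12345 are placeholders:

```bash
# Submit a batch job; sbatch prints the assigned job ID.
sbatch myjob.slurm

# List your own jobs in long format.
squeue -l -u $USER

# Show full details for a job, hold and release it, then cancel it.
scontrol show job 12345
scontrol hold 12345
scontrol release 12345
scancel 12345
```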
| Command Description | Moab | SLURM |
| --- | --- | --- |
| Job start time | showstart <Job ID> | squeue --start -j <Job ID> |
| Status of nodes | mdiag -n | sinfo -N -l |
| User's account | mdiag -u <User Name> | sacctmgr show association user=<User Name> |
| Account members | mdiag -a <Account Name> | sacctmgr show assoc account=<Account Name> |
| Nodes of accounts | mdiag -s | sinfo -a |
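For example, to check which accounts your own username is associated with, you might run the following; the exact columns shown will vary by site configuration:

```bash
# Show the account associations for the current user.
sacctmgr show association user=$USER format=Account,User,Partition,QOS
```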
| Command Description | OpenMPI | SLURM |
| --- | --- | --- |
| Parallel wrapper | mpirun | srun |
Directive Differences (in Job script or command line)
Notice: There are important differences between SLURM and PBS. Be careful when using the specifications --ntasks= (-n) and --cpus-per-task= (-c) in SLURM, because they are not PBS specifications, and there is no CPUs-per-node or ppn option in SLURM. The number of tasks (-n) is the number of parallel processes in a distributed-memory model (such as MPI), as illustrated in the sketch below.
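As a concrete (and hypothetical) example, a hybrid MPI/OpenMP job would use --ntasks for the MPI processes and --cpus-per-task for the threads each process may use; the directives below would appear in a batch script, and the program name is a placeholder:

```bash
#SBATCH --nodes=2                # two nodes
#SBATCH --ntasks=8               # 8 MPI processes in total (-n)
#SBATCH --cpus-per-task=4        # 4 CPUs (threads) per MPI process (-c)

# Give OpenMP as many threads as SLURM reserved per task,
# then let srun launch one process per task.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_mpi_program            # placeholder executable
```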
| Torque Directive | SLURM Directive | Description | Torque Example | SLURM Example |
| --- | --- | --- | --- | --- |
| #PBS | #SBATCH | Prefix at the head of each directive line | N/A | N/A |
| -A | -A, --account=<account> | Tells the scheduler to use the specified account (not the user name) as the charge credential. | #PBS -A <Account Name> | #SBATCH -A <Account Name> or #SBATCH --account=<Account Name> |
| -a | --begin=<time> | Tells the scheduler to run the job at the given time. | #PBS -a 0001 | #SBATCH --begin=00:01 |
| -e, -o, -j | -e, --error=<filename pattern>; -o, --output=<filename pattern>. NOTE: these options take a file name pattern, not a directory; see the filename pattern section of the sbatch manual (man sbatch) for valid patterns. | Set the error_path and output_path attributes. Only one of -e or -o is needed when -j is also used. Combining them with -j ("eo" or "oe") merges STDOUT and STDERR into a single file: "eo" writes both to the error_path file, and "oe" writes both to the output_path file. | #PBS -e ~/ErrorFile with #PBS -j oe or #PBS -j eo | #SBATCH -e ~/ErrorFile_%j_%u (NOTE: in SLURM, standard output and standard error go to the same file when only -o, or neither option, is given.) |
| qsub -I | salloc or srun --pty /bin/bash | Declares that the job is to be run interactively. | qsub -I or qsub -I -X | srun --pty /bin/bash or salloc --x11 |
| -l | -N, --nodes=<minnodes[-maxnodes]>; -n, --ntasks=<number>; --ntasks-per-node=<ntasks>; -c, --cpus-per-task=<ncpus>; --gres=<list>; -t, --time=<time>; --mem=<size[units]>; -C, --constraint=<list>; --tmp=<size[units]> | Remember to separate Torque -l options with a comma ( , ). nodes=# gives the number and/or type of nodes desired; ppn=# gives the number of processors per node; gpus=# gives the number of GPUs; walltime= gives the total runtime in DD:HH:MM:SS or HH:MM:SS format; mem= gives the maximum amount of memory required by the job; feature= names the type of compute node required; file= gives the maximum amount of local disk space required by the job. | #PBS -l nodes=5:ppn=2:gpus=3; #PBS -l walltime=01:30:00; #PBS -l mem=5gb; #PBS -l feature=intel14\|intel16; #PBS -l file=50GB | #SBATCH -n 5 -c 2 --gres=gpu:3; #SBATCH --time=01:30:00; #SBATCH --mem=5G; #SBATCH -C NOAUTO:intel14\|intel16; #SBATCH --tmp=50G |
| -M | --mail-user=<email address> | Sends notification email to the listed address(es) when the job changes state. | #PBS -M <username>@utk.edu | #SBATCH --mail-user=<username>@utk.edu |
| -m | --mail-type=<type> | a: send mail when the job is aborted; b: send mail when job execution begins; e: send mail when the job ends; n: do not send mail. | #PBS -m abe | #SBATCH --mail-type=BEGIN,END,FAIL (if NONE is used, no mail is sent) |
| -N | -J, --job-name=<jobname> | Names the job. | #PBS -N <Desired Name of Job> | #SBATCH -J <Desired Name of Job> |
| -t | -a, --array=<indexes> | Submits an array job with "n" identical tasks. Remember: each task in an array job has the same job ID but a different array index. | #PBS -t 7 or #PBS -t 2-13 | #SBATCH -a 7 or #SBATCH --array=2-13 |
| -V | --export=<ALL \| NONE \| variables> | Passes all current environment variables to the job. | #PBS -V | #SBATCH --export=ALL |
| -v | --export=<ALL \| NONE \| variables> | Defines additional environment variables for the job. | #PBS -v ev1=ph5,ev2=43 | #SBATCH --export=ev1=ph5,ev2=43 |
| -W | -L, --licenses=<license> | Special generic resources (i.e., software licenses) can be requested with this option. | #PBS -W gres:<Name of Software> | #SBATCH -L <Name of Software> |
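Putting several of these mappings together, a Torque header and an equivalent SLURM script might look like the following sketch; the account name, email address, and executable are placeholders:

```bash
#!/bin/bash
# Roughly equivalent to a Torque header of:
#   #PBS -A <Account Name>
#   #PBS -N myjob
#   #PBS -l nodes=2:ppn=4,walltime=01:30:00,mem=5gb
#   #PBS -m abe
#SBATCH --account=<Account Name>         # placeholder project account
#SBATCH --job-name=myjob
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4              # replaces ppn=4
#SBATCH --time=01:30:00
#SBATCH --mem=5G
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=<username>@utk.edu   # placeholder address

cd $SLURM_SUBMIT_DIR                     # directory from which sbatch was run
srun ./my_program                        # placeholder executable
```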
| Description | Torque Variable | SLURM Variable |
| --- | --- | --- |
| The ID of the job | PBS_JOBID | SLURM_JOB_ID |
| Job array ID (index) number | PBS_ARRAYID | SLURM_ARRAY_TASK_ID |
| Directory where the submission command was executed | PBS_O_WORKDIR | SLURM_SUBMIT_DIR |
| Name of the job | PBS_JOBNAME | SLURM_JOB_NAME |
| List of nodes allocated to the job | PBS_NODEFILE | SLURM_JOB_NODELIST |
| Number of processors per node (ppn) requested | PBS_NUM_PPN | SLURM_JOB_CPUS_PER_NODE |
| Total number of cores requested | PBS_NP | SLURM_NTASKS * SLURM_CPUS_PER_TASK |
| Total number of nodes requested | PBS_NUM_NODES | SLURM_JOB_NUM_NODES |
| Host from which the job was submitted | PBS_O_HOST | SLURM_SUBMIT_HOST |
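The sketch below shows how a few of these SLURM variables are commonly used inside a batch script; the input file naming scheme is purely illustrative:

```bash
# Move to the directory from which sbatch was invoked (the PBS_O_WORKDIR analogue).
cd $SLURM_SUBMIT_DIR

# Record where and how the job is running.
echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) on nodes: $SLURM_JOB_NODELIST"
echo "Tasks: $SLURM_NTASKS  CPUs per task: ${SLURM_CPUS_PER_TASK:-1}"

# In an array job, select an input file based on the array index (illustrative naming).
INPUT="input_${SLURM_ARRAY_TASK_ID:-0}.dat"
echo "This task would process $INPUT"
```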