SLURM vs. PBS
The new ISAAC-NG computing cluster is housed at the University of Tennessee’s Kingston Pike Building (KPB) in Knoxville, Tennessee and utilizes SLURM (Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management) to manage and schedule jobs submitted to the cluster.
ISAAC-NG users should be aware that Slurm functions somewhat differently than Torque/Moab, and the commands used to accomplish the same tasks differ depending on which scheduler is in use on a particular cluster.
Command Differences
Command Description | Torque | SLURM |
Batch job submission | qsub <Job File Name> | sbatch <Job File Name> |
Interactive job submission | qsub -I | salloc or srun --pty /bin/bash |
Job list | qstat | squeue -l |
Job list by users | qstat -u <User Name> | squeue -l -u <User Name> |
Job deletion | qdel <Job ID> | scancel <Job ID> |
Job hold | qhold <Job ID> | scontrol hold <Job ID> |
Job release | qrls <Job ID> | scontrol release <Job ID> |
Job update | qalter <Job ID> | scontrol update job <Job ID> |
Job details | qstat -f <Job ID> | scontrol show job <Job ID> |
Node list | pbsnodes -l | sinfo -N |
Node details | pbsnodes | scontrol show nodes |
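For illustration, the following shell session walks through a typical job life cycle using the SLURM commands above; the script name (myjob.sh) and job ID (123456) are placeholders.

```bash
# Submit a batch job script (myjob.sh is a placeholder name)
sbatch myjob.sh                      # Torque equivalent: qsub myjob.sh

# List all jobs, or only your own jobs
squeue -l                            # Torque equivalent: qstat
squeue -l -u $USER                   # Torque equivalent: qstat -u $USER

# Inspect, hold, release, and cancel a job (123456 is a placeholder job ID)
scontrol show job 123456             # Torque equivalent: qstat -f 123456
scontrol hold 123456                 # Torque equivalent: qhold 123456
scontrol release 123456              # Torque equivalent: qrls 123456
scancel 123456                       # Torque equivalent: qdel 123456

# List nodes and show node details
sinfo -N                             # Torque equivalent: pbsnodes -l
scontrol show nodes                  # Torque equivalent: pbsnodes
```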
Command Description | Moab | SLURM |
Job start time | showstart <Job ID> | squeue --start -j <Job ID> |
Status of nodes | mdiag -n | sinfo -N -l |
User’s account | mdiag -u <User Name> | sacctmgr show association user=<User Name> |
Account members | mdiag -a <Account Name> | sacctmgr show assoc account=<Account Name> |
Nodes of accounts | mdiag -s | sinfo -a |
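For example, the following commands check when a pending job is expected to start, review node status, and list the accounts associated with the current user; the job ID shown is a placeholder.

```bash
# Estimated start time of a pending job (123456 is a placeholder job ID)
squeue --start -j 123456              # Moab equivalent: showstart 123456

# Detailed node status
sinfo -N -l                           # Moab equivalent: mdiag -n

# Accounts (associations) for the current user
sacctmgr show association user=$USER  # Moab equivalent: mdiag -u $USER
```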
Command Description | OpenMPI | SLURM |
Parallel wrapper | mpirun | srun |
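In a SLURM batch script, srun typically takes the place of mpirun as the parallel launcher. Below is a minimal sketch, assuming an MPI program named ./my_mpi_app and an MPI module named openmpi (both placeholders; actual module names on ISAAC-NG may differ).

```bash
#!/bin/bash
#SBATCH --ntasks=8            # 8 MPI processes
#SBATCH --time=00:10:00

# Load an MPI implementation (module name is a placeholder)
module load openmpi

# Launch the MPI program; srun replaces "mpirun -np 8 ./my_mpi_app"
srun ./my_mpi_app
```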
Directive Differences (in Job Script or Command Line)
Notice: There are important differences between SLURM and PBS. Be careful when using the SLURM options --ntasks= (-n) and --cpus-per-task= (-c), because they have no direct PBS equivalents; in particular, SLURM has no "CPUs per node" or ppn option. The number of tasks (-n) specifies the number of parallel processes in a distributed-memory model such as MPI, as illustrated in the sketch below.
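A minimal sketch of the distinction: the hypothetical script below requests 4 tasks (parallel processes) with 2 CPUs each, suitable for a hybrid MPI/OpenMP program (./hybrid_app is a placeholder).

```bash
#!/bin/bash
#SBATCH --ntasks=4            # 4 parallel (e.g., MPI) processes
#SBATCH --cpus-per-task=2     # 2 CPUs per process, e.g., for OpenMP threads

# Make the per-task CPU count visible to a threaded runtime such as OpenMP
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Launch the hybrid program, passing the per-task CPU count explicitly
srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./hybrid_app
```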
Torque Directive | SLURM Directive | Description | Torque Example | SLURM Example |
#PBS | #SBATCH | Prefix that begins each directive line in the job script | N/A | N/A |
-A | -A, --account=<account> | Specifies the account (not the user name) to be charged for the job. | #PBS -A <Account Name> | #SBATCH -A <Account Name> or #SBATCH --account=<Account Name> |
-a | --begin=<time> | Tells the scheduler to run the job at (or after) the given time. | #PBS -a 0001 | #SBATCH --begin=00:01 |
-e -o -j | -e, --error=<filename pattern> -o, --output=<filename pattern> *NOTE: these options require a file name pattern, and the pattern cannot be a directory. For details on valid filename patterns, run "man sbatch" in a terminal and see the filename pattern section of the manual. | Sets the location of the error_path and output_path attributes. Users only need one of these two options when the -j option is also in use. The -j option with the argument "eo" or "oe" joins STDOUT and STDERR into a single file: "eo" places both in the error_path file, and "oe" places both in the output_path file. | #PBS -e ~/ErrorFile #PBS -j oe #PBS -j eo | #SBATCH -e ~/ErrorFile_%j_%u *NOTE: Both standard output and standard error information for the job are directed into the same file. |
qsub -I | salloc or srun --pty /bin/bash | Declares that the job is to be run interactively. | qsub -I (add -X for X11 forwarding: qsub -I -X) | srun --pty /bin/bash (for X11 forwarding: salloc --x11) |
-l | -N, --nodes=<minnodes[-maxnodes]> -n, --ntasks=<number> --ntasks-per-node=<ntasks> -c, --cpus-per-task=<ncpus> --gres=<list> -t, --time=<time> --mem=<size[units]> -C, --constraint=<list> --tmp=<size[units]> | Remember to separate options with a comma ( , ). nodes=# : gives the number and/or type of nodes desired. ppn=# : gives the number of processors per node desired. gpus=# : gives the number of GPUs desired. walltime= : total runtime desired in the format DD:HH:MM:SS or HH:MM:SS. mem= : maximum amount of memory required by the job. feature= : names the type of compute node required. file= : maximum amount of local disk space required by the job. | #PBS -l nodes=5:ppn=2:gpus=3 #PBS -l walltime=01:30:00 #PBS -l mem=5gb #PBS -l feature=intel14|intel16 #PBS -l file=50GB | #SBATCH -n 5 -c 2 --gres=gpu:3 #SBATCH --time=01:30:00 #SBATCH --mem=5G #SBATCH -C NOAUTO:intel14|intel16 #SBATCH --tmp=50G |
-M | --mail-user=<User Name> | Sends email to the listed address(es) to notify the user when the job changes state. | #PBS -M <username>@utk.edu | #SBATCH --mail-user=<username>@utk.edu |
-m | --mail-type=<type> | a: sends a mail notification when the job is aborted. b: sends a mail notification when job execution begins. e: sends a mail notification when the job ends. n: does not send mail. | #PBS -m abe | #SBATCH --mail-type=BEGIN,END,FAIL (if NONE is used, no mail will be sent) |
-N | -J, --job-name=<jobname> | Names a job | #PBS -N <Desired Name of Job> | #SBATCH -J <Desired Name of Job> |
-t | -a, --array=<indexes> | Submits an array job with "n" identical tasks. Remember: each job that is part of an array job will have the same JOBID, but a different ARRAYID. | #PBS -t 7 #PBS -t 2-13 | #SBATCH -a 7 #SBATCH --array=2-13 |
-V | --export=<ALL, NONE, or list of variables> | Passes all current environment variables to the job. | #PBS -V | #SBATCH --export=ALL |
-v | --export=<ALL, NONE, or list of variables> | Defines additional environment variables for the job. | #PBS -v ev1=ph5,ev2=43 | #SBATCH --export='ev1=ph5,ev2=43' |
-W | -L, --licenses=<license> | Special generic resources (e.g., software licenses) can be requested by using this option. | #PBS -W gres:<Name of Software> | #SBATCH -L <Name of Software>[:<count>] |
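Putting the directives above together, here is an illustrative translation of a simple Torque job script into its SLURM equivalent; the account name, email address, resource values, and program name are placeholders, not recommendations for any particular cluster.

```bash
#!/bin/bash
# --- Torque/PBS version (shown for comparison) ---
# #PBS -A <Account Name>
# #PBS -N example_job
# #PBS -l nodes=1:ppn=4,walltime=01:30:00,mem=4gb
# #PBS -j oe
# #PBS -M <username>@utk.edu
# #PBS -m abe

# --- SLURM version ---
#SBATCH --account=<Account Name>
#SBATCH --job-name=example_job
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=01:30:00
#SBATCH --mem=4G
#SBATCH --output=example_job_%j.out   # %j expands to the job ID
#SBATCH --mail-user=<username>@utk.edu
#SBATCH --mail-type=BEGIN,END,FAIL

# Run the program (./my_program is a placeholder)
srun ./my_program
```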
Environment Variable Differences
Description | Torque Variable | SLURM Variable |
The ID of the job | PBS_JOBID | SLURM_JOB_ID |
Job array ID (index) number | PBS_ARRAYID | SLURM_ARRAY_TASK_ID |
Directory where the submission command was executed | PBS_O_WORKDIR | SLURM_SUBMIT_DIR |
Name of the job | PBS_JOBNAME | SLURM_JOB_NAME |
List of nodes allocated to the job | PBS_NODEFILE | SLURM_JOB_NODELIST |
Number of Processors Per Node (ppn) requested | PBS_NUM_PPN | SLURM_JOB_CPUS_PER_NODE |
Total number of cores requested | PBS_NP | SLURM_NTASKS * SLURM_CPUS_PER_TASK |
Total number of nodes requested | PBS_NUM_NODES | SLURM_JOB_NUM_NODES |
Host from which the job was submitted | PBS_O_HOST | SLURM_SUBMIT_HOST |
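These variables can be used directly inside a SLURM job script. The sketch below records basic job information and keeps output from different runs separate; the program name and output directory are placeholders.

```bash
#!/bin/bash
#SBATCH --job-name=env_demo
#SBATCH --ntasks=2
#SBATCH --time=00:05:00

# Equivalent of "cd $PBS_O_WORKDIR" in a Torque script
cd "$SLURM_SUBMIT_DIR"

# Record job information for later reference
echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) running on: $SLURM_JOB_NODELIST"
echo "Tasks requested: $SLURM_NTASKS"

# Use the job ID to keep output files from different runs separate
OUTDIR="run_${SLURM_JOB_ID}"               # hypothetical output directory
mkdir -p "$OUTDIR"

srun ./my_program > "$OUTDIR/output.txt"   # ./my_program is a placeholder
```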