Skip to content Skip to main navigation Report an accessibility issue
High Performance & Scientific Computing

Running Jobs on ISAAC-NG



**The compute resources of ISAAC-NG are managed using SLURM scheduler and workload manager**

Introduction

ISAAC-NG cluster offers access to its compute resources via two means: (i) submission of a SLURM job via the command line or (ii) Web based job submission via Open OnDemand (See the ISAAC-NG web page about Open OnDemand for further details). The compute resources on ISAAC-NG cluster are managed by SLURM workload manager. This webpage describes how to efficiently use the compute resources of ISAAC-NG via command line and submit, monitor, and manage the computational jobs for production work using the SLURM scheduler.

A collection of sample jobs scripts are available at /lustre/isaac/examples/jobs.

Please note: The login nodes should ONLY be used for basic tasks such as: file editing, file management, code compilation, and job submission.

General Information

As explained in the Access and Login page, when you log in to ISAAC-NG HPSC cluster, you will be directed to one of the login nodes from where you can access the cluster to do your research work. Please note that the login nodes should only be used for basic tasks such as file editing, code compilation, and job submission. Running production jobs on login nodes is highly discouraged and any production job running on the login nodes is subject to termination.

If a user is not sure how to determine whether their job(s) are running on a login node, the user is encouraged to reach out to the High Performance & Scientific Computing (HPSC) group for assistance by submitting a help request ticket using the “Submit HPSC Service Request” button at the top right corner of any of the HPSC website pages at oit.utk.edu/hpsc.

As discussed above, login nodes should not be used to run production jobs. Any kind of production work should be performed on the system’s compute resources. The compute resources of ISAAC-NG cluster are managed and allocated by SLURM scheduler (Simple Linux Utility for Resource Management) which uses several partitions to efficiently allocate the resources for different jobs.

This page provides information for getting started with the batch facilities of SLURM scheduler as well as the basic job execution. However, before getting started with the job submission, it is imperative to understand how to access different job directories on ISAAC-NG cluster from where the jobs can be submitted.

Job Directories

By default, SLURM scheduler assumes the working directory to be the one from where the jobs is submitted. It is recommended that the jobs should be submitted from within a directory in the Lustre file system. . The other storage space is located in /lustre/isaac/projThe following storage spaces are available for users to choose from when deciding which directory from which they would like to run their job(s):

  • /lustre/isaac/proj/<project-name> – This is the project directory for the project under which the job is being run. Users can create their own directory under this project directory with their username where data can be stored. This storage space is allocated for a specific project and is made available upon request by the Principle Investigator of the project.
  • /lustre/isaac/scratch/<netid> – All ISAAC-NG users have access to a scratch directory, which can be accessed by specifying the path or using the $SCRATCHDIR environmental variable.

Please refer to File Systems page for more details on home, project, and scratch directories.

SLURM Batch Scripts

In this section, we will explain how to request SLURM scheduler to allocate the resources for a job, and how to submit those jobs.

Partitions

SLURM scheduler organizes the similar sets of nodes and job features into individual groups, and these groups are individually referred to as a “partition”. Each partition has hard limits for the maximum amount of wall clock time, maximum job size, and the upper limit to number of nodes a user is able to request, etc. The partitions available for all ISAAC-NG users include “campus”, “campus-gpu”, “utk-campus-bigmem”, and “campus-gpu-bigmem” under which a number of CPU and GPU cores are available. The default partition is “campus”; in order to use any other partition, your SLURM job must explicitly request it. For more information on available resources under each partition, please refer to our System Overview page.

Scheduling Policy

ISAAC-NG cluster uses a variety of mechanisms to determine how to schedule jobs. Most of the resource request mechanisms can be manipulated by users to ensure that their jobs are scheduled and executed in a reasonable time period.

Condos

The ISAAC-NG computing cluster has been divided into logical units referred to as “condos”. In the context of the cluster, a condo is a logical group of compute nodes that act as an independent cluster to effectively schedule jobs. There are “institutional” and individual “private” condos within the ISAAC-NG cluster. All users belong to their respective institutional condo, whether that be UTK or UTHSC. Private condos are owned and used by individual investors and their associated projects (discussed below). With a private condo, the investor and his project have exclusive use of the compute nodes in the condo. Investors can choose to share their private condo with the entire cluster under the Responsible Node Sharing implementation, but this is not a requirement. In Slurm, each condo is implemented as a “partition”. Most private condos also have a corresponding “overflow” partition, which combine nodes from institutional and private condos, allowing investors the option of running larger jobs than can be run either on type of condo alone. Private condos will typically be named condo-<project-pi-netid>, but you can always get a current list of private condos on the system via the command:
sinfo -Nel | grep condo | grep -v overflow

Project Accounts

A project is the combination of an account used in the batch system and is related to a Linux group used on the system for file access control. In the cluster’s condo-based scheduling model projects and their associated accounts in the batch system are a key component. Projects control access to institutional and private condos. For UTK institutional users, the default project account is ACF-UTK0011. For UTHSC institutional users, the default project is ACF-UTHSC0001. Other projects follow the same project identifier format. To determine which projects you belong, login to the User Portal and look under the “Projects” header. Always ensure you use the correct project account when submitting jobs with a SLURM directive. The following is an example of specifying to use the UTK institutional project account through SLURM directive SBATCH in a job script:

#SBATCH -A ACF-UTK0011

Note that the SLURM utility’s scheduling policy requires that the nodes under each condo have to be a part of a partition. Therefore, each condo has a unique partition associated with its own project account; this makes it imperative for users to define their desired partition and a specific project account while requesting the resources on the ISAAC-NG computing cluster.For example: the institutional condo has a campus partition associated with its project account (ACF-UTK0011). Once a user submits a job to the ISAAC-NG cluster, the SLURM scheduler sets the processing constraints on the submitted job such as maximum wall time, number of nodes required, etc. and schedules the job.

Quality of Service (QOS)

A Quality of Service (QoS) is a set of restrictions on resources accessible to a job. You can always get the current list of QoS specifications via

sacctmgr show qos \
format=name,maxwall,maxtres,maxtrespu,maxjobspu,maxjobspa,maxsubmitpa

As of August 4 2022, these QoS limits available to users are as follows:

QoSMax Wall Time Per JobMax Resource Usage Per Job Max Resource Usage Per User (across all running jobs)Max Running/Submitted Jobs Per User Max Running/Submitted Jobs in Queue Per Account
campus1 day6 Nodes6 Nodes48/96-/-
campus-gpu 1 day2 Nodes 2 GPUs3/6-/-
campus-bigmem1 day1 Node1 Node4/8-/-
condo30 daysNone (defaults to size of condo)None150/250-/-
genomics7 days1 Node1 Node2/3-/-
long6 days96 cores12/18-/-
long-gpu6 days1 Node1 GPU1/3-/-
long-bigmem2/4-/-
overflow3 days12 Nodes-/-1/3
condo-hicap-/500-/-

Job Specifications

Each partition has a list of valid accounts which may submit jobs to that partition, and each user has a list of valid account/QoS combinations they may specify. For example, to submit a job under the campus partition, use the below batch header:

 #SBATCH --account ACF-UTK0011 (or #SBATCH -A ACF-UTK0011)
 #SBATCH --partition=campus (or #SBATCH -p campus)
 #SBATCH --qos=campus (or #SBATCH -q condo)

To submit a GPU job with <n> GPUs to the campus-gpu partition, use the below batch header:

#SBATCH --account ACF-UTK<project-number>
#SBATCH --partition=campus-gpu
#SBATCH --qos=utk
#SBATCH --gpus=<n>

To submit a job under a private condo, use the below batch header:

#SBATCH --account ACF-UTK<project-number>
#SBATCH --partition=condo-<condo-name>
#SBATCH --qos=condo

As users can see, the first command tells SLURM which project account the job is to be associated with, the second command tells SLURM to which partition the job should be submitted, and the third what Quality of Service the user wants to run the job under. Remember, Slurm needs all three pieces of information (the project account name, the partition name, and the Quality of Service) in order to appropriately assign resources to the job a user has submitted, and the schedule that job to run on the ISAAC-NG cluster’s computing resources.

Important Note: It is the combination of a partition, QoS, and project account which the Slurm scheduler program uses to allocate computing resources from the cluster for each job. If a user does not appropriately specify all three, the Slurm utility may not be able to process the user’s job as submitted due to a lack of information from the user.

Likewise, if a user attempts to specify an account or QoS that’s not valid for a partition, or attempts to request an invalid account/qos combination for their netid, their job will be rejected. Detailed information about all the partitions and the respective nodes on ISAAC-NG can be viewed using the following command:

 $ sinfo -Nel

For more information on the flags used by commands, please refer to the Slurm documentation. There are a handful of partitions that will be returned by the above command which are not documented below. These are reserved for administrative purposes, and end users should ignore them.

You can always see the valid accounts for a partition with

$ scontrol show PartitionName=<partition-name> | grep AllowAccounts

You can see the valid QoS settings for a partition with

$ scontrol show PartitionName=<partition-name> | grep AllowQos

You can always see your netid’s valid account/qos combinations with

$ sacctmgr -p show assoc | grep <netid>

However, this is not usually necessary. As a general rule, the valid QoS and accounts combinations for partitions will be as follows:

Condo(s)Allowed QoSAllowed Account(s)
campuscampusAll
campus-bigmemcampus-bigmemAll
campus-gpu, campus-gpu-bigmemcampus-gpuAll
condo-<project-pi-netid>condoProject Accounts Only
condo-<project-pi-netid>-overflowoverflowProject Accounts Only
condo-ut-genomicsgenomicsProject Accounts Only
longlongAll
long-bigmemlong-bigmemAll
long-gpulong-gpuAll

The general association between allowed accounts and QoS will be as follows:

Account TypeExample Account(s)Valid QoS
Institutional AccountACF-UTK0011,
ACF-UTHSC0001
campus, campus-bigmem, campus-gpu, long, long-bigmem, long-gpu
Private/Project/Class AccountACF-UTK<nnnn>,
ACF-UTHSC<nnnn>,
ISAAC-UTK<nnnn>
All institutional QoS plus condo and overflow
Genomics AccountAs for private accountAll instutional QoS plus genomics

You can always determine your current active accounts or request access to new accounts in the user portal.

Please note that in addition to above two directives (project account and partition), users also need to specify the type of nodes, number of nodes, number of cores, and estimated maximum amount of wall time as parameters or other optional SBATCH directives when submitting a job script. A collection of complete sample jobs scripts are also available at /lustre/isaac/examples/jobs. Detailed information on how to submit a job using SBATCH directives is provided under the section Submitting Jobs with Slurm.

Once a job is submitted, the Slurm scheduler checks for the available resources and allocate them to the jobs to launch the tasks.

Important Note: users can choose to run a job exclusively on entire node by using the exclusive flag while also using srun to distribute tasks among different CPUs. Check table 3.3 to see how to use this flag.

The order in which jobs are run depends on the following factors:

  • Number of nodes requested – jobs that request more nodes get a higher priority.
  • Queue wait time – a job’s priority increases along with its queue wait time (not counting blocked jobs as they are not considered “queued.”)
  • Number of jobs – a maximum of ten jobs per user, at a time, will be eligible to run.

In certain special cases, the priority of a job may be manually increased upon request. To request priority change you may contact the OIT HelpDesk. They will need the job ID and reason to submit the request.

Slurm Commands/Variables

In the table below, are listed few important Slurm commands (with a description of the command) which are most often used while working with Slurm scheduler, and can be used on the ISAAC-NG login nodes.

CommandDescription
sbatch jobscript.shUsed to submit the job script to request the resources
squeueUsed to displays the status of all the jobs
squeue -u usernameUsed to displays the status and other information of user’s all jobs
squeue [jobid]Display the job status and information of a particular job
scancelCancel the job with a jobid
scontrol show jobid/parition valueYields the information about a job or any resource
scontrol updateAlter the resources of a pending job
sallocUsed to allocate the resources for the interactive job run

Slurm Variables

Below are tabulated few important Slurm variables which ISAAC-NG users may find useful.

VariableDescription
SLURM_SUBMIT_DIRThe directory from where the job is submitted
SLURM_JOBIDThe job identifier of the submitted job
SLURM_NODELISTList of nodes allocated to a job
SLURM_NTASKSPrints the total number of CPUs used

SBATCH Flags

The jobs a user submits to the ISAAC-NG cluster are submitted using a sbatch command which passes the request for the resources a user has requested in the job script to the Slurm scheduler. The resources in the job script are requested using the “SBATCH” directive. Note that SLURM scheduler can accept SBATCH directives in two formats. Users can choose to use either of the two formats at their own discretion. The description of each of the SBATCH flags is given below:

Flags Description
#SBATCH -J JobnameName of the job
#SBATCH --account (or -A) Project AccountProject account to which the time will be charge
#SBATCH --time (or -t)=days-hh:mm:ssRequest wall time for the job
#SBATCH --nodes (or -N)=1Number of nodes needed
#SBATCH --ntasks (or -n) = 48Total number of cores requested
#SBATCH --ntasks-per-node = 48Request number of cores per node
#SBATCH --partition (or -p) = campusSelects the partition or queue
#SBATCH --output (or -o) = Jobname.o%jThe file where output of terminal is dumped
#SBATCH --error (-e) = Jobname.e%jThe files where run time errors are dumped
#SBATCH --exclusiveAllocates the exclusive excess of node(s)
#SBATCH --array (-a) = indexUsed to run multiple jobs with identical parameters
#SBATCH --chdir=directoryUsed to change the working directory. The default working directory is the one from where a job is submitted
#SBATCH – -qos=campusThe Quality of Service level for the job.

Submitting Jobs with Slurm

On ISAAC-NG, batch jobs can be submitted in two ways: (i) interactive batch mode, and (ii) non-interactive batch mode.

Interactive Batch mode

Interactive batch jobs give users the interactive access to compute nodes. In interactive batch mode, users can request that the Slurm scheduler allocate the resources of compute nodes, directly on the terminal (e.g., using a terminal’s command line interface (CLI)). A common use for interactive batch jobs is to debug a calculation or a program before submitting non-interactive batch jobs for production runs. This section demonstrates how to run interactive jobs through the batch system and provides common usage tips.

Interactive batch mode can be invoked on an ISAAC-NG login node by using the salloc command followed by specific SBATCH flags to request specific resources to which the user would like access. The different SBATCH flags are given in table 3.3.

$ salloc -A projectaccount --nodes=1 --ntasks=1 --partition=campus --time=01:00:00
or
$ salloc -A projectaccount -N 1 -n 1 -p campus -t 01:00:00

The salloc command interprets the user’s request to the SLURM scheduler, and as a result, allows the user to request the resources. In the example command above, we requested that the Slurm scheduler allocate one node and one CPU core for a total time of 1 hour using the “campus” partition. Note that if the salloc command is executed without specifying the resources (i.e., number of nodes, number of CPUs, tasks, and wall clock time etc.), then the Slurm scheduler will allocate the default resources.

Important Note: The default resources on ISAAC-NG are: one processor (1 CPU on 1 node) located under the campus partition, with a maximum wall clock time of 1 hour.

When the Slurm scheduler allocates the resources, the user who submitted the job gets a message on the terminal (as shown below) containing the information about the jobid and the hostname of the compute node where the resources are allocated. Notice that in the below example the jobid is “1234” and the hostname of the compute node is “nodename.”

 $ salloc --nodes=1 --ntasks=1 --time=01:00:00 --partition=campus
  salloc: Granted job allocation 1234
  salloc: Waiting for resource configuration
  salloc: Nodes nodename are ready for job
 $

Once the the interactive job starts, the user need to ssh to the allocated node or one of the allocated nodes (for the jobs requesting more than one node) and should change their working directories to a Lustre file system project space or scratch space to run the user’s computationally intense applications. It is best to run large user jobs in Lustre scratch space rather than Lustre project space, because there is no size limit to the amount of data that a user may place on Lustre scratch space; however, there is a limit to how much data that a user may place on Lustre project space. Please visit File System for more information. If users have questions about how to best run their interactive jobs, users are welcome to send their questions to HPSC staf by submitting HPSC Service Request.

To run the parallel executable, we recommend using srun followed by the executable as shown below:

 $ srun executable
or
 $ srun -n <nprocs> executable

Important Note: Users do not necessarily need to include the number of processors they will require, before running the parallel executable while calling srun. The Slurm wrapper srun will execute your calculations in parallel on the number of processors requested in the user’s job script. Serial applications (that is, non-parallel applications) can be run both with and without srun.

Non-interactive batch mode

In non-interactive batch mode, the set of resources as well as the commands for the application to be run are written in a text file, which is referred to as a “batch file” or “batch script”. A user’s “batch script” for their particular job is submitted to the Slurm scheduler by using the sbatch command. Batch scripts are very useful for users who want to run productions jobs. This is because batch scripts allow users to work on a cluster non-interactively. In batch jobs, users submit a group of commands to Slurm and then simply check the job’s status and the output of the commands from time to time. However, sometimes it is very useful to run a job interactively (primarily for debugging). Click here to check how to run the batch jobs interactively. A typical example of a non-interactive batch script/job script is given below:

 #!/bin/bash
 #This file is a submission script to request the ISAAC resources from Slurm 
 #SBATCH -J job			       #The name of the job
 #SBATCH -A ACF-UTK0011              # The project account to be charged
 #SBATCH --nodes=1                     # Number of nodes
 #SBATCH --ntasks-per-node=48          # cpus per node 
 #SBATCH --partition=campus            # If not specified then default is "campus"
 #SBATCH --time=0-01:00:00             # Wall time (days-hh:mm:ss)
 #SBATCH --error=job.e%J	       # The file where run time errors will be dumped
 #SBATCH --output=job.o%J	       # The file where the output of the terminal will be dumped
 #SBATCH --qos=campus

 # Now list your executable command/commands.
 # Example for code compiled with a software module:
 module load example/test

 hostname
 sleep 100
 srun executable

Stuff

The above job script can be divided into three sections:

  1. Shell interpreter (one line)
    • The first line of the script specifies the script’s interpreter. The syntax of this line is #!/bin/shellname (sh, bash, csh, ksh, zsh)
    • This line is important and essential. If not mentioned, then scheduler will print an error.
  2. SLURM submission options
    • The second section contains a bunch of lines starting with ‘#SBATCH’.
    • These lines are not the comments, but by format start with ‘#’.
    • #SBATCH is a Slurm directive which communicates information regarding the resources requested by the user in the batch script file.
    • #SBATCH options after the first non-comment line are ignored by Slurm scheduler
    • The description about each of the flags is mentioned in the table 3.3
    • The command sbatch on the terminal is used to submit the non-interactive batch script.
  3. Shell commands
    • The shell command follows the last #SBATCH line.
    • These commands are a set of commands or tasks which a user wants to run. This also includes any software modules which may be needed to access a particular application.
    • To run the parallel application, it is recommended to use srun followed by the name of the full path and name of the executable if the executable path is not loaded into Slum environment while submitting the script.

For the quick start, we have also provided a collection of complete sample job scripts that are available on the ISAAC-NG cluster at /lustre/isaac/examples/jobs

Job Arrays

Slurm offers a useful option of submitting jobs that use array flags, for users whose batch jobs require identical resources. Using the array flag in a job script, users can submit multiple jobs with with a single sbatch command. Although the job script is submitted only once using sbatch command, the individual jobs in the array (of jobs) are scheduled independently with unique job array identifiers ($SLURM_ARRAY_JOB_ID). Each of the individual jobs can be differentiated using Slurm’s environmental variable $SLURM_ARRAY_TASK_ID. To understand this variable, let us consider an example of a Slurm script given below:

#!/bin/bash
#SBATCH -J myjob
#SBATCH -A ACF-UTK0011
#SBATCH -N 1
#SBATCH --ntasks-per-node=30  ###-ntasks is used when we want to define total number of processors
#SBATCH --time=01:00:00
#SBATCH --partition=campus     #####
##SBATCH -e myjob.e%j   ## Errors will be written in this file
#SBATCH -o myjob%A_%a.out    ## Separate output file will be created for each array. %A will be replaced by jobid and %a will be replaced by array index
#SBATCH --qos=campus
#SBATCH --array=1-30
       # Submit array of jobs numbered 1 to 30
###########   Perform some simple commands   ########################
set -x
###########   Below code is used to create 30 script files needed to submit the array of jobs   ###############
for i in {1..30}; do cp sleep_test.sh 'sleep_test'$i'.sh';done

###########   Run your executable   ###############
sh sleep_test$SLURM_ARRAY_TASK_ID.sh

In the above example, 30 sleep_test$index.sh executable files were created, and these jobs have names are differentiated by an index. One understands that the 30 executable files can be run by either submitting 30 individual jobs or by using Slurm arrays. A Slurm array would take these files in the form of an array named sleep_test[1-30].sh and execute these files. The variable SLURM_ARRAY_TASK_ID is set to array index value [between 1 and 30], with 30 being the total number of jobs in the array in this example, and the index value is defined in the Slurm script (above) using the #SBATCH directive

#SBATCH --array=1-30

The simultaneous number of jobs using a job array can also be limited by using a %n flag along with –array flag. For example: To run only 5 jobs at a time in a Slurm array, users can include the SLURM directive

#SBATCH --array=1-30%5

In order to create a separate output file for each of the submit jobs using Slurm arrays, use %A and %a, which represents the jobid and job array index as shown in the above example.

Exclusive Access to Nodes

As explained in the Scheduling Policy, jobs that are submitted by the same user can share computational nodes. However, if desired, users can request in their job scripts that whole node(s) be reserved to run their jobs without sharing them with other jobs. To do that use the below command:

Interactive batch mode:

 $ salloc -A projectaccount --nodes=1 --ntasks=1 --partition=campus --time=01:00:00 --exclusive --qos=campus

Non-Interactive batch mode:

Add the below line in your job script

 #SBATCH --exclusive

Monitoring Job Status

Users can regularly check status of their jobs by using the squeue command.

$ squeue -u <netID>
              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              1202    campus     Job3 username PD       0:00      2 (Resources)
              1201    campus     Job1 username  R       0:05      2 node[001-002]
              1200    campus     Job2 username  R       0:10      2 node[004-005]

The description of each of the columns of the output from squeue command is given below:

Name of ColumnDescription
JOBIDThe unique identifier of each job
PARTITIONThe partition/queue from which the resources are to be allocated to the job
NAMEThe name of the job specified in the Slurm script using #SBATCH -J option. If the -J option is not used, Slurm will use the name of the batch script.
USERThe login name of the user submitting the job
STStatus of the job. Slurm scheduler uses short notation to give the status of the job. The meaning of these short notations is given in the table below.
TIMEThe maximum wall time requested by the user for a job
NODESThe requested number of nodes on which the job is running along with the node names if resources are already allocated

When a user submits a job, it passes through various states. The values of these states for a job is given by squeue command under the column ST. The possible values of the job under ST column are given below:

Status ValueMeaning Description
CGCompletingJob is about to complete.
PDPendingJob is waiting for the resources to be allocated
RRunningJob is running on the allocated resources
SSuspendedJob was allocated resources but the execution got suspended due to some problem and CPUs are released for other jobs
NFNode FailureJob terminated due to failure of one or more allocated nodes

Altering Batch Jobs

The users are allowed to change the attributes of their jobs until the job starts running. In this section, we will describe how to alter your batch jobs with examples.

Remove a Job from Queue

User can remove the jobs in any state which are submitted by them using the command scancel.

To remove a job with a JOB ID 1234, use the command:

scancel 1234

Modifying the Job Details

Users can make use of the Slurm command scontrol which is used to alter a variety of Slurm parameters. Most of the commands using scontrol can only be executed by an ISAAC-NG System Adminstrator; however, users are granted some permissions to use scontrol for use on the jobs they have submitted, provided the jobs are not in the running mode.

Release/Hold a job

scontrol release/hold jobid

Modify the name of the job

scontrol update JobID=jobid JobName=any_new_name

Modify the total number of tasks

scontrol update JobID=jobid NumTasks=Total_tasks

Modify the number of CPUs per node

scontrol update JobID=jobid MinCPUsNode=CPUs

Modify the wall time of the job

scontrol update JobID=jobid TimeLimit=day-hh:mm:ss