Skip to content Skip to main navigation Report an accessibility issue
High Performance & Scientific Computing

Running Jobs



** We are in the process of transitioning to SLURM scheduler to submit and manage jobs on ISAAC Secure Enclave cluster. The currently used Torque/Moab scheduler will be decommissioned soon. We are working on the documentation to use the new SLURM scheduler. **

General Information

When you log in, you will be directed to one of the login nodes. The login nodes should only be used for basic tasks such as file editing, code compilation, and job submission.

The login nodes should not be used to run production jobs. Production work should be performed on the system’s compute resources. Serial jobs (pre- and post-processing, etc.) may be run on the compute nodes. Access to compute resources is managed by Torque (a PBS-like system). Job scheduling is handled by Moab, which interacts with Torque and system software.

This page provides information for getting started with the batch facilities of Torque with Moab as well as basic job execution. Sometimes you may want to chain your submissions to complete a full simulation without the need to resubmit, you can read about this here.

Encrypted Project Space Mounting

For sensitive projects that require encryption, EncFS is used to securely store and access data. For additional information on EncFS, see the Arch Linux wiki entry. The Secure Enclave provides tools for simplifying and centrally managing the use of EncFS. To use the secure storage, you must first mount the encrypted folder for your project by following these steps.

  1. Connect to a Secure Enclave login node, if you have not already.
  2. To mount an encrypted project folder, type the following command: sudo sipmount <projectname> (where <projectname> is the project ID whose encrypted space you want to access).
  3. The sudo part of the command will require you to authenticate with your NetID password and Duo TFA. The Secure Enclave recommends you use the “Duo Push” option.
  4. Verify that the project folder was mounted.
    1. The df command shows mounted filesystems. The project directory will be mounted at /projects/<projectname>.
    2. Type ls –l /projects/<projectname> to list the contents of the project folder.

After 15 minutes of inactivity, the encrypted space will be closed and require you to repeat this process.

Because the secure space is made available to running jobs (see the Job Directories section for details), it is necessary to prompt for authentication when the user submits a job. When using the qsub command to submit a job, you will be prompted to authenticate using your NetID password and Duo TFA, like when logging in. When submitting multiple jobs simultaneously (such as in a script), a single authentication will be stored for up to 5 minutes (this uses the same mechanism as the sudo command when mounting the secure project storage, so one authentication will be sufficient for both types of commands for the 5-minute window).

Do consider the amount of jobs you intend to submit. It is possible that you may attempt to submit too many jobs in the five-minute period, which will require re-authentication.

In cases where the secure project space is unnecessary for a job, you may submit to jobs to the nosecurespace queue to avoid authentication. This queue does not automatically mount a secured project space and will not require you to authenticate. To run a job in the nosecurespace queue, use a command like the following.

qsub –I –A <projectname> -l nodes=<nodes> -q nosecurespace

In this example, the qsub command is used to submit an interactive job on the nosecurespace queue. Change the and arguments to your own project folder name and node specification. For more information on the qsub command, type man qsub to read through its manual pages or type qsub -? to view the options available to the command.

Job Directories

Jobs should be submitted from within a directory in the Lustre file system. Always execute cd $PBS_O_WORKDIR as the first command in your shell script. An example of this appears in the Batch Script section. Please refer to the PBS Environment Variables section for further details. Documentation that describes PBS options can be used for more complex job scripts. The following storage spaces are available to running jobs:

  • /projects/<projectname> – This is the encrypted space for the project under which the job is being run. It is mounted automatically on the head node of the job and is unmounted automatically at job completion. Due to the encryption layer, read and write performance to this space is very poor. It is recommended that jobs requiring many reads or writes utilize the scratch space described next, copying initial data from the secure space to the scratch space at the start, and copying results back into either the secured or unsecured (depending on the nature of the data) spaces before the job completes.
  • /lustre/sip/scratch/<jobid> – Stored in the SCRATCHDIR environment variable available to a running job, this temporary scratch space is created automatically as the job starts, and is renamed to /lustre/sip/scratch/<jobid>.completed when the job is complete. <jobid> is the full name of the job as reported by qsub when the job was submitted (I.E. 1234.sip-mgmt1). 24 hours after the job completes and the directory is renamed, it will be automatically deleted to help protect any sensitive data that is stored there.
  • /lustre/sip/proj/ – This is the unsecured scratch space for the project under which the job is being run. You will have your own folder under this space with your username which you can use to store non-sensitive data.

If your project was previously given an encrypted space on the login nodes using the LUKS encryption mechanism, you will need to migrate any data stored there to your EncFS space. The LUKS spaces are deprecated and will eventually be retired. To simplify this process, the sudo sipmount –migrate <projectname> command has been provided for use on the login node containing your project’s LUKS encrypted space. When this command is used, it will do the following.

  1.   Mount the LUKS and EncFS space simultaneously.
  2.   Report the location of the LUKS space.
  3.   Report the location of the EncFS space.
  4.   Implore you to migrate data from LUKS to EncFS.

Once the data has been migrated, type sipmount –migrate-complete <projectname> to complete the migration process and close the encrypted spaces.

SLURM Batch Scripts

The ISAAC Secure Enclave cluster is transitioning from Torque/Moab to SLURM scheduler and workload manager to submit, monitor and alter jobs, and distribute them among different compute nodes.  In this section, we will explain how to request SLRUM scheduler to allocate the resources for a job and submit those jobs.

Partition

Slurm scheduler organizes the similar set of nodes into one group called as partition. Each partition features hard limits for maximum wall clock time, job size, upper limit to number of nodes etc. The default partition for all the users is named campus. At present, there are 10 nodes under campus partition or a total of 480 cores.

Scheduling Policy

The Secure Enclave has been divided into logical units known as condos. There are institutional and individual condos in the Secure Enclave which have associated project accounts. Click here for more information on condos and project accounts. Institutional condos are available for any faculty, staff, or student at the institution. However, individual condos are available to projects that have invested in the Secure Enclave.

As the Slurm scheduling policy requires the nodes under each condo to be a part of a partition. Therefore, each condo has a unique partition associated to its project account and it is imperative to define the partition and the project account while requesting the resources on Secure Enclave. Note that it is the combination of partition and project account based upon which Slurm scheduler allocates the resources. For example: Institutional condo has a campus partition associated to its project (SIP-UTK0011). Slurm set the consraints to a job submitted using these options such as maximum walltime, number of nodes etc. To submit a job under this partition, use the below command:

#SBATCH -A SIP-UTK0011
#SBATCH --partition=campus or #SBATCH -p campus

The information about all the partitions on Secure Enclave can be viewed using the command:

$ sinfo -Nel

For more information on the flags used by this command, refer to the Slurm documentation

Please note that in addition to above two directives, we also need to specify the nodes, cores and wall time parameters or other optional SBATCH directives. To get more information on how to submit a job using SBATCH directives, please refer to the section Submitting Jobs with Slurm. We have also provided a collection of complete sample job scripts that are available at /lustre/sip/examples/jobs.

Once a job is submitted, the scheduler checks for the available resources and allocate them to the jobs to launch the tasks. At present, the Slurm scheduler is configured to avoid the overlap of nodes allocated to different users. This means that the nodes are not shared among the jobs submitted by different users. However, Slurm can allocate the same node to be shared by multiple jobs for the same user and that the node will only be shared if the total requested resources among all the jobs does not exceed available resources on the node. Note that the users can choose run a job exclusively on entire node by using the exclusive flag while calling srun to distribute tasks among different CPUs. Check table 1.3 to see how to use this flag.

Additionally, users are granted permission to alter certain attributes of queued jobs until they start running. See the section Altering the Batch Jobs for more details. The order in which jobs are run depends on the following factors:

  • number of nodes requested – jobs that request more nodes get a higher priority.
  • queue wait time – a job’s priority increases along with its queue wait time (not counting blocked jobs as they are not considered “queued.”)
  • number of jobs – a maximum of ten jobs per user, at a time, will be eligible to run.

Currently, single core jobs by the same user will get scheduled on the same node.

In certain special cases, the priority of a job may be manually increased upon request. To request priority change you may contact the OIT HelpDesk. They will need the job ID and reason to submit the request.

Slurm Commands/Variables

In the table below, we have listed few important Slurm commands used on the login nodes along with their description which are most often used while working with Slurm scheduler

Command Description
sbatch jobscript.sh Used to submit the job script to request the resources
squeue Used to displays the status of all the jobs
squeue -u usernameUsed to displays the status and other information of user’s all jobs
squeue [jobid] Display the job status and information of a particular job
scancel Cancel the job with a jobid
scontrol show jobid/parition valueYields the information about a job or any resource
scontrol update Alter the resources of a pending job
salloc Used to allocate the resources for the interactive job run
Table 1.1: Basic Slurm commands and Variables to submit, monitor and delete the jobs on Secure Enclave

Slurm Variables

Below we have tabulated few important Slurm variables which will be usfuel to the ISAAC users

VariableDescription
SLURM_SUBMIT_DIRThe directory from where the job is submitted
SLURM_JOBIDThe job identifier of the submitted job
SLURM_NODELISTList of nodes allocated to a job
SLURM_NTASKSPrints the total number of CPUs used
Table 1.2: Different Slurm Variables to get the information of the running job

SBATCH Flags

The jobs on Secure Enclave are submitted using sbatch command which passes the request for the resources requested in the job script to Slurm scheduler. The resources in the job script are requested using the “SBATCH” directive. Note that Slurm accept SBATCH directives in two formats. Users can choose any format at their own discretion. The description of each of the SBATCH flags is given below:

FlagsDescription
#SBATCH -J JobnameName of the job
#SBATCH --account (or -A) Project AccountProject account to which the time will be charge
#SBATCH --time (or -t)=days-hh:mm:ssRequest wall time for the job
#SBATCH --nodes (or -N)=1Number of nodes needed
#SBATCH --ntasks (or -n) = 48Total number of cores requested
#SBATCH --ntasks-per-node = 48Request number of cores per node
#SBATCH --constraint=nosecurespaceSubmit job to non-secure queue without authentication
#SBATCH --partition (or -p) = campusSelects the partition or queue
#SBATCH --output (or -o) = Jobname.o%jThe file where output of terminal is dumped
#SBATCH --error (-e) = Jobname.e%jThe files where run time errors are dumped
#SBATCH --exclusiveAllocates the exclusive excess of node(s)
#SBATCH --array (-a) = indexUsed to run multiple jobs with identical parameters
#SBATCH --chdir=directoryUsed to change the working directory. The default working directory is the one from where a job is submitted
Table 1.3: Different SLURM flags used in creating the job script along with their description

Submitting Jobs with Slurm

On ISAAC Secure Enclave, batch jobs can be submitted in two ways: (i) interactive batch mode (ii) Non-interactive batch mode.

Interactive Batch mode:

Interactive batch jobs give users the interactive access to compute nodes. In this mode, user can request the Slurm scheduler to allocate the resources of compute nodes directly on the terminal. A common use for interactive batch jobs is to debug the calculation or program before submitting the non-interactive batch jobs for production runs. This section demonstrates how to run interactive jobs through the batch system and provides common usage tips.

The interactive batch mode can be invoked on the login node by using salloc command followed by the sbatch flags to request the different resources. The different sbatch flags are given in table 1.3.

$ salloc -A projectaccount --nodes=1 --ntasks=1 --partition=campus --time=01:00:00
or 
$ salloc -A projectaccount -N 1 -n 1 -p campus -t 01:00:00

The salloc command interprets the user’s request to Slurm scheduler and request the resources. In the above command we requested slurm scheduler to allocate one node and one cpu for a total time of 1 hour using campus partition. Note that if salloc command is executed without specifying the resources such as nodes, tasks and clock time, then scheduler will allocate the default resources which are one processor under campus partition with a wall clock time of 1 hour.

When the scheduler allocates the resources, the user gets a message on the terminal as shown below with the information about the jobid and the hostname of the compute node where the resources are allocated.

 $ salloc --nodes=1 --ntasks=1 --time=01:00:00
  salloc: Granted job allocation 1234
  salloc: Waiting for resource configuration
  salloc: Nodes nodename are ready for job
 $

Once the the interactive job starts, the user should change their working directories to lustre project or scratch space to run the computationally intense applications. To run the parallel executable, we recommend using srun followed by the executable as shown below:

 $ srun executable

Note that you do not need to mention the number of processors before the executable while calling srun. The slurm wrapper srun execute your calculations in parallel on the requested number of processors. The serial applications can be run with and without srun.

Non-interactive batch mode:

In this mode, the set of resources as well as the commands for the application to be run are written in a text file called as batch file or batch script. This batch script is submitted to Slurm scheduler by using the sbatch command. The batch scripts are very useful to run the productions jobs. The batch scripts allow the users to work on a cluster non-interactively. In batch jobs, users submit a group of commands to Slurm and checking the status and the output of the commands from time to time. However, sometimes it is very useful to run a job interactively (primarily for debugging). Click here to check how to run the batch jobs interactively. A typical example of a job script is given below:

 #!/bin/bash
 #This file is a submission script to request the ISAAC resources from Slurm 
 #SBATCH -J job			       #The name of the job
 #SBATCH -A SIP-UTK0011                            # The project account to be charged
 #SBATCH --nodes=1                                      # Number of nodes
 #SBATCH --ntasks-per-node=48                   # cpus per node 
 #SBATCH --partition=campus                     # If not specified then default is "campus"
 #SBATCH --time=0-01:00:00                       # Wall time (days-hh:mm:ss)
 #SBATCH --error=job.e%J		      # The file where run time errors will be dumped
 #SBATCH --output=job.o%J		      # The file where the output of the terminal will be dumped

 # Now list your executable command/commands.
 # Example for code compiled with a software module:
 module load example/test

 hostname
 sleep 100
 srun executable

The above job script can be divided into three sections:

  1. Shell interpreter (one line)
    • The first line of the script specifies the script’s interpreter. The syntax of this line is #!/bin/shellname (sh, bash, csh, ksh, zsh)
    • This line is important and essential. If not mentioned, then scheduler will print the error.
  2. SLURM submission options
    • The second section contains a bunch of lines starting with ‘#SBATCH’.
    • These lines are not the comments.
    • #SBATCH is a Slurm directive which communicates information regarding the resources requested by the user in the batch script file.
    • #SBATCH options after the first non-comment line are ignored by Slurm scheduler
    • The description about each of the flags is mentioned in the table 1.3
    • The command sbatch on the terminal is used to submit the non-interactive batch script.
  3. Shell commands
    • The shell command follows the last #SBATCH line.
    • Set of commands or tasks which a user wants to run. This also includes any software modules which may be needed to access a particular application.
    • To run the parallel application, it is recommended to use srun followed by the name of the full path and name of the executable if the executable path is not loaded into Slum environment while submitting the script.

Job Arrays

Slurm offers a useful option of submitting jobs using array flags to the users whose batch jobs require identical resources. Using this flag in the job script, users can submit multiple jobs with with a single sbatch command. Although job script is submitted only once using sbatch command, but the individual jobs in the array are scheduled independently with unique job array identifiers ($SLURM_ARRAY_JOB_ID). Each of the individual jobs can be differentiated using Slurm’s environmental variable $SLURM_ARRAY_TASK_ID. To understand this variable, let us consider an example of a Slurm script given below:

#!/bin/bash
#SBATCH -J myjob
#SBATCH -A SIP-UTK0001
#SBATCH -N 1
#SBATCH --ntasks-per-node=30  ###-ntasks is used when we want to define total number of processors
#SBATCH --time=01:00:00
#SBATCH --partition=campus     #####
##SBATCH -e myjob.e%j   ## Errors will be written in this file
#SBATCH -o myjob%A_%a.out    ## Separate output file will be created for each array. %A will be replaced by jobid and %a will be replaced by array index
#SBATCH --array=1-30
       # Submit array of jobs numbered 1 to 30
###########   Perform some simple commands   ########################
set -x
###########   Below code is used to create 30 script files needed to submit the array of jobs   ###############
for i in {1..30}; do cp sleep_test.sh 'sleep_test'$i'.sh';done

###########   Run your executable   ###############
sh sleep_test$SLURM_ARRAY_TASK_ID.sh

In the above example, we have created 30 sleep_test$index.sh executable files whose names are differing by an index. We can accomplish this task either by submitting 30 individual jobs or using an efficient and simple method of slurm arrays which takes these files in the form of an array sleep_test[1-30].sh. The variable SLURM_ARRAY_TASK_ID array is set to array index value [1-30], which is defined in the Slurm script above using #SBATCH directive

#SBATCH --array=1-30

The simultaneous number of jobs using a job array can also be limited by using a %n flag along with –array flag. For example: to run only 5 jobs at a time in the Slurm array, users can incluse the SLURM directive

#SBATCH --array=1-30%5

In order to create a separate output file for each of the submit jobs using Slurm arrays, use %A and %a, which represents the jobid and job array index as shown in the above example.

Exclusive Access to Nodes

As explained in the Scheduling policy, the jobs submitted by the same user can share the nodes. However, users can request the whole node(s) to run their jobs without sharing them with other jobs. To do that use the below command:

Interactive batch mode:

 $ salloc -A projectaccount --nodes=1 --ntasks=1 --partition=campus --time=01:00:00 --exclusive

Non-Interactive batch mode:

Add the below line in your job script

 #SBATCH --exclusive

Monitoring the Jobs Status

Users can regularly check status of their jobs by using the squeue command.

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
              1202    campus     Job3 username PD       0:00      2 (Resources)
              1201    campus     Job1 username  R       0:05      2 node[001-002]
              1200    campus     Job2 username  R       0:10      2 node[004-005]

The description of each of the columns of the output from squeue command is given below

Name of Column Description
JOBIDThe unique identifier of each job
PARTITIONThe partition/queue from which the resources are to be allocated to the job
NAMEThe name of the job specified in the Slurm script using #SBATCH -J option. If the -J option is not used, Slurm will use the name of the batch script.
USERThe login name of the user submitting the job
STStatus of the job. Slurm scheduler uses short notation to give the status of the job. The meaning of these short notations is given in the table below.
TIMEThe maximum wall time requested by the user for a job
NODESThe requested number of nodes on which the job is running along with the node names if resources are already allocated
Description of the squeue output

When a user submits a job, it passes through various states. The values of these states for a job is given by squeue command under the column ST. The possible values of the job under ST column are given below:

Status ValueMeaningDescription
CG CompletingJob is about to complete.
PDPendingJob is waiting for the resources to be allocated
RRunningJob is running on the allocated resources
SSuspendedJob was allocated resources but the execution got suspended due to some problem and CPUs are released for other jobs
Different states of a Slurm job

Altering Batch Jobs

The users are allowed to change the attributes of their jobs until the job starts running. In this section, we will describe how to alter your batch jobs with examples.

Remove a Job from Queue

User can remove the jobs in any state which are submitted by them using the command scancel.

To remove a job with a JOB ID 1234, use the command:

scancel 1234

Modifying the Job Details

Users can make use of the Slurm command scontrol which is used to alter a variety of Slurm parameters. Although most of the commands using scontrol can only be executed by the System Adminstrator. However, users are granted some permissions to use scontrol for its use on the jobs submitted by them provided the jobs are not in the running mode

Release/Hold a job

scontrol release/hold jobid

Modify the name of the job

scontrol update JobID=jobid JobName=any_new_name

Modify the total number of tasks

scontrol update JobID=jobid NumTasks=Total_tasks

Modify the number of CPUs per node

scontrol update JobID=jobid MinCPUsNode=CPUs

Modify the Wall time of the job

scontrol update JobID=jobid TimeLimit=day-hh:mm:ss

Torque Batch Scripts

Batch scripts can be used to run a set of commands on a system’s compute partition. Batch scripts allow users to run non-interactive batch jobs, which are useful for submitting a group of commands, allowing them to run through the queue, and then viewing the results. However It is sometimes useful to run a job interactively (primarily for debugging purposes). Please refer to the Interactive Batch Jobs section for more information on how to run batch jobs interactively.

All non-interactive jobs must be submitted on Beacon using job scripts via the qsub command. The batch script is a shell script containing PBS flags and commands to be interpreted by a shell. The batch script is submitted to the resource manager, Torque, where it is parsed. Based on the parsed data, Torque places the script in the queue as a job. Once the job makes its way through the queue, the script will be executed on the head node of the allocated resources.

Jobs are submitted to the batch job scheduler in units of nodes via the -l nodes=# option. By default, MPI jobs will place one task per node. The default behavior can be overridden by adding the ‘-ppn=# -f $PBS_NODEFILE’ option to the mpirun command. Nodes can be oversubscribed (i.e. utilizing more MPI ranks than the node has cores); however, the default behavior will be to fill all cores on all nodes before adding the additional MPI ranks. This will be done by adding ranks to each node again up to the number of cores per node available. This process is repeated until all MPI ranks have been allocated. For example a job that requests 3 nodes (-l nodes=3) that have 16 total cores available and submits an MPI job using 144 ranks (mpirun -n 144) will first place 16 MPI ranks on each node on each of the 3 nodes (48 ranks over 3 nodes) before placing an addition set of 48 ranks in the same way (16 ranks per node over 3 nodes). Finally, the remaining set of 48 ranks will be allocated to all the nodes in the same way (16 ranks per node over 3 nodes).

If all MPI ranks have not been allocated it will place this same number of MPI ranks starting again on each node, starting with the first, until all MPI Ranks have been allocated. In cases were the number of MPI ranks per node is less than the available cores per node, these MPI ranks are evenly spread across processor cores. For example if 8 MPI ranks are placed on a 16 core node (2 processors of 8 cores each) then four MPI ranks will land on the first processor and the four MPI ranks will land on the second processor.

All job scripts start with an interpreter line, followed by a series of #PBS declarations that describe requirements of the job to the scheduler. The rest is a shell script, which sets up and runs the executable.

Batch scripts are divided into the following three sections:

  1. Shell interpreter (one line)
    • The first line of a script can be used to specify the script’s interpreter.
    • This line is optional.
    • If not used, the submitter’s default shell will be used.
    • The line uses the syntax #!/path/to/shell, where the path to the shell may be
      • /usr/bin/csh
      • /usr/bin/ksh
      • /bin/bash
      • /bin/sh
  2. PBS submission options
    • The PBS submission options are preceded by #PBS, making them appear as comments to a shell.
    • PBS will look for #PBS options in a batch script from the script’s first line through the first non-comment line. A comment line begins with #.
    • #PBS options entered after the first non-comment line will not be read by PBS.
  3. Shell commands
    • The shell commands follow the last #PBS option and represent the executable content of the batch job.
    • If any #PBS lines follow executable statements, they will be treated as comments only. The exception to this rule is shell specification on the first line of the script.
    • The execution section of a script will be interpreted by a shell and can contain multiple lines of executables, shell commands, and comments.
    • During normal execution, the batch script will end and exit the queue after the last line of the script.

The following examples show typical job script header with various mpirun commands to submit a parallel job that executes ./a.out on 3 nodes with a wall clock limit of two hours:

#PBS -S /bin/bash
#PBS -A ACF-UTK0011
#PBS -l nodes=3,walltime=02:00:00

cd $PBS_O_WORKDIR

Option 1:
mpirun -n 48 ./a.out    
Places 48 MPI ranks (16 per node, placed 1 per node round robin)

Option 2:
mpirun -n 96 ./a.out   
Places 96 MPI ranks (32 ranks per node).  

Option 3:
mpirun -n 96 -ppn=32 -f $PBS_NODEFILE  ./a.out
Places 96 MPI ranks (32 ranks per node ).  

Option 4:
mpirun -n 24 -ppn=8 -f $PBS_NODEFILE  ./a.out
Places 24 MPI ranks (8 per node in groups of 8).  Ranks 0-7 will be on node 1, Ranks 8-15 will be on node 2, and Ranks 16-23 will be on node 3.

Jobs should be submitted from within a directory in the Lustre file system. It is best to always execute cd $PBS_O_WORKDIR as the first command. Please refer to the PBS Environment Variables section for further details.

Documentation that describes PBS options can be used for more complex job scripts.

Unless otherwise specified, your default shell interpreter will be used to execute shell commands in job scripts. If the job script should use a different interpreter, then specify the correct interpreter using:

 #PBS -S /bin/XXXX

Altering Batch Jobs

This section shows how to remove or alter batch jobs.

Remove Batch Job from the Queue

Jobs in the queue in any state can be stopped and removed from the queue using the command qdel.

For example, to remove a job with a PBS ID of 1234, use the following command:

> qdel 1234

More details on the qdel utility can be found on the qdel man page.

Hold Queued Job

Jobs in the queue in a non-running state may be placed on hold using the qhold command. Jobs placed on hold will not be removed from the queue, but they will not be eligible for execution.

For example, to move a currently queued job with a PBS ID of 1234 to a hold state, use the following command:

> qhold 1234

More details on the qhold utility can be found on the qhold man page.

Release Held Job

Once on hold the job will not be eligible to run until it is released to return to a queued state. The qrls command can be used to remove a job from the held state.

For example, to release job 1234 from a held state, use the following command:

> qrls 1234

More details on the qrls utility can be found on the qrls man page.

Modify Job Details

Non-running (or on hold) jobs can only be modified with the qalter PBS command. For example, this command can be used to:

Modify the job´s name,

$ qalter -N <newname> <jobid>

Modify the number of requested nodes,

$ qalter -l nodes=<NumNodes> <jobid>

Modify the job´s wall time

$ qalter -l walltime=<hh:mm:ss> <jobid>

Set job´s dependencies

$ qalter -W  depend=type:argument <jobid>

Remove a job´s dependency (omit :argument):

$ qalter -W  depend=type <jobid>

Notes:

  • Use qstat -f <jobid> to gather all the information about a job, including job dependencies.
  • Use qstat -a <jobid> to verify the changes afterward.
  • Users cannot specify a new walltime for their job that exceeds the maximum walltime of the queue where your job is.
  • If you need to modify a running job, please contact us. Certain alterations can only be performed by administrators.

Interactive Batch Jobs

Interactive batch jobs give users interactive access to compute resources. A common use for interactive batch jobs is debugging. This section demonstrates how to run interactive jobs through the batch system and provides common usage tips.

Users are not allowed to run interactive jobs from login nodes. Running a batch-interactive PBS job is done by using the -I option with qsub. After the interactive job starts, the user should run the computationally intense applications on the lustre scratch space, and place the executable after the mpirun command.

Interactive Batch Example

For interactive batch jobs, PBS options are passed through qsub on the command line. Refer to the following example:

qsub -I -A UT-NTNL0121 -l nodes=1,walltime=1:00:00
OptionDescription
-IStart an interactive session
-ACharge to the “UT-NTNL0121” project
-lRequest 1 physical compute node (16 cores) for one hour

After running this command, you will have to wait until enough compute nodes are available, just as in any other batch job. However, once the job starts, the standard input and standard output of this terminal will be linked directly to the head node of our allocated resource. The executable should be placed on the same line after the mpirun command, just like it is in the batch script.


> cd /lustre/medusa/$USER
> mpirun -n 16 ./a.out

Issuing the exit command will end the interactive job.

Common PBS Options

This section gives a quick overview of common PBS options.

Necessary PBS options

OptionUseDescription
A#PBS -A <account>Causes the job time to be charged to <account>. The account string is typically composed of three letters followed by three digits and optionally followed by a subproject identifier. The utility showusage can be used to list your valid assigned project ID(s). This is the only option required by all jobs.
l#PBS -l nodes=<nodes>Number of requested nodes.
 #PBS -l walltime=<time>Maximum wall-clock time. <time> is in the format HH:MM:SS. Default is 1 hour.

Other PBS Options

OptionUseDescription
o#PBS -o <name>Writes standard output to <name> instead of <job script>.o$PBS_JOBID. $PBS_JOBID is an environment variable created by PBS that contains the PBS job identifier.
e#PBS -e <name>Writes standard error to <name> instead of <job script>.e$PBS_JOBID.
j#PBS -j {oe,eo}Combines standard output and standard error into the standard error file (eo) or the standard out file (oe).
m#PBS -m aSends email to the submitter when the job aborts.
 #PBS -m bSends email to the submitter when the job begins.
 #PBS -m eSends email to the submitter when the job ends.
M#PBS -M <address>Specifies email address to use for -m options.
N#PBS -N <name>Sets the job name to <name> instead of the name of the job script.
S#PBS -S <shell>Sets the shell to interpret the job script.
qos#PBS -q <queue>Directs the job to the run under the specified QoS. This option is not required to run in the default QoS.
l#PBS -l feature=<feature>Select the desired node feature set.

Note:  Please do not use the PBS -V option. This can propagate large numbers of environment variable settings from the submitting shell into a job which may cause problems for the batch environment. Instead of using PBS -V, please pass only necessary environment variables using -v <comma_separated_list_of_ needed_envars>. You can also include module load statements in the job script.

Example:

#PBS -v PATH,LD_LIBRARY_PATH,PV_NCPUS,PV_LOGIN,PV_LOGIN_PORT

Further details and other PBS options may be found using the man qsub command.

PBS Environment Variables

This section gives a quick overview of useful environment variable sets within PBS jobs.

  • PBS_O_WORKDIR
    • PBS sets the environment variable PBS_O_WORKDIR to the directory from which the batch job was submitted.
    • By default, a job starts in your home directory. Often, you would want to do cd $PBS_O_WORKDIR to move back to the directory you were in. The current working directory when you start mpirun should be on Lustre Space.

Include the following command in your script if you want it to start in the submission directory:

cd $PBS_O_WORKDIR
  • PBS_JOBID
    • PBS sets the environment variable PBS_JOBID to the job’s ID.
    • A common use for PBS_JOBID is to append the job’s ID to the standard output and error file(s).

Include the following command in your script to append the job’s ID to the standard output and error file(s)

#PBS -o scriptname.o$PBS_JOBID
  • PBS_NNODES
    • PBS sets the environment variable PBS_NNODES to the number of logical cores requested (not nodes). Given that Beacon has 16 physical cores per node, the number of nodes would be given by $PBS_NNODES/16.
    • For example, a standard MPI program is generally started with mpirun -n $($PBS_NNODES) ./a.out. See the Job Execution section for more details.

Monitoring Job Status

The below describes some ways to monitor jobs in the Secure Enclave batch environment. The Torque resource manager and Moab scheduler provide multiple tools to view the queues, batch system, job status, and scheduler information. Below are the most common and useful of these tools.

qstat

Use qstat -a to check the status of submitted jobs.

> qstat -a 
ocoee.nics.utk.edu: 
Job ID               Username    Queue    Jobname          SessID NDS   TSK    Memory   Time     S  Time
-----------------  -----------     --------   ----------------   ------    -----   ------  ------        --------   -  --------
102903              lucio       batch    STDIN              9317    --       16       --            01:00:00 C 00:06:17
102904              lucio       batch    STDIN              9590    --       16       --            01:00:00 R      -- 
>

The qstat output shows the following:

Job IDThe first column gives the PBS-assigned job ID.
UsernameThe second column gives the submitting user’s login name.
QueueThe third column gives the queue into which the job has been submitted.
JobnameThe fourth column gives the PBS job name. This is specified by the PBS -N option in the PBS batch script. Or, if the -N option is not used, PBS will use the name of the batch script.
SessIDThe fifth column gives the associated session ID.
NDSThe sixth column gives the PBS node count. Not accurate; will be one.
TasksThe seventh column gives the number of logical cores requested by the job’s -size option.
Req’d MemoryThe eighth column gives the job’s requested memory.
Req’d TimeThe ninth column gives the job’s requested wall time.
SThe tenth column gives the job’s current status. See the status listings below.
Elap TimeThe eleventh column gives the job’s time spent in a running status. If a job is not currently or has not been in a run state, the field will be blank.

The job’s current status is reported by the qstat command. The possible values are listed in the table below.

Status valueMeaning
EExiting after having run
HHeld
QQueued
RRunning
SSuspended
TBeing moved to new location
WWaiting for its execution time
CRecently completed (within the last 5 minutes)

showq

The Moab showq utility shows the schedulers view of jobs in the queue. The utility shows the state of jobs from the schedulers point of view including:

RunningThese jobs are currently running.
IdleThese jobs are currently queued awaiting to be assigned resources by the scheduler. A user is allowed five jobs in the Idle state to be considered by the Moab scheduler.
BlockedBlocked jobs are those that are ineligible to be considered by the scheduler. Common reasons for jobs in this state are jobs that the specified resources are not available or the user or system has put a hold on the job.
BatchHoldThese jobs are currently in the queue but are on hold from being considered by the scheduler usually because the requested resources are not available in the system or because the resource manager has repeatedly failed in attempts to start the job.

checkjob

The Moab checkjob utility can be used to view details of a job in the queue. For example, if job 736 is currently in a blocked state, the following can be used to view the reason:

> checkjob 736

The return may contain a line similar to the following:

BLOCK MSG: job 736 violates idle HARD MAXIJOB limit of 5 for user <your_username>  partition ALL (Req: 1  InUse: 5) 

This line indicates the job is in the blocked state because the owning user has reached the limit of five jobs currently in the eligible state.

showstart

The Moab showstart utility gives an estimate of when the job will start.

> showstart 100315
job 100315 requires 16384 procs for 00:40:00

Estimated Rsv based start in 15:26:41 on Fri Sep 26 23:41:12
Estimated Rsv based completion in 16:06:41 on Sat Sep 27 00:21:12

The start time may change dramatically as new jobs with higher priority are submitted, so you need to periodically rerun the command.

showbf

The Moab showbf utility gives the current backfill. This can help you create a job which can be backfilled immediately. As such, it is primarily useful for short jobs.

Scheduling Policy

The Secure Enclave uses TORQUE as the resource manager and Moab as the scheduler to schedule jobs. The Secure Enclave has been divided into logical units known as condos. There are institutional and individual condos in the Secure Enclave. Institutional condos are available for any faculty, staff, or student at that institution. Individual condos are available to projects that have invested in the Secure Enclave.

The scheduler gives preference to large core count jobs. Moab is configured to do “first fit” backfill. Backfilling allows smaller, shorter jobs to use otherwise idle resources.

Users can alter certain attributes of queued jobs until they start running. The order in which jobs are run depends on the following factors:

  • number of nodes requested – jobs that request more nodes get a higher priority.
  • queue wait time – a job’s priority increases along with its queue wait time (not counting blocked jobs as they are not considered “queued.”)
  • number of jobs – a maximum of ten jobs per user, at a time, will be eligible to run.

Currently, single core jobs by the same user will get scheduled on the same node. Users on the same project can share nodes with written permission of the PI.

In certain special cases, the priority of a job may be manually increased upon request. To request priority change you may contact the OIT HelpDesk. They will need the job ID and reason to submit the request.

Known Issues

When a user successfully authenticates the sipmount or qsub commands, it will report that the user has successfully logged in, even though the user is already logged in. This is due to the authentication mechanism using the same security controls as are used for authenticating a login attempt.