Running Jobs on ISAAC Secure Enclave
General Information
As explained on the Access and Login page, when you log in to the Secure Enclave HPSC cluster, you will be directed to one of the login nodes, from which you can access the cluster to do your research work. Please note that the login nodes should only be used for basic tasks such as file editing, code compilation, and job submission. Running production jobs on the login nodes is strongly discouraged, and any production job running on a login node is subject to termination.
As discussed above, login nodes should not be used to run production jobs; any production work should be performed on the system's compute resources. The compute resources of the Secure Enclave cluster are managed and allocated by the SLURM scheduler (Simple Linux Utility for Resource Management), which uses partition and Quality of Service (QoS) features to efficiently allocate resources to different jobs. At the time of writing, the available partitions, QoS, and maximum runtimes for computational jobs are given in the table below:
Quality of Service (QoS) | Run Time Limit (Hours) | Valid Partitions |
campus | 168 [7 days] | campus, campus-gpu |
condo | 720 [30 days] | condo-* |
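As a quick orientation, the sketch below shows how a job script might target the campus partition and QoS. The --qos line is an illustrative assumption and may be unnecessary if your project account already maps to the correct QoS; the project account SIP-UTK0011 is used only as an example.
#SBATCH -A SIP-UTK0011
#SBATCH --partition=campus
#SBATCH --qos=campus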
This page provides information for getting started with the batch facilities of the SLURM scheduler as well as basic job execution. Before getting started with job submission, however, it is important to understand how to access the different job directories on the ISAAC Secure Enclave cluster from which jobs can be submitted.
Encrypted Project Space Mounting
For sensitive projects that require encryption, EncFS is used to securely store and access data. For additional information on EncFS, see the Arch Linux wiki entry. The Secure Enclave provides tools for simplifying and centrally managing the use of EncFS. To use the secure storage, you must first mount the encrypted folder for your project by following these steps.
- Connect to a Secure Enclave login node, if you have not already.
- To mount an encrypted project folder, type the following command: sudo /usr/local/bin/sipmount <projectname> (where <projectname> is the project ID whose encrypted space you want to access).
- The sudo part of the command will require you to authenticate with your NetID password and Duo TFA. The Secure Enclave recommends you use the “Duo Push” option.
- Verify that the project folder was mounted.
- The df command shows mounted filesystems. The project directory will be mounted at /projects/<projectname>.
- Type ls -l /projects/<projectname> to list the contents of the project folder.
After 15 minutes of inactivity, the encrypted space will be closed and require you to repeat this process.
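For illustration, a typical mounting session might look like the following, where myproj is a placeholder project ID:
$ sudo /usr/local/bin/sipmount myproj     # authenticate with your NetID password and Duo
$ df -h /projects/myproj                  # verify that the encrypted space is mounted
$ ls -l /projects/myproj                  # list the contents of the project folder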
Job Directories
By default, the SLURM scheduler assumes the working directory to be the one from which the job is submitted. It is recommended that jobs be submitted from within a directory in the Lustre file system. The following storage spaces are available to running jobs:
- /projects/<projectname> – This is the encrypted space for the project under which the job is being run. It is mounted automatically on the head node of the job and is unmounted automatically at job completion. Due to the encryption layer, read and write performance in this space is very poor. Jobs that perform many reads or writes should use the scratch space described next: copy the initial data from the secure space to the scratch space at the start of the job, and copy the results back into either the secured or unsecured space (depending on the nature of the data) before the job completes. A minimal sketch of this staging pattern is shown after this list.
- /lustre/sip/scratch/<jobid> – Stored in the SCRATCHDIR environment variable available to a running job, this temporary scratch space is created automatically when the job starts and is renamed to /lustre/sip/scratch/<jobid>.completed when the job completes. <jobid> is the full name of the job as reported by qsub when the job was submitted (e.g., 1234.sip-mgmt1). Twenty-four hours after the job completes and the directory is renamed, it is automatically deleted to help protect any sensitive data stored there.
- /lustre/sip/proj/ – This is the unsecured scratch space for the project under which the job is being run. You will have your own folder, named after your username, under this space, which you can use to store non-sensitive data.
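The following job-script fragment is a minimal sketch of the staging pattern recommended above; the directory names input and results and the executable name are placeholders.
# Stage data out of the slow encrypted space into fast scratch, run there,
# and copy the results back before the job ends.
cp -r /projects/<projectname>/input "$SCRATCHDIR"/
cd "$SCRATCHDIR"
srun executable
cp -r results /projects/<projectname>/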
If your project was previously given an encrypted space on the login nodes using the LUKS encryption mechanism, you will need to migrate any data stored there to your EncFS space. The LUKS spaces are deprecated and will eventually be retired. To simplify this process, the sudo /usr/local/bin/sipmount --migrate <projectname> command is provided for use on the login node containing your project's LUKS encrypted space. When this command is used, it will do the following.
- Mount the LUKS and EncFS space simultaneously.
- Report the location of the LUKS space.
- Report the location of the EncFS space.
- Prompt you to migrate data from LUKS to EncFS.
Once the data has been migrated, type /usr/local/bin/sipmount --migrate-complete <projectname> to complete the migration process and close the encrypted spaces.
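A migration session might therefore look like the sketch below; the copy command and the placeholder paths <luks_path> and <encfs_path> (the two locations reported by sipmount) are illustrative assumptions.
$ sudo /usr/local/bin/sipmount --migrate <projectname>
$ cp -a <luks_path>/. <encfs_path>/        # copy data from the reported LUKS location to the EncFS location
$ /usr/local/bin/sipmount --migrate-complete <projectname>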
SLURM Batch Scripts
In this section, we explain how to request resources for a job from the Slurm scheduler and how to submit jobs.
Partition
The Slurm scheduler organizes nodes with similar hardware and job features into groups called partitions. Each partition enforces hard limits such as maximum wall clock time, maximum job size, and an upper limit on the number of nodes. The default partition for all users is named campus. At present, there are 18 nodes in the campus partition, for a total of 864 cores.
Quality of Service
A Quality of Service (QoS) is a set of restrictions on resources accessible to a job. You can always get the current list of QoS specifications via
sacctmgr show qos \
format=name,maxwall,maxtres,maxtrespu,maxjobspu,maxjobspa,maxsubmitpa
As of July 1, 2024, the QoS limits set for user jobs on the Secure Enclave HPSC cluster are given below:
QoS | Max wall clock time per User | Max Resource Usage Per Job | Max Resource Usage Per Account (across all running jobs under the project) | Max Running/Submitted Jobs Per User |
campus | 7 days | 10 nodes | 10 nodes | -/- |
condo | 30 days | NA | NA | -/- |
Scheduling Policy
The Secure Enclave has been divided into logical units known as condos. There are institutional and individual private condos in the Secure Enclave, each with associated project accounts. Click here for more information on condos and project accounts. Institutional condos are available to any faculty, staff, or student at the institution; individual private condos are available only to projects that have invested in the Secure Enclave.
The Slurm scheduling policy requires the nodes under each condo to be part of a partition. Therefore, each condo has a unique partition associated with its project account, and it is imperative to specify both the partition and the project account when requesting resources on the Secure Enclave. Note that Slurm allocates resources based on the combination of partition and project account. For example, an institutional condo has the campus partition associated with its project account (SIP-UTK0011). Slurm applies the constraints associated with these options, such as maximum wall time and number of nodes, to the submitted job. To submit a job under this partition, use the directives below:
#SBATCH -A SIP-UTK0011
#SBATCH --partition=campus or #SBATCH -p campus
Detailed information about all the partitions and their respective nodes on the Secure Enclave can be viewed using the command:
$ sinfo -Nel
For more information on the flags used by this command, refer to the Slurm documentation.
Please note that in addition to the above two directives, we also need to specify the nodes, cores, and wall time parameters or other optional SBATCH directives. Detailed information on how to submit a job using SBATCH directives is provided under the section Submitting Jobs with Slurm.
Once a job is submitted, the scheduler checks for available resources and allocates them to jobs to launch their tasks. At present, the Slurm scheduler is configured to avoid overlap between nodes allocated to different users; that is, nodes are not shared among jobs submitted by different users. However, Slurm can allocate the same node to multiple jobs belonging to the same user, and a node will only be shared if the total resources requested by all of those jobs do not exceed the resources available on the node. Note that users can choose to run a job exclusively on an entire node by using the exclusive flag; see the SBATCH Flags table below for how to use it.
The order in which jobs are run depends on the following factors:
- number of nodes requested – jobs that request more nodes get a higher priority.
- queue wait time – a job’s priority increases along with its queue wait time (not counting blocked jobs as they are not considered “queued.”)
- number of jobs – a maximum of ten jobs per user, at a time, will be eligible to run.
Currently, single core jobs by the same user will get scheduled on the same node.
In certain special cases, the priority of a job may be manually increased upon request. To request a priority change, contact the OIT HelpDesk; they will need the job ID and the reason for the request.
Slurm Commands
The table below lists a few important Slurm commands, used on the login nodes, that are most often needed when working with the Slurm scheduler, along with their descriptions.
Command | Description |
sbatch jobscript.sh | Submits the job script to request resources |
squeue | Displays the status of all jobs |
squeue -u username | Displays the status and other information for all of the user's jobs |
squeue -j jobid | Displays the status and information for a particular job |
scancel jobid | Cancels the job with the given job ID |
scontrol show job/partition <value> | Shows information about a job, partition, or other resource |
scontrol update | Alters the attributes of a pending job |
salloc | Allocates resources for an interactive job |
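As an illustrative example of a typical workflow (the script name jobscript.sh and job ID 1234 are placeholders), the commands above are often combined as follows:
$ sbatch jobscript.sh            # submit the batch script; Slurm prints the assigned job ID
$ squeue -u $USER                # list all of your jobs
$ scontrol show job 1234         # show detailed information about job 1234
$ scancel 1234                   # cancel job 1234 if necessary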
Slurm Variables
Below we have tabulated a few important Slurm variables that will be useful to ISAAC users.
Variable | Description |
SLURM_SUBMIT_DIR | The directory from where the job is submitted |
SLURM_JOBID | The job identifier of the submitted job |
SLURM_NODELIST | List of nodes allocated to a job |
SLURM_NTASKS | The total number of tasks requested for the job |
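The fragment below is a small sketch showing how these variables might be used inside a batch script:
cd "$SLURM_SUBMIT_DIR"                                   # move to the directory the job was submitted from
echo "Job $SLURM_JOBID is running on: $SLURM_NODELIST"   # report the job ID and allocated nodes
echo "Total tasks requested: $SLURM_NTASKS"              # report the task count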
SBATCH Flags
Jobs on the Secure Enclave are submitted using the sbatch command, which passes the resource requests in the job script to the Slurm scheduler. Resources are requested in the job script using #SBATCH directives. Note that Slurm accepts SBATCH flags in two formats (long and short); users can choose either format at their own discretion. Each of the SBATCH flags is described below:
Flags | Description |
#SBATCH -J Jobname | Name of the job |
#SBATCH --account (or -A) ProjectAccount | Project account to which the time will be charged |
#SBATCH --time (or -t)=days-hh:mm:ss | Requested wall time for the job |
#SBATCH --nodes (or -N)=1 | Number of nodes needed |
#SBATCH --ntasks (or -n)=48 | Total number of cores requested |
#SBATCH --ntasks-per-node=48 | Requested number of cores per node |
#SBATCH --constraint=nosecurespace | Submits the job to the non-secure queue without authentication |
#SBATCH --partition (or -p)=campus | Selects the partition or queue |
#SBATCH --output (or -o)=Jobname.o%j | The file to which the job's terminal output is written |
#SBATCH --error (or -e)=Jobname.e%j | The file to which run-time errors are written |
#SBATCH --exclusive | Allocates exclusive access to the node(s) |
#SBATCH --array (or -a)=index | Used to run multiple jobs with identical parameters |
#SBATCH --chdir=directory | Changes the working directory. The default working directory is the one from which the job is submitted |
Submitting Jobs with Slurm
On the ISAAC Secure Enclave, batch jobs can be submitted in two ways: (i) interactive batch mode and (ii) non-interactive batch mode.
Interactive Batch mode:
Interactive batch jobs give users interactive access to compute nodes. In this mode, users can request the Slurm scheduler to allocate compute node resources directly from the terminal. A common use for interactive batch jobs is to debug a calculation or program before submitting non-interactive batch jobs for production runs. This section demonstrates how to run interactive jobs through the batch system and provides common usage tips.
Interactive batch mode is invoked on a login node by using the salloc command followed by SBATCH-style flags to request the desired resources. The available flags are listed in the SBATCH Flags table above.
$ salloc -A projectaccount --nodes=1 --ntasks=1 --partition=campus --time=01:00:00 or $ salloc -A projectaccount -N 1 -n 1 -p campus -t 01:00:00
The salloc command passes the user's request to the Slurm scheduler. In the command above, we requested one node and one CPU for a total time of one hour in the campus partition. Note that if the salloc command is executed without specifying resources such as nodes, tasks, and wall clock time, the scheduler allocates the default resources, which are one processor in the campus partition with a wall clock time of one hour.
When the scheduler allocates the resources, the user gets a message on the terminal as shown below with the information about the jobid and the hostname of the compute node where the resources are allocated.
$ salloc --nodes=1 --ntasks=1 --time=01:00:00
salloc: Granted job allocation 1234
salloc: Waiting for resource configuration
salloc: Nodes nodename are ready for job
$
Once the interactive job starts, the user should change their working directory to the Lustre project or scratch space to run computationally intensive applications. To run a parallel executable, we recommend using srun followed by the executable, as shown below:
$ srun executable
Note that you do not need to specify the number of processors before the executable when calling srun. The Slurm wrapper srun executes your calculation in parallel on the requested number of processors. Serial applications can be run with or without srun.
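Putting this together, a short interactive session might look like the sketch below (the account, resource values, and command are illustrative):
$ salloc -A SIP-UTK0011 -N 1 -n 2 -p campus -t 00:30:00
$ srun hostname        # run a parallel command on the allocated resources
$ exit                 # release the allocation when finished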
Non-interactive batch mode:
In this mode, the resource requests as well as the commands for the application to be run are written in a text file called a batch file or batch script. This batch script is submitted to the Slurm scheduler using the sbatch command. Batch scripts are very useful for running production jobs because they allow users to work on the cluster non-interactively: users submit a group of commands to Slurm and check the status and output of those commands from time to time. However, it is sometimes useful to run a job interactively, primarily for debugging; see the Interactive Batch mode section above. A typical example of a job script is given below:
#!/bin/bash
#This file is a submission script to request the ISAAC resources from Slurm
#SBATCH -J job #The name of the job
#SBATCH -A SIP-UTK0011 # The project account to be charged
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=48 # cpus per node
#SBATCH --partition=campus # If not specified then default is "campus"
#SBATCH --time=0-01:00:00 # Wall time (days-hh:mm:ss)
#SBATCH --error=job.e%J # The file where run time errors will be dumped
#SBATCH --output=job.o%J # The file where the output of the terminal will be dumped
# Now list your executable command/commands.
# Example for code compiled with a software module:
module load example/test
hostname
sleep 100
srun executable
The above job script can be divided into three sections:
- Shell interpreter (one line)
- The first line of the script specifies the script’s interpreter. The syntax of this line is #!/bin/shellname (sh, bash, csh, ksh, zsh)
- This line is essential; if it is missing, the scheduler will report an error.
- SLURM submission options
- The second section contains a bunch of lines starting with ‘#SBATCH’.
- These lines are not comments.
- #SBATCH is a Slurm directive which communicates information regarding the resources requested by the user in the batch script file.
- #SBATCH options that appear after the first non-comment line are ignored by the Slurm scheduler.
- Each of these flags is described in the SBATCH Flags table above.
- The command sbatch on the terminal is used to submit the non-interactive batch script.
- Shell commands
- The shell commands follow the last #SBATCH line.
- This is the set of commands or tasks the user wants to run, including loading any software modules needed to access a particular application.
- To run a parallel application, it is recommended to use srun followed by the full path to the executable (the path may be omitted if the executable is already in the environment when the script is submitted).
For a quick start, we have also provided a collection of complete sample job scripts, available on the Secure Enclave cluster at /lustre/sip/examples/jobs.
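For example, assuming the script above is saved as jobscript.sh (the job ID shown is illustrative), submitting it and checking the results looks like this:
$ sbatch jobscript.sh
Submitted batch job 1234
$ ls job.o* job.e*        # the output and error files named in the script appear once the job runs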
Job Arrays
Slurm offers job arrays as a convenient option for users whose batch jobs require identical resources. Using the array flag in the job script, users can submit multiple jobs with a single sbatch command. Although the job script is submitted only once with sbatch, the individual jobs in the array are scheduled independently under a shared job array identifier ($SLURM_ARRAY_JOB_ID). Each individual job can be distinguished using Slurm's environment variable $SLURM_ARRAY_TASK_ID. To understand this variable, consider the example Slurm script given below:
#!/bin/bash
#SBATCH -J myjob
#SBATCH -A SIP-UTK0001
#SBATCH -N 1
#SBATCH --ntasks-per-node=30   ## use --ntasks instead to define the total number of tasks
#SBATCH --time=01:00:00
#SBATCH --partition=campus
##SBATCH -e myjob.e%j           ## errors would be written to this file (directive disabled by the extra #)
#SBATCH -o myjob%A_%a.out       ## a separate output file is created for each array task; %A is replaced by the job ID and %a by the array index
#SBATCH --array=1-30
# Submit array of jobs numbered 1 to 30
########### Perform some simple commands ########################
set -x
########### Below code is used to create 30 script files needed to submit the array of jobs ###############
for i in {1..30}; do cp sleep_test.sh 'sleep_test'$i'.sh';done
########### Run your executable ###############
sh sleep_test$SLURM_ARRAY_TASK_ID.sh
In the above example, we create 30 sleep_test<index>.sh script files whose names differ only by index. We could run them by submitting 30 individual jobs, but a simpler and more efficient method is a Slurm job array, which treats these files as an array sleep_test[1-30].sh. The variable SLURM_ARRAY_TASK_ID takes the array index values 1-30, which are defined in the Slurm script above using the #SBATCH directive
#SBATCH --array=1-30
The number of array tasks that run simultaneously can be limited by appending %n to the --array range. For example, to run only 5 jobs at a time in the Slurm array, users can include the following Slurm directive:
#SBATCH --array=1-30%5
To create a separate output file for each job submitted through a Slurm array, use %A and %a in the output file name; these represent the job ID and the array index, respectively, as shown in the example above.
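A job array can be monitored and cancelled like any other job; the job ID 1234 below is illustrative, and individual array tasks appear as <jobid>_<index>:
$ squeue -u $USER            # array tasks are listed as 1234_1, 1234_2, ...
$ scancel 1234_5             # cancel only array task 5
$ scancel 1234               # cancel the entire array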
Exclusive Access to Nodes
As explained in the Scheduling Policy section, jobs submitted by the same user can share nodes. However, users can request whole node(s) for their jobs without sharing them with other jobs. To do so, use the commands below:
Interactive batch mode:
$ salloc -A projectaccount --nodes=1 --ntasks=1 --partition=campus --time=01:00:00 --exclusive
Non-Interactive batch mode:
Add the below line in your job script
#SBATCH --exclusive
Monitoring Job Status
Users can check the status of their jobs at any time by using the squeue command. Example output is shown below:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1202 campus Job3 username PD 0:00 2 (Resources)
1201 campus Job1 username R 0:05 2 node[001-002]
1200 campus Job2 username R 0:10 2 node[004-005]
The description of each column of the squeue output is given below.
Name of Column | Description |
JOBID | The unique identifier of each job |
PARTITION | The partition/queue from which the resources are to be allocated to the job |
NAME | The name of the job specified in the Slurm script using #SBATCH -J option. If the -J option is not used, Slurm will use the name of the batch script. |
USER | The login name of the user submitting the job |
ST | Status of the job. Slurm scheduler uses short notation to give the status of the job. The meaning of these short notations is given in the table below. |
TIME | The time used by the job so far |
NODES | The number of nodes requested by or allocated to the job |
NODELIST(REASON) | The names of the allocated nodes, or the reason the job is still pending (e.g., Resources) |
When a user submits a job, it passes through various states. The state of a job is reported by the squeue command under the ST column. The possible values in the ST column are given below:
Status Value | Meaning | Description |
CG | Completing | Job is about to complete. |
PD | Pending | Job is waiting for the resources to be allocated |
R | Running | Job is running on the allocated resources |
S | Suspended | Job was allocated resources, but execution was suspended and the CPUs were released for other jobs |
Altering Batch Jobs
The users are allowed to change the attributes of their jobs until the job starts running. In this section, we will describe how to alter your batch jobs with examples.
Remove a Job from Queue
Users can remove their own jobs, in any state, using the scancel command.
To remove a job with a JOB ID 1234, use the command:
scancel 1234
Modifying the Job Details
Users can make use of the Slurm command scontrol, which alters a variety of Slurm parameters. Most scontrol commands can only be executed by a system administrator; however, users are granted permission to use scontrol on their own jobs, provided the jobs are not yet running.
Hold or release a job
scontrol hold jobid
scontrol release jobid
Modify the name of the job
scontrol update JobID=jobid JobName=any_new_name
Modify the total number of tasks
scontrol update JobID=jobid NumTasks=Total_tasks
Modify the number of CPUs per node
scontrol update JobID=jobid MinCPUsNode=CPUs
Modify the Wall time of the job
scontrol update JobID=jobid TimeLimit=day-hh:mm:ss
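For example, to hold a pending job, extend its wall time to two hours, and then release it (the job ID 1234 is illustrative):
$ scontrol hold 1234
$ scontrol update JobID=1234 TimeLimit=0-02:00:00
$ scontrol release 1234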
Known Issues
When a user successfully authenticates for the sipmount or qsub commands, the system will report that the user has successfully logged in, even though the user is already logged in. This is because the authentication mechanism uses the same security controls that are used for authenticating a login attempt.