Running Jobs on ISAAC Secure Enclave
General Information
As explained in the Access and Login page, when you log in to the Secure Enclave HPSC cluster, you are directed to one of the login nodes, from which you can access the cluster to do your research work. The login nodes should only be used for basic tasks such as file editing, code compilation, and job submission. Running production jobs on the login nodes is highly discouraged, and any production job running on a login node is subject to termination.
Production work should instead be performed on the system’s compute resources. The compute resources of the Secure Enclave cluster are managed and allocated by the Slurm (Simple Linux Utility for Resource Management) scheduler, which uses partitions and Quality of Service (QoS) settings to allocate resources efficiently among jobs. At the time of writing, the available QoS levels, their maximum run times, and the partitions they are valid for are given in the table below:
Quality of Service (QoS) | Run Time Limit (Hours) | Valid Partitions |
campus | 168 [7 days] | campus, campus-gpu |
condo | 720 [30 days] | condo-* |
This page provides information for getting started with the batch facilities of the Slurm scheduler, as well as basic job execution. Before submitting jobs, however, it is important to understand how to access the different job directories on the ISAAC Secure Enclave cluster from which jobs can be submitted.
Encrypted Project Space Mounting
For sensitive projects that require encryption, EncFS is used to securely store and access data. For additional information on EncFS, see the Arch Linux wiki entry. The Secure Enclave provides tools for simplifying and centrally managing the use of EncFS. To use the secure storage, you must first mount the encrypted folder for your project by following these steps.
- Connect to a Secure Enclave login node, if you have not already.
- To mount an encrypted project folder, type the following command: sudo /usr/local/bin/sipmount <projectname> (where <projectname> is the project ID whose encrypted space you want to access).
- The sudo part of the command will require you to authenticate with your NetID password and Duo TFA. The Secure Enclave recommends you use the “Duo Push” option.
- Verify that the project folder was mounted.
  - The df command shows mounted filesystems. The project directory will be mounted at /projects/<projectname>.
  - Type ls -l /projects/<projectname> to list the contents of the project folder.
After 15 minutes of inactivity, the encrypted space will be closed and require you to repeat this process.
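Because the encrypted space closes after inactivity, a script that depends on it can confirm the mount is still present before doing any work. The sketch below assumes a Linux host that exposes /proc/mounts; the function name is_mounted and the project ID MyProject are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if the given directory is a mount point
# listed in /proc/mounts (field 2 of each line is the mount point).
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' /proc/mounts
}

# Example use after running: sudo /usr/local/bin/sipmount <projectname>
# project="MyProject"   # hypothetical project ID
# if is_mounted "/projects/$project"; then
#     ls -l "/projects/$project"
# else
#     echo "/projects/$project is not mounted; run sipmount first" >&2
# fi
```

This only reports whether the mount exists; it does not re-authenticate, so rerun sipmount if the check fails.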
Job Directories
By default, the Slurm scheduler uses the directory from which a job is submitted as the job's working directory. It is recommended that jobs be submitted from a directory in the Lustre file system. The following storage spaces are available to running jobs:
- /projects/<projectname> – This is the encrypted space for the project under which the job is being run. It is mounted automatically on the head node of the job and is unmounted automatically at job completion. Due to the encryption layer, read and write performance to this space is very poor. It is recommended that jobs requiring many reads or writes utilize the scratch space described next, copying initial data from the secure space to the scratch space at the start, and copying results back into either the secured or unsecured (depending on the nature of the data) spaces before the job completes.
- /lustre/sip/scratch/<jobid> – Stored in the SCRATCHDIR environment variable available to a running job, this temporary scratch space is created automatically as the job starts and is renamed to /lustre/sip/scratch/<jobid>.completed when the job is complete. <jobid> is the full name of the job as reported by qsub when the job was submitted (e.g., 1234.sip-mgmt1). The renamed directory is deleted automatically 24 hours after the job completes, to help protect any sensitive data stored there.
- /lustre/sip/proj/ – This is the unsecured scratch space for the project under which the job is being run. You will have your own folder under this space with your username which you can use to store non-sensitive data.
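The copy-in, compute, copy-out workflow recommended for the encrypted space can be sketched as two small helpers. This is a minimal sketch under stated assumptions: the function names stage_in and stage_out and the input/ and results/ subdirectories are hypothetical, and SCRATCHDIR is assumed to be set by the job as described above (a temporary directory stands in for it outside a job).

```bash
#!/bin/bash
# Hypothetical helper: copy input data from the (slow) encrypted space
# into the job's scratch directory under input/.
stage_in() {
    local src="$1" scratch="$2"
    mkdir -p "$scratch/input"
    cp -r "$src/." "$scratch/input/"
}

# Hypothetical helper: copy everything under results/ in scratch back to
# a secured or unsecured destination before the job ends.
stage_out() {
    local scratch="$1" dest="$2"
    mkdir -p "$dest"
    cp -r "$scratch/results/." "$dest/"
}

# Example job body:
# SCRATCHDIR="${SCRATCHDIR:-$(mktemp -d)}"
# stage_in "/projects/<projectname>/input" "$SCRATCHDIR"
# cd "$SCRATCHDIR" && mkdir -p results
# srun executable            # reads input/, writes results/
# stage_out "$SCRATCHDIR" "/projects/<projectname>/results"
```

Whether results go back to /projects/<projectname> or to /lustre/sip/proj/ depends on how sensitive they are; the scratch copy is deleted automatically after the job.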
If your project was previously given an encrypted space on the login nodes using the LUKS encryption mechanism, you will need to migrate any data stored there to your EncFS space. The LUKS spaces are deprecated and will eventually be retired. To simplify this process, the command sudo /usr/local/bin/sipmount --migrate <projectname> is provided for use on the login node that contains your project’s LUKS encrypted space. This command does the following:
- Mount the LUKS and EncFS space simultaneously.
- Report the location of the LUKS space.
- Report the location of the EncFS space.
- Prompt you to migrate the data from LUKS to EncFS.
Once the data has been migrated, type /usr/local/bin/sipmount --migrate-complete <projectname> to complete the migration process and close the encrypted spaces.
SLURM Batch Scripts
In this section, we explain how to request resources for a job from the Slurm scheduler and how to submit jobs.
Partition
The Slurm scheduler groups similar nodes and job features into a set called a partition. Each partition has hard limits on maximum wall clock time, job size, maximum number of nodes, and so on. The default partition for all users is named campus. At present, the campus partition contains 18 nodes, for a total of 864 cores.
Quality of Service
A Quality of Service (QoS) is a set of restrictions on resources accessible to a job. You can always get the current list of QoS specifications via
sacctmgr show qos \
format=name,maxwall,maxtres,maxtrespu,maxjobspu,maxjobspa,maxsubmitpa
As of July 1, 2024, below are the QoS limits set for the users to run their jobs on Secure Enclave HPSC cluster:
QoS | Max wall clock time per User | Max Resource Usage Per Job | Max Resource Usage Per User (across all running jobs) | Max Running/Submitted Jobs Per User |
campus | 7 days | 6 nodes | 6 nodes | -/- |
condo | 30 days | NA | NA | -/- |
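A requested wall time can be checked against the limits in this table before submitting. The sketch below is an illustration, not a Slurm tool: the function names are hypothetical, the limits (campus 7 days, condo 30 days) are copied from the table above, and only the days-hh:mm:ss and hh:mm:ss forms used on this page are parsed, although Slurm accepts other time formats as well.

```bash
#!/bin/bash
# Convert "days-hh:mm:ss" or "hh:mm:ss" to seconds.
to_seconds() {
    local t="$1" days=0 h m s
    if [[ "$t" == *-* ]]; then
        days="${t%%-*}"     # part before the dash
        t="${t#*-}"         # hh:mm:ss part
    fi
    IFS=: read -r h m s <<< "$t"
    # 10# forces base 10 so values such as "08" are not read as octal
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Succeed if the requested time fits the QoS run time limit.
fits_qos() {
    local qos="$1" limit_hours
    case "$qos" in
        campus) limit_hours=168 ;;
        condo)  limit_hours=720 ;;
        *) echo "unknown QoS: $qos" >&2; return 2 ;;
    esac
    (( $(to_seconds "$2") <= limit_hours * 3600 ))
}

# Example: fits_qos campus 0-01:00:00 && echo "within the campus limit"
```

Since the limits can change, confirm them with the sacctmgr command above rather than relying on the hard-coded values.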
Scheduling Policy
The Secure Enclave has been divided into logical units known as condos. There are institutional and individual private condos in the Secure Enclave, each with associated project accounts. See the documentation on condos and project accounts for more information. Institutional condos are available to any faculty, staff, or student at the institution, whereas individual private condos are available only to projects that have invested in the Secure Enclave.
The Slurm scheduling policy requires the nodes in each condo to be part of a partition. Therefore, each condo has a unique partition associated with its project account, and you must specify both the partition and the project account when requesting resources on the Secure Enclave. The Slurm scheduler allocates resources based on the combination of partition and project account. For example, the institutional condo has the campus partition associated with its project (SIP-UTK0011). Slurm applies the constraints for that combination, such as maximum wall time and number of nodes, to jobs submitted with these options. To submit a job under this partition, use the following directives:
#SBATCH -A SIP-UTK0011
#SBATCH --partition=campus
(or, equivalently, #SBATCH -p campus)
The detailed information about all the partitions and the respective nodes on Secure Enclave can be viewed using the command:
$ sinfo -Nel
For more information on the flags used by this command, refer to the Slurm documentation.
In addition to the above two directives, you also need to specify the number of nodes, cores, and wall time, plus any other optional SBATCH directives. Detailed information on how to submit a job using SBATCH directives is provided in the section Submitting Jobs with Slurm.
Once a job is submitted, the scheduler checks for available resources and allocates them to the job to launch its tasks. At present, the Slurm scheduler is configured so that nodes are not shared among jobs submitted by different users. However, Slurm can allocate the same node to multiple jobs from the same user, provided the total resources requested by those jobs do not exceed the resources available on the node. Users can choose to run a job exclusively on entire nodes by using the exclusive flag. Check table 6.3 to see how to use this flag.
The order in which jobs are run depends on the following factors:
- number of nodes requested – jobs that request more nodes get a higher priority.
- queue wait time – a job’s priority increases along with its queue wait time (not counting blocked jobs as they are not considered “queued.”)
- number of jobs – a maximum of ten jobs per user, at a time, will be eligible to run.
Currently, single core jobs by the same user will get scheduled on the same node.
In certain special cases, the priority of a job may be manually increased upon request. To request a priority change, contact the OIT HelpDesk with the job ID and the reason for the request.
Slurm Commands
The table below lists a few important Slurm commands that are most often used on the login nodes when working with the Slurm scheduler:
Command | Description |
sbatch jobscript.sh | Used to submit the job script to request the resources |
squeue | Displays the status of all jobs |
squeue -u username | Displays the status and other information for all of a user’s jobs |
squeue -j jobid | Displays the status and information of a particular job |
scancel jobid | Cancels the job with the given job ID |
scontrol show job jobid / scontrol show partition name | Shows detailed information about a job or partition |
scontrol update | Alter the resources of a pending job |
salloc | Used to allocate the resources for the interactive job run |
Slurm Variables
Below we have tabulated a few important Slurm variables that will be useful to ISAAC users:
Variable | Description |
SLURM_SUBMIT_DIR | The directory from where the job is submitted |
SLURM_JOBID | The job identifier of the submitted job |
SLURM_NODELIST | List of nodes allocated to a job |
SLURM_NTASKS | Total number of tasks requested for the job (equal to the number of cores when each task uses one CPU) |
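These variables can be printed at the start of a batch script so that each output file records where and how the job ran. A minimal sketch; the function name job_banner is hypothetical, and unset variables print as "unset" so it can also be tried outside a job.

```bash
#!/bin/bash
# Hypothetical helper: record the Slurm variables from the table above in
# the job's output file. Variables that are not set (for example, when the
# function is run outside a job) are printed as "unset".
job_banner() {
    printf '%-16s %s\n' "Job ID:"         "${SLURM_JOBID:-unset}"
    printf '%-16s %s\n' "Submitted from:" "${SLURM_SUBMIT_DIR:-unset}"
    printf '%-16s %s\n' "Nodes:"          "${SLURM_NODELIST:-unset}"
    printf '%-16s %s\n' "Tasks:"          "${SLURM_NTASKS:-unset}"
}

# Typical use near the top of a batch script:
# job_banner
# cd "$SLURM_SUBMIT_DIR"
```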
SBATCH Flags
Jobs on the Secure Enclave are submitted using the sbatch command, which passes the resource request in the job script to the Slurm scheduler. Resources in the job script are requested using “#SBATCH” directives. Slurm accepts most directives in two formats, a long form (for example, --nodes=1) and a short form (for example, -N 1); you may use either. The SBATCH flags are described below:
Flags | Description |
#SBATCH -J Jobname | Name of the job |
#SBATCH --account=account (or -A account) | Project account to which the time will be charged |
#SBATCH --time=days-hh:mm:ss (or -t) | Wall time requested for the job |
#SBATCH --nodes=1 (or -N 1) | Number of nodes needed |
#SBATCH --ntasks=48 (or -n 48) | Total number of tasks (cores) requested |
#SBATCH --ntasks-per-node=48 | Number of tasks (cores) requested per node |
#SBATCH --constraint=nosecurespace | Submit the job to the non-secure queue without authentication |
#SBATCH --partition=campus (or -p campus) | Selects the partition or queue |
#SBATCH --output=Jobname.o%j (or -o) | The file where terminal output is written (%j is replaced by the job ID) |
#SBATCH --error=Jobname.e%j (or -e) | The file where run-time errors are written |
#SBATCH --exclusive | Allocates exclusive access to the node(s) |
#SBATCH --array=indices (or -a) | Submits a job array: multiple jobs with identical resource requests |
#SBATCH --chdir=directory | Changes the working directory. The default working directory is the one from which the job is submitted |
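As an illustration of the long-form syntax in the table (note that there are no spaces around "="), the sketch below prints a batch-script header from a few parameters. The function write_job_header and its argument order are hypothetical; the partition defaults to campus.

```bash
#!/bin/bash
# Hypothetical helper: print a batch-script header using long-form flags.
# Arguments: name account nodes ntasks_per_node walltime [partition]
write_job_header() {
    local name="$1" account="$2" nodes="$3" ntasks_per_node="$4" walltime="$5" partition="${6:-campus}"
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=$name
#SBATCH --account=$account
#SBATCH --nodes=$nodes
#SBATCH --ntasks-per-node=$ntasks_per_node
#SBATCH --time=$walltime
#SBATCH --partition=$partition
#SBATCH --output=$name.o%j
#SBATCH --error=$name.e%j
EOF
}

# Example: write_job_header myjob SIP-UTK0011 1 48 0-01:00:00 > myjob.sh
```

The %j placeholders are written literally and expanded by Slurm when the job runs; the commands to execute still need to be appended after the header.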
Submitting Jobs with Slurm
On ISAAC Secure Enclave, batch jobs can be submitted in two ways: (i) interactive batch mode (ii) Non-interactive batch mode.
Interactive Batch mode:
Interactive batch jobs give users interactive access to compute nodes. In this mode, users can ask the Slurm scheduler to allocate compute-node resources directly from the terminal. A common use for interactive batch jobs is to debug a calculation or program before submitting non-interactive batch jobs for production runs. This section demonstrates how to run interactive jobs through the batch system and provides common usage tips.
The interactive batch mode can be invoked on the login node by using salloc command followed by the sbatch flags to request the different resources. The different sbatch flags are given in table 1.3.
$ salloc -A projectaccount --nodes=1 --ntasks=1 --partition=campus --time=01:00:00
or, using the short forms:
$ salloc -A projectaccount -N 1 -n 1 -p campus -t 01:00:00
The salloc command passes the user’s request to the Slurm scheduler. In the above command, we request one node and one CPU from the campus partition for a total of 1 hour. If salloc is executed without specifying resources such as nodes, tasks, and wall time, the scheduler allocates the default resources: one processor in the campus partition with a wall clock time of 1 hour.
When the scheduler allocates the resources, the user gets a message on the terminal as shown below with the information about the jobid and the hostname of the compute node where the resources are allocated.
$ salloc --nodes=1 --ntasks=1 --time=01:00:00
salloc: Granted job allocation 1234
salloc: Waiting for resource configuration
salloc: Nodes nodename are ready for job
$
Once the interactive job starts, the user should change their working directories to lustre project or scratch space to run the computationally intensive applications. To run the parallel executable, we recommend using srun followed by the executable as shown below:
$ srun executable
Note that you do not need to specify the number of processors before the executable when calling srun. The Slurm wrapper “srun” executes your calculation in parallel on the requested number of processors. Serial applications can be run with or without srun.
Non-interactive batch mode:
In this mode, the requested resources and the commands for the application are written in a text file called a batch file or batch script. This batch script is submitted to the Slurm scheduler using the sbatch command. Batch scripts are well suited to production jobs because they let users work on the cluster non-interactively: users submit a group of commands to Slurm and check the status and output of those commands from time to time. Sometimes, however, it is useful to run a job interactively (primarily for debugging); see Interactive Batch mode above. A typical example of a job script is given below:
#!/bin/bash
#This file is a submission script to request the ISAAC resources from Slurm
#SBATCH -J job #The name of the job
#SBATCH -A SIP-UTK0011 # The project account to be charged
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks-per-node=48 # cpus per node
#SBATCH --partition=campus # If not specified then default is "campus"
#SBATCH --time=0-01:00:00 # Wall time (days-hh:mm:ss)
#SBATCH --error=job.e%J # The file where run time errors will be dumped
#SBATCH --output=job.o%J # The file where the output of the terminal will be dumped
# Now list your executable command/commands.
# Example for code compiled with a software module:
module load example/test
hostname
sleep 100
srun executable
The above job script can be divided into three sections:
- Shell interpreter (one line)
  - The first line of the script specifies the script’s interpreter. The syntax of this line is #!/bin/shellname (sh, bash, csh, ksh, zsh).
  - This line is essential. If it is missing, sbatch rejects the script with an error.
- Slurm submission options
  - The second section contains a series of lines starting with ‘#SBATCH’.
  - These lines are not ordinary comments.
  - #SBATCH is a Slurm directive that communicates the resources requested by the user in the batch script.
  - #SBATCH options that appear after the first non-comment line are ignored by the Slurm scheduler.
  - Each of the flags is described in table 1.3.
  - The sbatch command is used on the terminal to submit the non-interactive batch script.
- Shell commands
  - The shell commands follow the last #SBATCH line.
  - These are the commands or tasks the user wants to run, including loading any software modules needed to access a particular application.
  - To run a parallel application, it is recommended to use srun followed by the executable, giving its full path if the executable’s directory is not on the PATH in the Slurm environment when the script is submitted.
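The two structural rules above (an interpreter on line 1, and #SBATCH lines being ignored once the first command appears) can be checked before submission. This is a hypothetical pre-submission sketch, not a Slurm utility; following Slurm's behavior, blank lines and ordinary comments do not end the directive section.

```bash
#!/bin/bash
# Hypothetical check: report a missing interpreter line and any #SBATCH
# directives placed after the first shell command. Exit status is 1 if
# a problem was found, 0 otherwise.
check_batch_script() {
    awk '
        NR == 1 {
            if ($0 !~ /^#!/) { print "line 1: missing interpreter line such as #!/bin/bash"; bad = 1 }
            next
        }
        /^#SBATCH/ {
            if (seen_cmd) { print "line " NR ": #SBATCH after the first command is ignored"; bad = 1 }
            next
        }
        /^[ \t]*$/ { next }      # blank line
        /^[ \t]*#/ { next }      # ordinary comment
        { seen_cmd = 1 }         # first shell command
        END { exit bad }
    ' "$1"
}

# Example: check_batch_script jobscript.sh && sbatch jobscript.sh
```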
For the quick start, we have also provided a collection of complete sample job scripts that are available on Secure Enclave cluster at /lustre/sip/examples/jobs
Job Arrays
Slurm offers job arrays for users whose batch jobs require identical resources. Using the array flag in the job script, users can submit multiple jobs with a single sbatch command. Although the job script is submitted only once, the individual jobs (tasks) in the array are scheduled independently. All tasks share the array’s job ID ($SLURM_ARRAY_JOB_ID), and each task is distinguished by Slurm’s environment variable $SLURM_ARRAY_TASK_ID. To understand this variable, consider the example Slurm script below:
#!/bin/bash
#SBATCH -J myjob
#SBATCH -A SIP-UTK0001
#SBATCH -N 1
#SBATCH --ntasks-per-node=30 # use --ntasks instead to set the total number of tasks
#SBATCH --time=01:00:00
#SBATCH --partition=campus
##SBATCH -e myjob.e%j ## Disabled by the extra '#'; remove one '#' to write errors to this file
#SBATCH -o myjob%A_%a.out ## Separate output files will be created for each array. %A will be replaced by jobid and %a will be replaced by array index
#SBATCH --array=1-30
# Submit array of jobs numbered 1 to 30
########### Perform some simple commands ########################
set -x
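########### Create the 30 script files ONCE, before submitting (e.g. on the login node) ###############
########### Running this loop inside the job would make all 30 array tasks copy the same files concurrently ###
# for i in {1..30}; do cp sleep_test.sh "sleep_test${i}.sh"; done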
########### Run your executable ###############
sh sleep_test$SLURM_ARRAY_TASK_ID.sh
In the above example, we have created 30 sleep_test$index.sh files whose names differ only by index. We could run them by submitting 30 individual jobs, or more efficiently and simply with a Slurm job array that treats these files as the array sleep_test[1-30].sh. In each task, the variable SLURM_ARRAY_TASK_ID is set to that task’s index (1 to 30), as defined in the Slurm script above by the #SBATCH directive
#SBATCH --array=1-30
The number of array tasks running simultaneously can be limited by appending % followed by the limit to the --array value. For example, to run only 5 jobs at a time in the Slurm array, users can include the Slurm directive:
#SBATCH --array=1-30%5
To create a separate output file for each job in a Slurm array, use %A and %a, which represent the array’s job ID and the array index respectively, as shown in the example above.
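A common alternative to one script file per index is to keep one parameter set per line in a single file and let each array task pick its own line using $SLURM_ARRAY_TASK_ID. A minimal sketch; the names param_for_task, params.txt, and ./my_program are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print line number <index> of <file>
# (prints nothing if the file has fewer lines).
param_for_task() {
    local file="$1" index="$2"
    sed -n "${index}p" "$file"
}

# Inside a script submitted with #SBATCH --array=1-30:
# args=$(param_for_task params.txt "$SLURM_ARRAY_TASK_ID")
# srun ./my_program $args    # left unquoted on purpose so args split into words
```

With this pattern, params.txt would need at least as many lines as the highest array index.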
Exclusive Access to Nodes
As explained in the Scheduling policy, the jobs submitted by the same user can share the nodes. However, users can request the whole node(s) to run their jobs without sharing them with other jobs. To do that use the below command:
Interactive batch mode:
$ salloc -A projectaccount --nodes=1 --ntasks=1 --partition=campus --time=01:00:00 --exclusive
Non-Interactive batch mode:
Add the below line in your job script
#SBATCH --exclusive
Monitoring Job Status
Users can regularly check the status of their jobs by using the squeue command.
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1202 campus Job3 username PD 0:00 2 (Resources)
1201 campus Job1 username R 0:05 2 node[001-002]
1200 campus Job2 username R 0:10 2 node[004-005]
The description of each of the columns of the output from squeue command is given below
Name of Column | Description |
JOBID | The unique identifier of each job |
PARTITION | The partition/queue from which the resources are to be allocated to the job |
NAME | The name of the job specified in the Slurm script using #SBATCH -J option. If the -J option is not used, Slurm will use the name of the batch script. |
USER | The login name of the user submitting the job |
ST | Status of the job. Slurm scheduler uses short notation to give the status of the job. The meaning of these short notations is given in the table below. |
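TIME | The time the job has been running (elapsed time); 0:00 for jobs that have not started |
NODES | The number of nodes requested or allocated for the job |
NODELIST(REASON) | The names of the allocated nodes for a running job, or the reason a pending job is waiting (for example, Resources) |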
When a user submits a job, it passes through various states. The state of each job is reported by squeue in the ST column. The most common values in the ST column are given below:
Status Value | Meaning | Description |
CG | Completing | The job is in the process of completing; some processes on some nodes may still be active. |
PD | Pending | Job is waiting for the resources to be allocated |
R | Running | Job is running on the allocated resources |
S | Suspended | The job has an allocation, but its execution has been suspended and its CPUs have been released for other jobs. |
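When many jobs are queued, a per-state count is easier to read than the full listing. The sketch below assumes squeue's default output format shown above (a header line, ST as the fifth column, and job names without spaces); the function name summarize_states is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: count jobs per state (ST column) from default
# squeue output read on standard input. Output lines are "STATE COUNT",
# sorted by state.
summarize_states() {
    awk 'NR > 1 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Example: squeue -u "$USER" | summarize_states
```

For the sample output above, this would report one pending (PD) job and two running (R) jobs.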
Altering Batch Jobs
The users are allowed to change the attributes of their jobs until the job starts running. In this section, we will describe how to alter your batch jobs with examples.
Remove a Job from Queue
Users can remove their own jobs, in any state, using the scancel command.
To remove a job with a JOB ID 1234, use the command:
scancel 1234
Modifying the Job Details
Users can use the Slurm command scontrol to alter a variety of Slurm parameters. Although most scontrol commands can only be executed by the system administrator, users are permitted to use scontrol on their own jobs, provided the jobs are not running.
Hold or release a job
scontrol hold jobid
scontrol release jobid
Modify the name of the job
scontrol update JobID=jobid JobName=any_new_name
Modify the total number of tasks
scontrol update JobID=jobid NumTasks=Total_tasks
Modify the number of CPUs per node
scontrol update JobID=jobid MinCPUsNode=CPUs
Modify the wall time of the job (users can reduce it; increasing it normally requires an administrator)
scontrol update JobID=jobid TimeLimit=days-hh:mm:ss
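To apply the same change to several pending jobs, the scontrol update form above can be driven by a list of job IDs. A minimal sketch: the function update_jobs is hypothetical, and the SCONTROL variable (set it to echo for a dry run that only prints the commands) is a convention of this sketch, not a Slurm feature.

```bash
#!/bin/bash
# Hypothetical helper: run "scontrol update JobID=<id> <setting>" for each
# job ID read (one per line) from standard input. Set SCONTROL=echo to
# print the commands instead of running them.
update_jobs() {
    local setting="$1" id
    while read -r id; do
        [ -n "$id" ] || continue        # skip blank lines
        ${SCONTROL:-scontrol} update "JobID=$id" "$setting"
    done
}

# Example: rename all of your pending jobs
# squeue -u "$USER" -h -t PD -o %i | update_jobs JobName=renamed
```

Running jobs cannot be modified this way, which is why the example selects only pending (PD) jobs.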
Known Issues
When a user successfully authenticates for the sipmount or qsub commands, the system reports that the user has successfully logged in, even though the user is already logged in. This happens because the authentication mechanism uses the same security controls that are used to authenticate a login attempt.