The transition to SLURM is now nearly complete. Torque and its related commands, such as qsub, no longer work. Users on all systems must now use SLURM: ISAAC SIP in the secure enclave, and both open enclave clusters, ISAAC Legacy (formerly known as ACF) and ISAAC Next Generation. The only remaining item is removing all use of Torque and qsub from Open OnDemand. We are retiring ISAAC Legacy Open OnDemand (OOD), so users who need Open OnDemand should use it on ISAAC Next Generation. ISAAC Legacy OOD will be turned off on July 30 during the power maintenance outage.
The transition to Globus Connect Server version 5.4 is also complete. We no longer use certificate-based Globus; Globus now authenticates through UT's Central Authentication Service.
The information below is the historical record of the SLURM transition plan.
To reduce operating costs and use the latest technology on our clusters, the ISAAC Legacy cluster will transition from Torque/Moab to SLURM by June 30, 2022. Along with the transition to SLURM, all ISAAC Legacy data transfer nodes (DTNs) will be converted to Globus Connect Server version 5 (GCSv5), which no longer uses InCommon Certificates for authentication and instead uses standard UTK Central Authentication Service (CAS). The plan for transitioning to SLURM and GCSv5 on the ISAAC Legacy cluster is described below. If you have questions, please submit an HPSC Service Request.
SLURM training already exists for the ISAAC-NG cluster, which was built from the ground up with SLURM. Until ISAAC Legacy-specific training is available, you can use this training and test your SLURM batch scripts on ISAAC-NG.
During the transition (June 29 and June 30), service on ISAAC Legacy may be interrupted, because incorporating the Torque-managed nodes requires pausing scheduling on the SLURM-managed nodes. The transition also includes several partition name changes. Jobs that do not finish by then may be canceled by administrators and will need to be resubmitted after the maintenance period.
|Date|Action|
|---|---|
|Jan 28|Creation of the SLURM transition plan 2022 website document (this website; completed)|
|Jan 31|Purchase last Torque/Moab licenses, valid through June 30, 2022 (completed)|
|Feb 1|Notification to ISAAC users of the SLURM transition and plan (completed)|
|February|Retire the nics#datamover4 endpoint and transition to GCSv5 "UTK ISAAC Legacy" (completed)|
|March/April|Installation and configuration of servers to run SLURM services (completed)|
|May 19|Transition AMD, bigmem, and GPU nodes to SLURM; update ISAAC Legacy Open OnDemand|
|May|Contact faculty with private condos and determine transition dates for private condos|
|May|Begin transition of private condos to SLURM; transition must be completed by June 30|
|May|Begin offering Intro to SLURM and Transition to SLURM trainings (recorded)|
|Jun 29-30|Transition University Condos (UTK & UTHSC) to SLURM; transition all ISAAC Legacy Globus endpoints to GCSv5 under the UTK ISAAC Legacy endpoint name; Moab goes into maintenance mode at 12:00 on June 30|
|Jun 30 at noon|Transition of all resources to SLURM completed on this date and time. Torque jobs are no longer accepted after noon on June 30, 2022.|
The ISAAC Next Generation (ISAAC-NG) cluster became operational on January 1, 2022, running SLURM from the start. The Secure Enclave HPC cluster was transitioned to SLURM in 2021. A Torque-to-SLURM command mapping (a "rosetta stone") was created and is part of the ISAAC Next Gen documentation on the HPSC website. The direct link is here.
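To illustrate what converting a Torque job to SLURM involves, here is a minimal Python sketch that rewrites a few common `#PBS` directives, environment variables, and commands into their usual SLURM equivalents. It is not the official rosetta stone: the helper names (`translate_command`, `translate_directive`, `translate_script`) are hypothetical, only a small subset of options is handled, and mapping `ppn` to `--ntasks-per-node` assumes an MPI-style job (threaded jobs may need `--cpus-per-task` instead). Because partition names also changed during the transition, queue names passed through `-q` may need manual correction.

```python
"""Hypothetical helper: translate common Torque/PBS usage into SLURM.

Illustrative subset only; the ISAAC Torque-to-SLURM rosetta stone is the
authoritative mapping.
"""
import re

# Common Torque/Moab commands and their usual SLURM counterparts.
COMMAND_MAP = {
    "qsub": "sbatch",
    "qstat": "squeue",
    "qdel": "scancel",
    "qhold": "scontrol hold",
    "qrls": "scontrol release",
    "pbsnodes": "sinfo",
    "showq": "squeue",
}

# Environment variables commonly referenced inside job scripts.
ENV_MAP = {
    "$PBS_O_WORKDIR": "$SLURM_SUBMIT_DIR",
    "$PBS_JOBID": "$SLURM_JOB_ID",
}

# Single-letter #PBS flags with a direct sbatch long option.
# Note: -q queue names may not match the renamed SLURM partitions.
FLAG_MAP = {
    "N": "--job-name",
    "A": "--account",
    "q": "--partition",
    "o": "--output",
    "e": "--error",
}


def translate_command(cmd: str) -> str:
    """Swap the leading Torque command for its SLURM counterpart."""
    head, _, rest = cmd.partition(" ")
    return f"{COMMAND_MAP.get(head, head)} {rest}".strip()


def _translate_resources(spec: str) -> list[str]:
    """Translate a '-l' resource list such as 'nodes=1:ppn=4,walltime=01:00:00'."""
    out = []
    for item in spec.split(","):
        key, _, value = item.partition("=")
        if key == "nodes":
            parts = value.split(":")
            out.append(f"#SBATCH --nodes={parts[0]}")
            for part in parts[1:]:
                if part.startswith("ppn="):
                    # Assumption: one task per requested core (MPI-style job).
                    out.append(f"#SBATCH --ntasks-per-node={part[4:]}")
        elif key == "walltime":
            out.append(f"#SBATCH --time={value}")
        else:
            out.append(f"# TODO: no automatic mapping for -l {item}")
    return out


def translate_directive(line: str) -> str:
    """Convert one '#PBS -X value' line to '#SBATCH ...'; other lines pass through."""
    match = re.match(r"#PBS\s+-(\w)\s+(.*)", line.strip())
    if not match:
        return line
    flag, value = match.group(1), match.group(2).strip()
    if flag == "l":
        return "\n".join(_translate_resources(value))
    if flag in FLAG_MAP:
        return f"#SBATCH {FLAG_MAP[flag]}={value}"
    return line  # Unknown flag: leave it for manual review.


def translate_script(text: str) -> str:
    """Translate directives and environment variables in a whole job script."""
    lines = []
    for line in text.splitlines():
        line = translate_directive(line)
        for old, new in ENV_MAP.items():
            line = line.replace(old, new)
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    torque_script = """#!/bin/bash
#PBS -N myjob
#PBS -A my-project-account
#PBS -l nodes=1:ppn=4,walltime=01:00:00
cd $PBS_O_WORKDIR
./a.out"""
    print(translate_script(torque_script))
    print(translate_command("qsub job.pbs"))
```

Running the file prints the converted script, where the `-l` line becomes `#SBATCH --nodes=1`, `#SBATCH --ntasks-per-node=4`, and `#SBATCH --time=01:00:00`, and `$PBS_O_WORKDIR` becomes `$SLURM_SUBMIT_DIR`; the last line shows `qsub job.pbs` rewritten as `sbatch job.pbs`. The account name `my-project-account` is a placeholder, not a real ISAAC project.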