Responsible Node Sharing
What is Responsible Node Sharing?
Responsible node sharing is a capability of the condos within the Open Research resources of the Infrastructure for Scientific Applications and Advanced Computing (ISAAC). It allows a user to submit a batch job whose resource specifications let the job scheduler place it on either the institutional or private condos, where it shares a single compute node with other jobs. Running more than one job from different users, and even different workgroups, on the same compute node increases both the throughput of the facility and the efficiency of resource use. Under responsible node sharing, the job scheduler manages all of the jobs on a compute node and places together only those jobs whose combined processor core and memory requirements fit on that node. Because the facility is made up of different node types with different numbers of processor cores and different amounts of memory, the scheduler has to manage all of the job requests and provision each job with the resources it needs. Responsible node sharing is the default behavior for the institutional and private condos. Private condo owners can opt out of responsible node sharing in their Service Level Agreement (SLA) by requesting this at the time their equipment is purchased. All private condos have an SLA.
When Will Responsible Node Sharing Go into Effect?
The scheduling of the institutional condos was changed on April 15, 2020 to implement responsible node sharing. The scheduling of the private condos was changed in January 2024 to implement responsible node sharing. Private condo servers participate in responsible node sharing, but jobs run on them by users outside the private condo owner's project membership (workgroup) are limited to three hours. As a result, the longest a workgroup member will have to wait for a job to start on their own private condo is three hours (if no other project members are running jobs at that time). The three-hour time limit applies only to private condos and not to the institutional condos.
What Do Users Need to Do?
For responsible node sharing to work effectively, users need to specify their job resources accurately. At the time of this writing, users must specify the number of nodes their job requires and the wallclock time for the job. Users should also specify the number of cores their job requires with the correct SLURM option. If this option is not specified, the scheduler will allocate a single core to the job. Memory is allocated to jobs based on the number of cores requested, so a job that requires additional memory needs to request more cores.
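As an illustration of the resource specifications described above, the following is a minimal sketch of a batch script, not an official template. The job name, the values, and the helper function describe_allocation are hypothetical. The use of --ntasks as the core-count option is an assumption, since the option this facility expects is not named here. Site-specific options such as account or partition are omitted.

```shell
#!/bin/bash
# Sketch only: every value below is a placeholder.
# Hypothetical job name.
#SBATCH --job-name=shared-demo
# Number of nodes the job requires.
#SBATCH --nodes=1
# Core count, ASSUMING --ntasks is the option the facility expects.
#SBATCH --ntasks=4
# Wallclock limit in HH:MM:SS.
#SBATCH --time=01:00:00

# Hypothetical helper: report what the scheduler granted.
# Falls back to "unset" when run outside a SLURM allocation.
describe_allocation() {
    echo "nodes=${SLURM_JOB_NUM_NODES:-unset} tasks=${SLURM_NTASKS:-unset}"
}

describe_allocation
```

Because memory follows the core count, a job that needs more memory than this request provides would raise the core count rather than leave it at the single-core default.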
What If My Job Should Not Run on a Shared Compute Node?
Users always have the option to specify that a compute node will be used exclusively by their job. Specifying the SLURM “--exclusive” option on a batch job sets up this requirement for exclusive node allocation and access.
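A minimal sketch of a batch script that requests exclusive access might look like the following. The node count, the time limit, and the helper function node_report are placeholders chosen for illustration, and any site-specific options are omitted.

```shell
#!/bin/bash
# Sketch only: values below are placeholders.
# Ask that no other job share the allocated node.
#SBATCH --exclusive
# Placeholder node count and wallclock limit (HH:MM:SS).
#SBATCH --nodes=1
#SBATCH --time=00:30:00

# Hypothetical helper: show which node the job landed on and how many
# cores it has. Falls back to "unknown" outside a SLURM allocation.
node_report() {
    echo "running on ${SLURMD_NODENAME:-unknown-node} with ${SLURM_CPUS_ON_NODE:-unknown} cores"
}

node_report
```

The same option can also be given on the command line when submitting, for example sbatch --exclusive followed by the script name.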
What If I Have More Questions?
Questions about responsible node sharing can be sent to the OIT Help Desk (see https://help.utk.edu) or submitted through “Submit HPSC Service Request” in the menu to the left on this page.