AI Tennessee Initiative Resources and Access
The Office of Innovative Technologies (OIT) High Performance & Scientific Computing (HPSC) division is pleased to announce that high performance computing servers with NVIDIA H100 SXM GPUs are available in the ISAAC Next Generation (ISAAC NG) cluster. These resources are provided to the University of Tennessee (UT) community by the AI Tennessee Initiative of the University of Tennessee, Knoxville (UTK) Office of Research, Innovation, and Economic Development (ORIED). OIT HPSC worked with the AI Tennessee Initiative Director to co-host the Artificial Intelligence Research Computing Symposium on May 17, 2023, at which faculty from across UTK presented their AI research and discussed AI resource needs. As a result of the Symposium, OIT HPSC specified a collection of hardware that was reviewed by the Director of the AI Tennessee Initiative and leading UT AI researchers. An order for this hardware was placed with Dell Technologies in July 2023, with funding for the equipment from the AI Tennessee Initiative. The University invested $1,014,357.47 in the equipment described below.
Resources
Our vendor partner, Dell Technologies, delivered the equipment by September 2023, and OIT Operations and OIT HPSC staff prepared it for operation. The equipment has been available and in operation as part of the ISAAC NG cluster since October 1, 2023. It consists of seven Dell PowerEdge XE8640 rack-mounted servers, each with four NVIDIA H100 Tensor Core GPUs in the SXM5 configuration for extreme performance, fully interconnected with NVIDIA NVLink technology. All of the Dell PowerEdge XE8640 servers run Red Hat Enterprise Linux and are integrated into the ISAAC Next Generation cluster with access to all ISAAC NG cluster resources. The hardware is described in the table below.
Node Type | OS | CPU Type | GPU Type | Nodes | Cores/Node | GPUs/Node | RAM/Node | NVMe/Node | Interconnect |
---|---|---|---|---|---|---|---|---|---|
Dell PowerEdge XE8640 | Red Hat Enterprise Linux | Intel Platinum 8463Y+ | NVIDIA H100 with NVLink | 7 | 64 | 4 x H100 | 1,024 GB | 51.2 TB | HDR (EDR Lustre) |
The current NVIDIA GPU driver version is 550.54.15, with CUDA 12.4 (as of Nov 2024).
Access
The AI Tennessee Initiative resources are not generally available to all researchers; access is granted by request. Any UTK faculty member may request access for AI-related research projects, and HPSC staff will review and approve these requests. Researchers from other UT campuses may be granted access after review and approval by the OIT HPSC Director or the AI Tennessee Initiative Director. The resources are accessed through the ISAAC Next Generation cluster. There is no cost to researchers to access and use the AI Tennessee Initiative resources.
ISAAC NG Node Access
To gain access to the AI Tennessee Initiative resources in the ISAAC NG cluster, each faculty member, staff member, researcher, or student must have registered for an ISAAC account (see Requesting an Account), and an AI Tennessee Initiative research project must be requested and created on the ISAAC NG system. All AI Tennessee Initiative projects must be led by a UT faculty member, who is identified as the project Principal Investigator (PI). Creating the project requires the PI information, the project member information, and a short description (25 characters or fewer) of the AI research to be performed. This information allows OIT HPSC to provide AI Tennessee Initiative resource usage reports to the AI Tennessee Initiative Director. To request an AI Tennessee Initiative research project, place an HPSC service request (Submit HPSC Service Request) with the Principal Investigator’s (faculty member’s) name and the short description (25 characters max) of the AI research project.
Once an AI Tennessee Initiative project is created, it is enabled in SLURM to use the AI Tennessee Initiative resources on ISAAC NG. The SLURM partition for all of these resources is named ai-tenn, and the quality of service (qos) used to access them is also named ai-tenn. At least one GPU must be requested in the SLURM batch job or Open OnDemand request to gain access to resources in the ai-tenn partition. An example AI Tennessee Initiative batch job is available on ISAAC NG at /lustre/isaac/examples/jobs/ai-tenn.sh.
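As a rough illustration of the requirements above, a minimal batch script might look like the following sketch. It is not the contents of /lustre/isaac/examples/jobs/ai-tenn.sh, which remains the authoritative example. The account name my-ai-tenn-project is a hypothetical placeholder for your own AI Tennessee Initiative project, and the job name, CPU count, and time limit are assumed values chosen only for illustration. The GPU is requested here with the standard --gres=gpu:1 option; the site example may request it differently.

```shell
#!/bin/bash
#SBATCH --job-name=ai-tenn-check
# Hypothetical account name: replace with your AI Tennessee Initiative project.
#SBATCH --account=my-ai-tenn-project
# Partition and QOS are both named ai-tenn.
#SBATCH --partition=ai-tenn
#SBATCH --qos=ai-tenn
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Assumed CPU count and time limit, for illustration only.
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00
# At least one GPU must be requested to use the ai-tenn partition.
#SBATCH --gres=gpu:1

# Print a one-line summary of the allocation. Each field falls back to
# "unset" when the script is run outside of a SLURM job.
report_job() {
    echo "job=${SLURM_JOB_ID:-unset} partition=${SLURM_JOB_PARTITION:-unset} gpus=${CUDA_VISIBLE_DEVICES:-unset}"
}

report_job

# List the allocated GPUs if the NVIDIA tools are on the PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || echo "nvidia-smi could not list GPUs"
else
    echo "nvidia-smi not found on this node"
fi
```

Saved under a name of your choosing (for example check-gpu.sh, also hypothetical), the script would be submitted with `sbatch check-gpu.sh`. A short job like this is a quick way to confirm that the project has been enabled for the ai-tenn partition before launching longer runs.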
For more information on running SLURM batch jobs on ISAAC NG, see the ISAAC NG Running Jobs webpage. For more information on using Open OnDemand on ISAAC NG, see the ISAAC NG Open OnDemand webpage. For workshop and training recordings, see the HPSC Workshops and Training webpage.