File Systems on ISAAC Legacy
Overview
In High Performance Computing (HPC), a Lustre scratch space is a storage area that holds intermediate data during the execution of serial and parallel computing jobs. The scratch space is a portion of the Lustre file system (i.e., a directory path on Lustre) reserved for data that is only needed while a job runs. When a job is submitted to an HPC system, it is assigned or has access to a certain set of resources, including CPU cores, memory, and disk storage space. The Lustre scratch space provides storage for the data a job reads and generates, which may include input data, intermediate results, and output data. The scratch space is typically used for data that is too large to fit into the memory of the compute nodes, for data that must be shared between multiple compute nodes, and for job results that may serve as input to future computational work. Data is stored in the scratch space during the execution of the job and should be deleted once the job has completed, after the results have been checked and it is clear the data is not needed for future computational work.
The Lustre scratch space is optimized for high performance and high concurrency, allowing files in Lustre to be accessed simultaneously from multiple compute nodes, whether by many users or by a single user's parallel jobs. This makes Lustre well suited for HPC environments, where large data sets or collections of data sets need to be processed quickly and efficiently. Overall, the Lustre scratch space is an essential component of HPC systems, providing a storage area for intermediate data generated during the execution of many serial or large parallel computing jobs. Lustre scratch space is separate and distinct from Lustre project space and from long-term archival storage.
Two file systems are available to Open Enclave users for storing user files: the Network File System (NFS) and Lustre. NFS holds home directories, and Lustre holds project and scratch directories. Table 1.1 summarizes the available file systems.
File System | Purpose | Path | Quota | Purged |
---|---|---|---|---|
NFS Home Directory | Environment files | /nics/[a,b,c,d]/home/<username> | 10 GB | Not purged |
Lustre Scratch Directory | Scratch file space | /lustre/haven/user/<username> | No quota | Purged |
Lustre Project Directory | Project files | /lustre/haven/proj/<project> | 1 TB default, more by request | Not purged |
Please note that while both NFS and Lustre are generally reliable file systems, errors and corruption can still occur. It is each user’s responsibility to back up their data. To learn about data transfer to/from the Open Enclave, please review the Data Transfer document. For more information on the Lustre file system, please refer to the Lustre User Guide.
Home directories are small, with a 10 gigabyte quota, and are intended for user environment files. Lustre scratch directories have no quota, are not backed up, and are intended for temporary files for running and recently run jobs. Scratch works well when you need large amounts of temporary space while a computation is running, while you are investigating the results of a computational run, or when you have a project directory but need short-term storage that will not fit in the project directory space. Lustre project directories do have a quota; they are used for sharing files among a research group, since each project has a corresponding Linux group (see the group name in the portal for the project), and for more permanent file collections.
All the file systems and the researcher directories are intended to be managed by the researcher. From time to time, if OIT HPSC staff see that quotas are overrun or that Lustre is beginning to fill up, staff may ask researchers to manage their files and remove temporary or otherwise unneeded files. ISAAC does not have archival storage space, so OIT HPSC staff must work together with researchers to address storage issues as they arise.
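For example, a few standard commands can help identify candidates for cleanup; the size and age thresholds below are only illustrative, and <username> is a placeholder for your own account:

$ du -sh /nics/b/home/<username>/*                          # summarize space used by each item in a home directory
$ lfs find /lustre/haven/user/<username> -size +10G -type f   # list scratch files larger than 10 GB
$ lfs find /lustre/haven/user/<username> -mtime +90 -type f   # list scratch files not modified in 90 days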
Details – Home, Scratch and Project Directories
Home Directories
On the ISAAC Open Enclave cluster, the Network File System (NFS) is used for home directories. Home directories on NFS are periodically backed up for disaster recovery. Each new account on the Open Enclave receives a home directory on NFS. This is each account’s storage location for a small amount of data, such as job scripts, virtual environments, and other small files. In a Linux environment, you can refer to your home directory with the environment variable $HOME or with the tilde (~) character.
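For example, both $HOME and ~ refer to the same location; the file and directory names below are only illustrative:

$ echo $HOME                   # prints your home directory path, e.g. /nics/b/home/<username>
$ cd ~                         # changes to your home directory
$ cp jobscript.sh ~/scripts/   # copies a file into a subdirectory of your home directory (assumes ~/scripts exists)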
By default, your home directory is limited to 10GB of storage space. It is not intended to store large amounts of project or job-related data. For job-related data, please use your scratch directory. For project data that you do not want purged, request and use project space.
To determine how much storage space you have consumed in your home directory, execute the quota -s command. Figure 2.1 shows the possible output of this command. Of interest are the first “space,” “quota,” and “limit” fields. The “space” field shows how much storage space is currently in use. The “quota” field displays the soft quota placed on your home directory. The “limit” field defines the hard quota, which is the absolute maximum storage space you can consume. When you exceed the soft quota, a grace period starts that gives you time to reduce your storage usage. If you do not reduce your usage during this period, the soft limit defined by the “quota” field will be enforced.
Disk quotas for user user-x (uid 00001):
     Filesystem   space   quota   limit   grace   files   quota   limit   grace
nfs.nics.utk.edu:/nfs/b
                  3144M  10240M  10752M          58432       0       0
Figure 2.1 – Output of quota -s
Scratch Directories
Lustre scratch directories have no quota, are not backed up, and are intended for temporary files for running and recently run jobs. Scratch works well when you need large amounts of temporary space while a computation is running, while you are investigating the results of a computational run, or when you have a project directory but need short-term storage that will not fit in the project directory space.
Scratch directories in the ISAAC Open Enclave are available to all users on the Lustre file system. Approximately 2.7 petabytes of Lustre storage space are available on /lustre/haven, which is shared between scratch directories and project directories.
Important: Lustre scratch directories are NOT backed up.
Important Purging Notice: Lustre scratch space can be purged monthly, on approximately the 3rd Monday of each month. Files in Lustre scratch directories (only /lustre/haven/user/{username} directories) are deleted by the purging process if they have not been accessed or modified within 180 days. In general, users have many temporary files that are no longer needed once a job completes and results are returned. These files, along with other orphaned and unneeded files, often are not deleted by users; they accumulate in scratch directories and can fill the file system, which is detrimental to all users. Purging email notices are sent to all active users prior to each purge of scratch space. The notice explains the purging process, how to request a temporary purge exemption, and the process for requesting a project space (project directories are exempt from purging). To request a temporary purge exemption, submit an HPSC service request with “temporary purge exemption request” in the subject.
To transfer data out of your scratch space see the Open Enclave Data Transfer documentation.
Each user has access to a scratch directory in Lustre, located at /lustre/haven/user/{username}. For convenience, use the $SCRATCHDIR environment variable to refer to your Lustre scratch directory.
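For example, a typical workflow stages job data into scratch and copies out only the results you want to keep; the directory, file, and project names below are placeholders:

$ cd $SCRATCHDIR                              # move into your Lustre scratch directory
$ mkdir myjob_run1 && cd myjob_run1           # create a working directory for a job (name is an example)
$ cp ~/inputs/input.dat .                     # stage input data from your home directory (path is an example)
  ... run your job against the staged data ...
$ cp results.out /lustre/haven/proj/<project>/   # copy results worth keeping into project space, if you have one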
If you wish to determine which files are eligible to be purged from Lustre space, execute the lfs find $SCRATCHDIR -mtime +180 -type f command. Files that will be purged from Lustre space are those that have not been modified or accessed for 180 days. If you wish to view your total usage of Lustre space, execute the lfs quota -u <user> /lustre/haven command.
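For example (substitute your own username for <user>):

$ lfs find $SCRATCHDIR -mtime +180 -type f    # list files in your scratch directory not modified in 180 days
$ lfs quota -u <user> /lustre/haven           # show your total usage of the Lustre file system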
Any attempt to circumvent purging, such as using the touch command on all files in a user’s scratch directory, will be considered a violation of the Open Enclave acceptable use policy. Instead of taking the time to circumvent purging, request a project with corresponding project space. As we are all Tennessee Volunteers, our research community is improved by positive user actions and behaviors, such as cleaning up unneeded files or requesting a project, rather than by circumventing the Open Enclave file purging policy. This also results in less wasted staff support time.
Project Directories
In many cases a researcher or a research group will want an ISAAC project created so that there is a Linux project group and a project directory on the Lustre file system where the research group can share files and coordinate work. If a project is requested, a project directory on Lustre is created in addition to each researcher’s home directory and Lustre scratch directory (see ISAAC Open Enclave File Systems for more information). To make a request, log in to the User Portal and select the button under the Projects heading that says “click here to request a new project …”. Project directories receive a default quota of 1 terabyte. Additional space can be requested via a service request. Requests over 10 terabytes may be required by the HPSC Director to be reviewed by the Faculty Advisory Board before being granted.
Lustre project directories do have a quota; they are used for sharing files among a research group, since each project has a corresponding Linux group (see the group name in the portal for the project), and for more permanent file collections.
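For example, standard Linux group permissions can be used to share files within a project directory; the project path, directory name, and group name below are placeholders (use the group name shown in the portal for your project):

$ cp -r results/ /lustre/haven/proj/<project>/                       # copy data into the shared project directory
$ chgrp -R <project-group> /lustre/haven/proj/<project>/results      # assign the project's Linux group
$ chmod -R g+rX /lustre/haven/proj/<project>/results                 # allow group members to read and traverse the directories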
File System Quotas
If you want to see your home directory quota, use the quota -s command. The amount of space used appears in the space column, the soft quota value is in the quota column, and the hard quota limit is in the limit column.
$ quota -s
Disk quotas for user user1234 (uid 99999):
     Filesystem   space   quota   limit   grace   files   quota   limit   grace
nfs.nics.utk.edu:/nfs/b
                  7508M  10240M  10752M            986       0       0       0
To see how much space you have consumed on the Lustre file system overall (anywhere on /lustre), use the lfs quota command. See the example below. The output shows the storage space used in the used column and the number of files in the files column. The soft quota is in the quota column and the hard limit is in the limit column.
$ lfs quota -h -u user1234 /lustre/haven
Disk quotas for user user1234 (uid 99999):
     Filesystem    used   quota   limit   grace    files     quota     limit   grace
  /lustre/haven  6.067T      0k     10T       -   147401  50000000  50000000       -