File Systems on ISAAC ORI
Overview
ISAAC ORI has a Lustre file system with a capacity of 1.0 petabytes; /lustre/haven will also remain mounted until the Haven file system is retired. ISAAC ORI also has an NFS file system for home directories with a capacity of 20 terabytes.
NOTE: ISAAC ORI has a new 1.0 PB Lustre file system. Quotas on Lustre ORI (/lustre/ori) will generally be lower than those on Lustre Haven (/lustre/haven).
In High Performance Computing (HPC), a Lustre scratch space is a storage area used to hold intermediate data during the execution of serial and parallel computing jobs. The scratch space is a portion of the Lustre file system (i.e., a directory path on Lustre) reserved for data that is only needed while a job runs. When a job is submitted to an HPC system, it is assigned or given access to a set of resources, including CPU cores, memory, and disk storage. The Lustre scratch space provides storage for the data generated during the execution of the job, which may include input data, intermediate results, and output data.
Scratch space is typically used for data that is too large to fit into the memory of the compute nodes, for data that must be shared between multiple compute nodes, and for job results that may serve as input for future computational work. Data is stored in scratch space while the job executes and should be deleted once the job has completed, after the results have been checked and you have confirmed the data is not needed for future computational work.
The Lustre scratch space is optimized for high performance and high concurrency, allowing many users, or the same user running parallel jobs, to access files in Lustre from multiple compute nodes simultaneously. This makes Lustre well suited to HPC environments, where large data sets, or collections of data sets, need to be processed quickly and efficiently. Overall, Lustre scratch space is an essential component of HPC systems, providing storage for the intermediate data generated by many serial jobs or many large parallel jobs. Lustre scratch space is separate and distinct from Lustre project space and from long-term archival storage for data sets.
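As an illustration of this workflow, the sketch below shows a job script that stages input into scratch, runs a computation there, and copies results back to more permanent space. It is a minimal sketch only: scheduler directives are omitted, and the program name, input file, and result paths (my_program, input.dat, $HOME/inputs, $HOME/results) are hypothetical placeholders, not part of ISAAC-ORI. The $SCRATCHDIR environment variable is described later in this document.

#!/bin/bash
# Minimal sketch: use Lustre scratch space for a job's working data.
# Program name, input file, and result paths are illustrative only.

# Work in a per-job subdirectory of your scratch space.
WORKDIR="$SCRATCHDIR/myjob_$$"
mkdir -p "$WORKDIR"
cd "$WORKDIR"

# Stage input from home (or project) space into scratch.
cp "$HOME/inputs/input.dat" .

# Run the computation; intermediate and output files are written to
# scratch rather than to the small home directory.
./my_program input.dat > results.out

# Keep what you need by copying it back, then clean up what you do not.
cp results.out "$HOME/results/"
rm -f input.dat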
Two file systems are available to ISAAC-ORI users for storing files: the Network File System (NFS) and Lustre. NFS holds home directories, and Lustre holds project and scratch directories. Table 1.1 summarizes the available file systems.
Table 1.1 – File System Purge Policies and Default Quotas

File System/Purpose | Path | Quota | Purged
---|---|---|---
NFS home directory. Purpose: environment files. Capacity: 20 TB | /nfs/home/<username> | 50 GB, 1 million files | Not purged
Lustre scratch directory. Purpose: scratch file space. Lustre capacity: 1.0 PB | /lustre/ori/scratch/<username> | 10 TB, 5 million files | Purging may begin at a later date
Lustre project directory. Purpose: project files. Lustre capacity: 1.0 PB | /lustre/ori/proj/<project> | 1 TB, no file limit | Not purged
Please note that while both NFS and Lustre are generally reliable file systems, errors and corruption can still occur. All users are responsible for backing up their own data. To learn about transferring data to and from ISAAC-ORI, please review the Data Transfer document. For more information on the Lustre file system, please refer to the Lustre User Guide.
Home directories have a relatively small quota and are intended for user environment files. Lustre scratch directories have a much larger quota, are not backed up, and are intended for files belonging to running or recently run jobs. Scratch works well when you need a large amount of space while a computation is running or while you are investigating the results of a computational run. Lustre project directories also have a quota; they are used for sharing files among a research group (each project has a corresponding Linux group; see the group name for the project in the portal) and for more permanent file collections.
All of these file systems, and the directories within them, are intended to be managed by the researcher. From time to time, if OIT HPSC staff see that quotas are overrun or that Lustre is beginning to fill up, staff may ask researchers to manage their files and remove any that are no longer needed (usually intermediate result files or copies of files that already exist elsewhere).
Details – Home, Scratch and Project Directories
Home Directories
On the ISAAC-ORI cluster, the Network File System (NFS) is used for home directories. Home directories on NFS are not currently backed up. Each new account on ISAAC receives a home directory on NFS, which serves as that account's storage location for a small number of files. Your home directory is where you can store job scripts, virtual environments, and other types of files and data. In a Linux environment, you can refer to your home directory with the environment variable $HOME or the tilde (~) character.
By default, your home directory is limited to 50 GB of storage space. It is not intended for storing large amounts of project or job-related data. For job-related data, please use your scratch directory. For project data that you do not want purged, request and use project space.
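For example, the following standard Linux commands, shown as a minimal sketch, illustrate referring to your home directory and checking how much of it you are using (du may take a while in directories with many files):

# Both of these change to your home directory.
cd $HOME
cd ~

# Summarize how much space your home directory currently uses
# (compare against the 50 GB default quota).
du -sh $HOME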
Scratch Directories
Lustre scratch directories have a much larger default quota than home directories (see Table 1.1), are not backed up, and are intended for files belonging to currently or recently running jobs. Scratch works well when you need a large amount of space while a computation is running, while you are investigating the results of a computational run, or when you have a project directory but temporarily need more storage than the project directory space allows.
Scratch directories are available to all ISAAC-ORI users on the Lustre file system. Approximately 2.9 petabytes of Lustre storage space is available on /lustre/haven, which is shared between scratch directories and project directories.
Important: Lustre scratch directories are NOT backed up.
Important Purging Notice: Lustre scratch space can be purged monthly, on approximately the 3rd Monday of each month. Files in Lustre scratch directories (only /lustre/haven/scratch/{username} directories) are deleted by the purging process if they have not been accessed or modified within 180 days. In general, users accumulate many intermediate files that are no longer needed once a job completes and its results are returned. These and other orphaned, unneeded files are often not deleted by users; they accumulate in scratch directories and can fill the file system, which is detrimental to all users. Purging email notices are sent to all active users before scratch space is purged. The notice explains the purging process, how to request a purge exemption, and how to request a project space (project directories are exempt from purging). To request a purge exemption, submit an HPSC service request with “purge exemption request” in the subject.
To transfer data out of your scratch space, see the ISAAC-ORI Data Transfer documentation.
Each user has access to a scratch directory in Lustre, located at /lustre/haven/scratch/{username}. For convenience, use the $SCRATCHDIR environment variable to refer to your Lustre scratch directory.
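For instance, a minimal sketch of using $SCRATCHDIR (the run_001 subdirectory name is just an illustration):

# Show where your scratch directory lives and move into it.
echo $SCRATCHDIR
cd $SCRATCHDIR

# Keeping each job's files in its own subdirectory makes cleanup easier.
mkdir -p $SCRATCHDIR/run_001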
If you wish to determine which of your files are eligible to be purged from Lustre space, execute the following command:

lfs find $SCRATCHDIR -mtime +180 -type f

Files purged from Lustre space are those that have not been modified or accessed for 180 days. If you wish to view your total usage of Lustre space, execute:

lfs quota -u <user> /lustre/haven
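If you just want a count of how many files are at risk, the listing above can be piped to wc -l; this is a minimal sketch using ordinary shell tools, with <user> standing in for your username:

# Count scratch files not modified in the last 180 days (purge candidates).
lfs find $SCRATCHDIR -mtime +180 -type f | wc -l

# Show your overall Lustre usage and file counts in human-readable form.
lfs quota -h -u <user> /lustre/haven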
Any attempt to circumvent purging, such as using the touch command on all files in a user's scratch directory, will be considered a violation of the ISAAC-ORI acceptable use policy. Instead of taking the time to circumvent purging, consider requesting a project with corresponding project space. As we are all Tennessee Volunteers, our research community is improved by positive user actions and behaviors, such as cleaning up unneeded files or requesting a project, rather than circumventing the ISAAC-ORI file purging policy. This allows staff to manage the file system more easily and ensure its continued availability for all users.
Project Directories
In many cases a researcher or research group will want an ISAAC project created so that there is a Linux project group and a project directory on the Lustre file system where the group can share files and coordinate work. When a project is requested, a project directory is created on Lustre in addition to each researcher's home directory and Lustre scratch directory (see ISAAC-ORI File Systems for more information). To request a project, log in to the User Portal and select the button under the Projects heading that says “click here to request a new project …”. Project directories receive a default quota of 1 terabyte. Additional space can be requested via a service request; the HPSC Director may require requests over 10 terabytes to be reviewed by the Faculty Advisory Board before being granted.
Lustre project directories do have a quota; they are used for sharing files among a research group (each project has a corresponding Linux group; see the group name for the project in the portal) and for more permanent file collections.
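If you want files under the project directory to be readable and writable by the whole group, standard Linux group permissions can be used. The sketch below is illustrative only: myproject stands in for your project's Linux group and /lustre/ori/proj/myproject for its project directory.

# See which Linux groups your account belongs to.
groups

# Create a shared subdirectory owned by the project group, writable by
# the group, with the setgid bit set so new files inherit the group.
mkdir -p /lustre/ori/proj/myproject/shared
chgrp myproject /lustre/ori/proj/myproject/shared
chmod 2770 /lustre/ori/proj/myproject/shared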
File System Quotas
Project directories have a group/project quota for the entire directory tree, in addition to quotas for individual users; usage is cumulative for users and for the project. For example, suppose the project has a 20 TB quota, you store 10 TB under the project directory tree, and another project member stores another 10 TB there. In that case, the next set of files either user puts into that directory will exceed the project quota.
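One way to compare your individual usage with the project's overall usage is with the lfs quota command, assuming the project quota is tracked through the project's Linux group. Replace <user> and <project_group> with your username and the group name shown in the portal, and point the command at the Lustre file system that holds the project directory (for example /lustre/ori):

# Usage charged to you individually on the Lustre file system.
lfs quota -h -u <user> /lustre/ori

# Usage charged to the project as a whole via its Linux group.
lfs quota -h -g <project_group> /lustre/ori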
To see your home directory quota, use the quota -s command. The amount of space used is shown in the space column, the soft quota in the quota column, and the hard limit in the limit column. When you exceed the soft quota, a grace period begins that gives you time to reduce your storage usage. If you do not reduce your usage during this period, the soft limit shown in the quota column will be enforced.
$ quota -s
Disk quotas for user netid (uid 00000):
     Filesystem   space   quota   limit   grace   files   quota   limit   grace
nfs.cn.ori.isaac.utk.edu:/data/nfs/home
                  937M  46080M  51200M           9161    900k   1000k
Figure 2.1 – Output of quota -s
To see how much space you have consumed in the Lustre file system overall (anywhere on /lustre), use the lfs quota command; see the example below. The output shows the storage space used in the used column and the number of files in the files column. The soft quota is in the quota column and the hard limit is in the limit column.
$ lfs quota -h -u netid /lustre/haven
Disk quotas for usr netid (uid 00000):
     Filesystem    used   quota   limit   grace   files    quota    limit   grace
  /lustre/haven  851.2M      9T     10T       -   20306  4500000  5000000       -
Figure 2.3 – Output of lfs quota