High Performance & Scientific Computing

Data Transfer on ISAAC-NG



INTRODUCTION

The ISAAC-NG computing cluster provides several ways for users to transfer files to and from its file system locations, including NFS home directories, NFS project directories, Lustre project directories, and Lustre scratch directories. Data Transfer Nodes (DTNs) provide this capability.

At the time of this writing, there are two Data Transfer Nodes (DTNs) available to ISAAC-NG users for data transfer.

Data Transfer Node    Hostname (for SCP)     Globus Endpoint
dtn1                  dtn1.isaac.utk.edu     UTK ISAAC-NG DTN1
dtn2                  dtn2.isaac.utk.edu     UTK ISAAC-NG DTN2
                                             UTK Google Drive
Table 1.1: ISAAC-NG Data Transfer Node (DTN)

Each Data Transfer Node is set up for NetID authentication, Duo multi-factor authentication (MFA), and authentication through an InCommon Credential. Using these means, ISAAC-NG users can log in to a DTN and perform data transfer functions (e.g., moving project data to or from ISAAC-NG directories, moving data to a project’s backup storage medium, or moving project data from a project’s data storage to an ISAAC-NG directory for data processing).

To connect to a DTN, users may use ssh (Secure Shell) in a command-line terminal. More information on how to use the ssh protocol to access ISAAC-NG computational resources can be found on the Access and Login ISAAC-NG webpage. When using ssh, users may simply replace the hostname of an ISAAC-NG login node (login.isaac.utk.edu) with the hostname of the DTN to which they wish to connect (dtn1.isaac.utk.edu or dtn2.isaac.utk.edu), then authenticate with their UT NetID, their password, and a Duo multi-factor authentication (MFA) code.
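
The hostname substitution described above can be sketched as follows. This is only an illustration: the NetID jsmith is a placeholder, and the helper name dtn_ssh_command is hypothetical. The helper prints the command rather than running it, so it can be checked first.

```shell
# Placeholder NetID (an assumption for illustration); replace with your own.
NETID="jsmith"

# Print the ssh command for a DTN given its short name (dtn1 or dtn2).
# The helper name is hypothetical; the hostnames come from Table 1.1.
dtn_ssh_command() {
    printf 'ssh %s@%s.isaac.utk.edu\n' "$NETID" "$1"
}

dtn_ssh_command dtn1   # prints: ssh jsmith@dtn1.isaac.utk.edu
```

Running the printed command should lead to the password and Duo prompts described above.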

ISAAC-NG supports several data transfer protocols, including SCP, SFTP, and Globus. SCP and SFTP are both ssh-based utilities for transferring files, but they tend to be slower than Globus. At the time of this writing, Globus offers the fastest data transfers on ISAAC-NG. Still, SCP and SFTP are useful for small transfers. Finally, the ISAAC-NG DTN remotely mounts the Haven Lustre filesystem in addition to its native ISAAC-NG Lustre filesystem, allowing transfers from ISAAC Legacy to ISAAC-NG to be performed in the local filesystem.
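
For a small transfer with SCP, a command along the following lines is typical. This is a sketch under assumptions: the NetID jsmith, the file name results.tar.gz, and the directory /lustre/isaac/example-dir are placeholders, and the helper names are hypothetical. The helpers only print the commands for review.

```shell
NETID="jsmith"              # placeholder NetID (assumption)
DTN="dtn1.isaac.utk.edu"    # DTN hostname from Table 1.1

# Print an scp command that uploads one local file to a directory on the DTN.
# $1 = local file, $2 = remote directory
scp_upload_command() {
    printf 'scp %s %s@%s:%s/\n' "$1" "$NETID" "$DTN" "$2"
}

# Print an scp command that downloads one remote file into the current directory.
# $1 = remote file
scp_download_command() {
    printf 'scp %s@%s:%s .\n' "$NETID" "$DTN" "$1"
}

# Hypothetical file and directory names, for illustration only.
scp_upload_command results.tar.gz /lustre/isaac/example-dir
scp_download_command /lustre/isaac/example-dir/results.tar.gz
```

For an interactive session, sftp accepts the same user@host form (for example, sftp jsmith@dtn1.isaac.utk.edu), after which put and get move individual files.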

LNET-BASED TRANSFERS (FROM LEGACY HAVEN FILESYSTEM ONLY)

As shown in Figure 2.1, the ISAAC-NG DTN mounts its local Lustre filesystem via an ordinary InfiniBand-based mount and the legacy Haven Lustre filesystem via LNET, which allows Lustre traffic to be routed between sites over the internet. Currently, the DTN is the only node configured in this way. The remote mount is implemented by routing the Lustre traffic from the DTN through LNET routers. There are two routers at each site (KPB and ORNL), with each pair configured for failover. All four LNET routers are equipped with a dedicated 40 Gb/s connection to the LNET network, allowing cross-site transfer speeds more closely comparable to native InfiniBand speeds (100 Gb/s for EDR). This is a significant improvement over the data transfer methods described in subsequent sections, which are limited to a theoretical maximum of 10 Gb/s, with actual transfer speeds closer to 5.5 Gb/s.

FIGURE 2.1 – MOUNTS ON NG DTN

Note that Haven is mounted read-only; this allows files to be transferred from Haven to ISAAC-NG using ordinary local file operations (but not the other way around). However, you should never use standard filesystem tools such as rsync, cp, or mv for this, as their performance will be worse than any other means of data transfer. The preferred tool for LNET data transfers is fpsync, which provides a parallel wrapper around rsync. An example fpsync transfer is depicted in Figure 2.2; the total time to transfer 1 TB of data in this example was just under 6 minutes, for an effective transfer rate of 24 Gb/s. In general, the correct usage of fpsync is:

fpsync -t /tmp/fpsync-<netid> -v -n <forks> \
          /lustre/haven/<haven-dir> /lustre/isaac/<isaac-ng-dir>

where -t /tmp/fpsync-<netid> tells fpsync to use a temp directory specific to your NetID and -n <forks> specifies the number of parallel rsync processes you want to use. Although fpsync uses rsync on the backend, its performance is significantly better than that of an unmodified rsync, which, as shown in Figure 2.3, takes over 43 minutes to complete the same transfer at an effective bandwidth of only 3 Gb/s. This is because rsync copies data serially and is therefore more adversely affected by cross-site latency and per-TCP-stream bandwidth limits than a parallel copy. Note that cp and mv can be expected to perform as poorly as rsync.
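
The effective rates quoted above follow from dividing the bits transferred by the elapsed time. A minimal sketch of that arithmetic follows. It assumes that "1 TB" means 10^12 bytes and treats "over 43 minutes" as exactly 2580 seconds; the function name effective_gbps is hypothetical.

```shell
# Effective rate in Gb/s for a transfer of $1 bytes that took $2 seconds.
# Uses 1 byte = 8 bits and 1 Gb = 10^9 bits.
effective_gbps() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes * 8 / secs / 1e9 }'
}

# Assumed inputs: 10^12 bytes in 2580 s (43 minutes), as in the rsync example.
effective_gbps 1000000000000 2580   # prints 3.1
```

The result is consistent with the roughly 3 Gb/s reported for the serial rsync transfer.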

We have empirically determined that 8 or 16 forks will provide the best performance for a copy without adversely affecting other users of the DTN; this is consistent with expectations since each fork uses 2 CPU cores and there are 48 cores on the DTN. Using more than 16 forks does not significantly improve performance while significantly harming performance for other DTN users. Continuing to increase the number of forks past 24 typically results in slower transfer speeds even when no other users are contending for resources on the DTN.
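
One way to keep to the recommended fork counts is a small wrapper that refuses larger values before building the command. The following is an illustrative sketch, not a supported tool: the function name, the NetID jsmith, and the directory names are placeholders, and the fpsync options are the ones shown in the usage line earlier in this section.

```shell
NETID="jsmith"   # placeholder NetID (assumption)

# Print an fpsync command, refusing fork counts above 16.
# $1 = forks, $2 = Haven source directory, $3 = ISAAC-NG destination directory
fpsync_command() {
    if [ "$1" -gt 16 ]; then
        echo "refusing: more than 16 forks harms other DTN users" >&2
        return 1
    fi
    printf 'fpsync -t /tmp/fpsync-%s -v -n %s %s %s\n' "$NETID" "$1" "$2" "$3"
}

# Hypothetical directory names, for illustration only.
fpsync_command 8 /lustre/haven/src-dir /lustre/isaac/dst-dir
```

The printed command can then be reviewed and run on the DTN.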

FIGURE 2.2 – SAMPLE FPSYNC TRANSFER
FIGURE 2.3 – SAMPLE RSYNC TRANSFER

Note: fpsync has an extensive set of options for modifying how the parallel transfer is performed, including changing the backend from rsync to cpio or tar, overriding the manner in which the file system tree is partitioned prior to transfer, etc. You are welcome to experiment with these options, but due to the number of possible variables, such use will not typically be supported by HPSC staff.

GLOBUS WEB-BASED TRANSFERS

The Globus web interface allows you to conveniently perform data transfers to and from ISAAC-NG resources. At the time of this writing, Globus is the fastest and most efficient data transfer method available on the ISAAC-NG computing cluster, with the exception of LNET transfers described in the previous section, which can only be used for transfers from Haven.

Users should also review the official Globus documentation for more information on how to use the Globus tool.

NOTE: The ISAAC-NG data transfer node is running Globus Connect Server version 5 (GCSv5). The InCommon Credential system is no longer required for using the Globus web interface.


Using the Globus Web Interface

To access the Globus interface in your browser, navigate to the Globus website (app.globus.org). Once there, log in using the existing organizational login option. Verify that the University of Tennessee is selected, then select “Continue.” Authenticate with your UT NetID, password, and Duo MFA code.

After authenticating, users will then see the interface depicted in Figure 3.1.

FIGURE 3.1 – INITIAL GLOBUS INTERFACE

Before initiating file transfers between a local machine and ISAAC-NG, users must configure endpoints. One endpoint will reference the user’s local system, while the other will reference one of the ISAAC-NG DTNs (dtn1.isaac.utk.edu or dtn2.isaac.utk.edu). Further instructions regarding these endpoints are provided below.

To configure the necessary endpoints in the Globus interface, select the “Endpoints” tab on the left-hand side of the page. You will then see a page similar to Figure 3.2. At the top-right of the page, marked with a red box, select “Create new endpoint.” On the endpoint type selection page, choose “Globus Connect Personal.”

FIGURE 3.2 – GLOBUS ENDPOINT MENU

If you are using a Windows machine, click Download Globus Connect Personal for Windows. For other operating systems, click Show me other supported operating systems.

Figure 3.3: Globus Connect personal download

Open the installer and install the application. After the installation finishes, you should see a Globus logo (in the bottom-right corner on Windows and in the top-right menu bar on Mac). Right-click on the logo to see the options shown in Figure 3.4, then click on Transfer Files.

Figure 3.4: Globus logo and Globus options

If the endpoint for your computer is not already set up, you will see the window shown in Figure 3.5. Enter the collection name, which identifies your local machine. The name you choose is unimportant; however, it should be something memorable (e.g., JaneSmithLocalEndpoint). Exit the setup upon successful completion.

Figure 3.5: Globus endpoint creation menu

Once a user has configured their local machine as a Globus endpoint, the user should return to the “File Manager” tab on the left side of the page. Users should make sure to select the double panels option in the top-right of the page (Figure 3.6 highlights this option). This will display the user’s local machine’s filesystem in addition to the ISAAC-NG DTN’s filesystem.

Once both panels are displayed, the user should click on “Collection” in the left panel, then type the name of the desired endpoint into the search bar or find it under “My Collections.”

After the desired endpoint has been selected, the user should return to the File Manager page. In the right panel, click on “Collection,” and then search for the Globus endpoint of the desired ISAAC-NG Data Transfer Node (DTN), which is named one of the following:

  • UTK ISAAC-NG DTN1
  • UTK ISAAC-NG DTN2 (cannot be used for transfers to or from endpoints for the Secure Enclave)
FIGURE 3.6 – GLOBUS FILE TRANSFER INTERFACE

Once both endpoints are configured (the local machine endpoint and the ISAAC-NG DTN endpoint), a user can transfer data between them. Figure 3.6 shows what the Globus interface should look like when both endpoints are selected. Users can select individual files and directories to transfer. After selecting the data to transfer, the user should press the “Start” button below the endpoint from which the data will be transferred.

Additionally, the user can navigate throughout the filesystem hierarchy in either endpoint using the Globus interface. Other options are available for user data transfers to or from ISAAC-NG, but these other options are usually unnecessary to accomplish most transfers.

Using Globus for UTK Google Drive

It is possible to transfer data between your UTK Google Drive and most other Globus endpoints using the Globus file transfer interface by selecting “UTK Google Drive” as one of the endpoints. This allows you to transfer files between your UTK Google Drive and endpoints for ISAAC-NG or ISAAC Legacy, but not endpoints for the Secure Enclave. The Globus endpoint for your UTK Google Drive resides on dtn2.isaac.utk.edu and automatically tunnels data through DTN2 from Google to another Globus endpoint. The UTK Google Drive endpoint uses an application credential to access your Google Drive, so the first time you connect you will need to log in to both your Globus account and your UTK Google Drive account and click through several consent screens to give the two accounts permission to work together. Your Google Drive contents will be located in the “My Drive” folder of the endpoint.

Note that by default you can only use the UTK Google Drive endpoint to connect to your own UTK Google Drive. Without additional configuration, you cannot use the endpoint to connect to a Google Drive associated with any Google account not sponsored by UTK, and you cannot use it to connect to the Google Drive for any other user’s UTK account. Performing such transfers requires an additional step in which the owner of the other account uses the Google Drive interface to share the files with your UTK Google account. Once they have done this, you should be able to find the files in the “Shared With Me” folder of the endpoint.

Using Globus Command Line Interface (CLI)

Globus CLI is a tool developed by Globus that provides an interface for interacting with Globus services from the command prompt or terminal. Using the CLI, users can orchestrate data transfers between two endpoints, sync directories, or manipulate the directory structure in their account on the ISAAC cluster without using the Globus web interface. Notably, the CLI does not need to be installed on each of the computers or clusters involved in a transfer. Refer to the Globus CLI documentation to learn more about working with Globus CLI to transfer data.
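
As a rough sketch of a CLI transfer, the snippet below builds a recursive globus transfer command between two collections. The UUIDs and paths are placeholders, and the helper name is hypothetical; real collection IDs can be looked up with globus endpoint search after authenticating with globus login. The helper only prints the command for review.

```shell
# Placeholder collection UUIDs (assumptions). Look up real values with:
#   globus endpoint search "UTK ISAAC-NG DTN1"
SRC_ID="00000000-0000-0000-0000-000000000001"   # e.g., your local endpoint
DST_ID="00000000-0000-0000-0000-000000000002"   # e.g., UTK ISAAC-NG DTN1

# Print a recursive directory transfer command.
# $1 = source directory, $2 = destination directory
globus_transfer_command() {
    printf 'globus transfer --recursive %s:%s %s:%s\n' \
        "$SRC_ID" "$1" "$DST_ID" "$2"
}

# Hypothetical paths, for illustration only.
globus_transfer_command /home/jsmith/data/ /lustre/isaac/example-dir/data/
```

When run, globus transfer submits the task and reports a task ID, which can be passed to globus task show to check progress.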

USING FILEZILLA TO TRANSFER FILES

FileZilla will work with file transfers to ISAAC-NG. Please only use the ISAAC-NG DTNs, dtn1.isaac.utk.edu or dtn2.isaac.utk.edu.

To use the FileZilla client with your NetID, password, and Duo MFA, follow these steps.

  1. Open the FileZilla client after downloading the client from the official filezilla-project.org website.
  2. Select File, then Site Manager.

    Figure 5.1 – FileZilla’s Site Manager Option
  3. Select “New Site,” then provide the required information when prompted. For the host, select one of the ISAAC-NG DTNs, dtn1.isaac.utk.edu or dtn2.isaac.utk.edu. For protocol, select SFTP – SSH File Transfer Protocol. For Logon Type, select Interactive. For User, type your UT NetID. Finally, rename the entry under sites from “New Site” to something more memorable, such as the name of the ISAAC-NG DTN. Refer to Figures 5.2 and 5.3 to identify where to find these options.

    (Above) Figure 5.2 – New Site in FileZilla (Below) Figure 5.3 – FileZilla Site Options
  4. Select Transfer Settings, then check the box for Limit the number of simultaneous connections. Make sure the value beneath this checkbox is 1.
  5. Select “Connect” in the Site Manager window.
  6. When prompted, enter your password.
    Figure 5.4 – FileZilla Password Prompt
  7. When prompted, type a “1” to send a Duo Push to your mobile device, then authenticate with Duo MFA. Upon successful authentication, you will be logged in to the ISAAC-NG DTN through FileZilla.

    Figure 5.5 – FileZilla Duo Prompt


USING WINSCP TO TRANSFER FILES

WinSCP can perform file transfers to and from ISAAC-NG. Please use one of the ISAAC-NG DTNs: dtn1.isaac.utk.edu or dtn2.isaac.utk.edu.

To use the WinSCP client with your NetID, your password, and a Duo MFA code, please follow these steps.

  1. After downloading WinSCP from the official website (winscp.net), open WinSCP, then click on “New Site.”
  2. Provide the hostname of the ISAAC-NG DTN, dtn1.isaac.utk.edu or dtn2.isaac.utk.edu, for “Host name,” your UT NetID for “User name,” and your password. Leave the port number as 22.

    Figure 6.1 – WinSCP New Site Creation
  3. When warned about an unknown server, select “Yes.”

    Figure 6.2 – Initial WinSCP Key Warning
  4. The authentication banner will appear. Select “Continue.”

    Figure 6.3 – WinSCP Authentication Banner
  5. When prompted, type “1” to receive a Duo Push on your mobile device. Authenticate with Duo. You will then be logged in.

    Figure 6.4 – Duo Prompt in WinSCP
  6. Once you authenticate, the WinSCP application screen appears. The left side of the screen shows your local machine, and the right side shows the remote system to which you are logged in.

Preparing Data for Transfer

Before users initiate any data transfers from the Open Enclave to another storage resource, they should consider preparing the data they wish to transfer by archiving and compressing it.

When a user archives data, several files and directories are combined into a single file.

When a user compresses data, they reduce the data’s total size.

Archiving makes it easier to organize the data to be transferred, and compression reduces the total amount of data that must be sent across the network.

At the time of this writing, the tar and zip utilities are the best methods for data archiving and compression for ISAAC-NG users who use machines running Linux, MacOS, and Windows.


Using the tar Utility

The tar (tape archiver) utility uses simple command syntax and allows large amounts of data to be aggregated into the same archive. Linux, MacOS, and updated Windows 10 systems can use tar. Older Windows systems will be limited to the zip utility.

To create a tar archive, execute tar czvf <archive-name> <dir-to-archive>. Replace the <archive-name> argument with the name of the new archive you wish to create. Be sure to end the name with the .tar.gz extension, as in “my_archive.tar.gz.” Replace the <dir-to-archive> argument with the name of the directory you wish to place in the new archive. If the directory you intend to archive is not within your working directory, you must specify the relative or absolute path to that directory. By default, tar will recursively place the directory and its contents into the new archive. Figure 7.1 shows the successful creation of a tar archive.

[user@login1 ~]$ tar czvf new_archive.tar.gz Documents
Documents/
Documents/IntroUnix.pdf
Documents/JobSubData.zip
Documents/MATLAB/
Documents/Scripts.zip
Documents/PyLists.py

Figure 7.1 – Creating a tar Archive

After the archive is created, execute ls -l to verify that the archive exists. You can view its contents with the tar tvf <archive-name> command.

You may then transfer the archive using one of the data transfer methods described in this document. In general, Globus is the best method. Please refer to the Globus section to learn how to configure it for your system. On the receiving system, execute tar xvf <archive-name> to extract the contents of the archive. The files will be extracted into your current working directory.
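
The create, list, and extract steps can be tried end to end in a scratch directory. The sketch below uses a throwaway temporary directory and made-up sample files so that nothing in your real data is touched; the verbose (v) flag is omitted to keep the output short.

```shell
# Work in a throwaway directory (sample contents are hypothetical).
WORKDIR="$(mktemp -d)"
cd "$WORKDIR" || exit 1
mkdir -p Documents/MATLAB
echo "print('hello')" > Documents/PyLists.py

# c = create, z = compress with gzip, f = name of the archive file.
tar czf new_archive.tar.gz Documents

# t = list the archive contents without extracting them.
tar tzf new_archive.tar.gz

# x = extract; -C extracts into another directory, standing in here
# for the working directory on the receiving system.
mkdir restored
tar xzf new_archive.tar.gz -C restored
ls -R restored
```

After the last step, restored/Documents should mirror the original directory.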


Using the zip Utility

When working with older Windows systems, the zip utility should be used to archive and compress your data from ISAAC-NG.

To create a zip archive on ISAAC-NG, execute zip -r <archive-name>.zip <dir-to-archive>. 

Be sure that the directory that you wish to archive is in your working directory. Otherwise, you MUST specify the relative or absolute path to the directory you wish to archive.

Replace the <archive-name> argument with the name of the new zip archive. You may or may not choose to include the .zip file extension in the new archive’s name at this time; if you do not include the .zip file extension in the new archive’s name, the zip utility will add it to the new archive’s name automatically. Replace the <dir-to-archive> argument with the directory you wish to place in the zip archive. The -r option ensures that the directory and its contents are archived and compressed. Figure 7.2 shows the successful creation of a zip archive.

[user@login1 ~]$ zip -r Documents Documents
  adding: Documents/ (stored 0%)
  adding: Documents/IntroUnix.pdf (deflated 4%)
  adding: Documents/MATLAB/ (stored 0%)
  adding: Documents/PyLists.py (deflated 61%)

Figure 7.2 – Creating a zip Archive

After the zip archive has been created, execute ls -l in the directory from which you created the archive to ensure the archive exists. The new archive that has been created will appear with the name you gave to the new archive followed by the .zip extension.
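
The automatic .zip extension can be checked with a short experiment in a scratch directory. This is a sketch with made-up sample files; it assumes the Info-ZIP zip utility and skips the archive step if zip is not installed. The -q option only suppresses the per-file "adding:" lines.

```shell
# Work in a throwaway directory (sample contents are hypothetical).
ZIPDIR="$(mktemp -d)"
cd "$ZIPDIR" || exit 1
mkdir -p Documents/MATLAB
echo "sample" > Documents/notes.txt

if command -v zip >/dev/null 2>&1; then
    # No extension is given here, so zip appends ".zip" itself.
    zip -r -q Documents Documents
    ls -l Documents.zip
else
    echo "zip is not installed on this system" >&2
fi
```

Listing the directory afterwards should show Documents.zip alongside the original Documents directory.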

With the zip archive created and verified, transfer the archive to your system using one of the data transfer methods described in this document. In most cases, Globus is the most convenient method. Please refer to the Globus section to learn how to configure it for use on your system.

Once you transfer the zip archive to your system, open the File Explorer and navigate to the directory in which you placed the archive. Right-click on the archive and select the “Extract All…” option in the submenu. Figure 7.3 shows where to locate this option. Specify the directory into which the contents should be extracted, then select “Extract.” You may then open the archive and peruse its contents.

Figure 7.3 – Extracting the Contents of a zip Archive in Windows