Anaconda User Guide on ISAAC Legacy
Introduction
Anaconda is a popular data science platform used in a wide variety of fields. Conda is Anaconda’s package, dependency, and environment management solution. Originally designed for Python, Conda can handle nearly any programming language with ease. It takes care of the tedious tasks related to packages and their dependencies so that you can focus on your work. In this guide, you will learn how to use Conda for your projects on the cluster.
Loading Anaconda
Anaconda is a loadable module on the cluster. At the time of this writing, five versions of Anaconda are installed on the cluster. Table 2.1 lists these versions. To verify which versions are installed on the cluster, execute module avail anaconda. If you are unsure which distribution and version to use, review the documentation for the software package(s) you require. They should indicate which distribution and version you should use.
Distribution | Version |
---|---|
Anaconda 2 | 4.3.1 |
Anaconda 2 | 4.4.0 |
Anaconda 3 | 4.3.1 |
Anaconda 3 | 4.4.0 |
Anaconda 3 | 5.1.0 |
To load the appropriate Anaconda module, execute module load <anaconda-version>. Replace <anaconda-version> with the necessary distribution and version number. For example, to load Anaconda 2 version 4.3.1, execute module load anaconda2/4.3.1. If you do not require a specific version of Anaconda, you should execute the same command but exclude the version number. The default version for that Anaconda distribution will be loaded. To determine which version is the default, review the output of module avail anaconda. The term “default” will appear in parentheses next to the module.
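To pick out the default without scanning the listing by eye, the output can be filtered. The following is a minimal sketch, not a site-provided tool: the function name default_modules is hypothetical, and it assumes the marker appears literally as (default) directly after the module name, as described above. Many module systems print their listing to standard error, hence the 2>&1 in the usage line.

```shell
# Hypothetical helper: read a module listing on standard input and print
# every module name that is tagged with "(default)".
default_modules() {
    grep -oE '[^[:space:]]+ ?\(default\)' | sed -E 's/ ?\(default\)$//'
}

# Usage on the cluster (commented out so the block has no side effects):
#   module avail anaconda 2>&1 | default_modules
```

If the listing on your system marks defaults differently, adjust the pattern accordingly.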
To configure your shell to use Conda, execute Conda’s setup script. An environment variable defines the path to this script. The name of the environment variable is ANACONDA2_SH for the Anaconda 2 distribution and ANACONDA3_SH for the Anaconda 3 distribution. Figure 2.1 shows how to execute this script for the Anaconda 2 distribution. Replace the “2” with “3” if you use the Anaconda 3 distribution.
source $ANACONDA2_SH
Figure 2.1 – Executing the Anaconda Setup Script
To automate the configuration process, add module load <anaconda-version> to your .bashrc file and then insert the path to the script into the file. You may use the echo command with redirection to insert the script path into your .bashrc file. For instance, to automate the configuration of Anaconda 3 version 5.1.0, review the example in Figure 2.2. Because the double quotes let your current shell expand $ANACONDA3_SH into the script path, load the module in your current session before you run these commands. Also ensure that you use two greater-than symbols (>>), which append to the file; a single > would overwrite it. Always open your .bashrc file after you modify it to ensure that it was correctly changed.
echo "module load anaconda3/5.1.0" >> ~/.bashrc
echo "source $ANACONDA3_SH" >> ~/.bashrc
Figure 2.2 – Automating Anaconda’s Loading Process
Test the changes to your environment by logging out of the cluster and logging back in. You should be able to activate the base environment with the conda activate command.
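Running the echo commands more than once leaves duplicate lines in .bashrc. The sketch below shows one way to avoid that. The helper name append_once is hypothetical. As a deliberate difference from the echo approach, the usage lines use single quotes so that the variable name, not the expanded path, is stored. This assumes the module load line defines ANACONDA3_SH at login before the source line runs.

```shell
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain that exact line, so that rerunning the setup adds no duplicates.
append_once() {
    local line=$1 file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (commented out so that nothing changes until you choose to run it).
# The single quotes store $ANACONDA3_SH unexpanded; it is resolved at login,
# after the module load line has defined it.
#   append_once 'module load anaconda3/5.1.0' ~/.bashrc
#   append_once 'source $ANACONDA3_SH' ~/.bashrc
```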
Creating Environments
In Anaconda, an environment is an isolated space where packages and dependencies can be installed without affecting other environments. To create a basic environment with no packages, execute the command shown in Figure 3.1.
conda create -p <env-dir-name>
Figure 3.1 – Creating an Environment in Anaconda
Replace <env-dir-name> with a pathname. For instance, to create an environment in a directory named basic_env, you would execute conda create -p ~/basic_env. Always verify that the directory you specify is the correct one before you create the environment. Only create Conda environments in directories to which you have access, such as your home directory or Lustre scratch space.
Conda permits you to install packages when you create an environment. To do this, specify the packages after the environment’s path. Consider a new environment in the py2_env directory that should have Python 2.7, NumPy, and SciPy installed. In this case, you would execute the command displayed in Figure 3.2. All three packages would be installed as part of the environment’s creation.
conda create -p ~/py2_env numpy scipy python=2.7
Figure 3.2 – Creating an Anaconda Environment with Specific Packages
There is no limit on the number of packages you may install at the time an environment is created; however, remember that your home directory is limited to 10 GB of storage space. If you require more than 10 GB of storage for your environment, create the environment in Lustre space with the command presented in Figure 3.3.
conda create -p $SCRATCHDIR/<env-dir-name>
Figure 3.3 – Creating an Anaconda Environment in Lustre Space
Do be mindful of the 30-day purge policy on Lustre. For further considerations relevant to filesystems, please review the File Systems document.
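Before choosing a location, it can help to see how much space a directory already uses. The following is a minimal sketch: check_usage is a hypothetical helper, and the limit of 10 in the usage line is the home directory figure quoted above. Note that du measures disk blocks, which may differ slightly from how the cluster accounts for quota, and it can take a while on large directories.

```shell
# Hypothetical helper: print "over" if DIR uses at least LIMIT_GB gigabytes
# of disk space, otherwise print "ok".
check_usage() {
    local dir=$1 limit_gb=$2 used_kb
    used_kb=$(du -sk "$dir" | cut -f1)
    if [ "$used_kb" -ge $(( limit_gb * 1024 * 1024 )) ]; then
        echo "over"
    else
        echo "ok"
    fi
}

# Usage (commented out; scanning a large home directory can be slow):
#   check_usage "$HOME" 10
```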
With an environment successfully created, you may activate it with the conda activate command. Execute conda activate <path-to-env> to make that environment your active one. Replace <path-to-env> with the path to the environment. Environments in your home directory can be activated by using the tilde (~) character followed by a slash and the name of the directory in which the environment is installed (for example, conda activate ~/basic_env). If you created an environment in Lustre space, execute conda activate $SCRATCHDIR/<env-dir-name>.
Environment Management
It is likely that after you create an environment you will modify it as new packages are required and existing packages are no longer necessary. It may also become necessary to remove an existing environment to make room for a new one. Conda makes all these tasks simple and easy to manage.
To install new packages, you should first search for the package you wish to install. The conda search command allows you to search for packages in the Anaconda repository. Figure 4.1 shows the syntax for this command.
conda search <package-name>
Figure 4.1 – Searching for Packages in Anaconda
Take the ipython package, which is an interactive Python shell that provides several helpful features. You would first execute conda search ipython to ensure that the package exists in the Anaconda repositories.
Depending on the package, you may see extensive output because it has multiple versions. In ipython’s case, the Anaconda repository lists 167 different versions of the package. To narrow down the results to only version seven, execute conda search ipython=7. This output lists 35 ipython versions, which is easier to review.
Once you identify the correct package and version, use the command presented in Figure 4.2. It will download and configure the specified package.
conda install -p <path-to-env> <package-name>
Figure 4.2 – Installing Packages in an Anaconda Environment
Replace <path-to-env> with the absolute path to the environment. Conda will also determine any dependencies and install them with the package. Note that Conda will prompt you to confirm the installation. If you are absolutely certain that you are installing the correct package and its dependencies, execute conda install -y -p <path-to-env> <package-name>. The -y option skips the confirmation prompt during the installation. Be aware that you may specify the version of the package you wish to install. Using ipython as an example, you could install ipython version 7.0.1 with the conda install -p <path-to-env> ipython=7.0.1 command.
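The search, install, and review steps can be rehearsed before anything is changed. The sketch below is an illustration only: the run wrapper, the DRY_RUN variable, and the py3_env path are hypothetical names introduced here, not part of Conda. With DRY_RUN left at 1, each command is printed instead of executed.

```shell
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

ENV_PATH="$HOME/py3_env"   # hypothetical environment path

run conda search "ipython=7"                      # confirm the version exists
run conda install -p "$ENV_PATH" "ipython=7.0.1"  # install that exact version
run conda list -p "$ENV_PATH"                     # review the result
```

Once the printed commands look right, setting DRY_RUN=0 before running the script executes them for real.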
To remove old packages, you should first review the list of existing packages in your environment. Figure 4.3 shows the command to use for this task. It outputs the names, versions, builds, and channels of every package in the environment you specify.
conda list -p <path-to-env>
Figure 4.3 – Reviewing the Packages in an Anaconda Environment
If you are in the environment from which you plan to remove packages, execute conda list without any additional options or arguments.
After you determine which packages to remove, execute the command displayed in Figure 4.4. Conda will prompt you to confirm the removal unless you provide the -y flag to the conda remove command. Only use the -y flag if you are absolutely sure you wish to remove the specified package from your environment.
conda remove -p <path-to-env> <package-name>
Figure 4.4 – Removing a Package from an Anaconda Environment
Updates may need to be applied to the packages within an environment. Generally, this will not be necessary and, in certain circumstances, could be destructive. Carefully consider if an update is essential to your work before attempting one. If you wish to perform an update on the environment, use the command shown in Figure 4.5. Every package in the environment will be updated to the latest version. To update a specific package, use conda update -p <path-to-env> <package-name>. In either case, you will be prompted to permit the update unless you provide the -y option.
conda update -p <path-to-env> --all
Figure 4.5 – Updating All Packages in an Anaconda Environment
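Because an update can be destructive, one precaution is to record the environment's exact contents first. The sketch below is one possible approach, not a site requirement: snapshot_env is a hypothetical helper built on conda list with its --explicit option, and the file name in the usage lines is a placeholder.

```shell
# Hypothetical helper: write an exact package listing for ENV_PATH to OUT_FILE.
snapshot_env() {
    local env_path=$1 out_file=$2
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found; load the Anaconda module first" >&2
        return 1
    fi
    conda list -p "$env_path" --explicit > "$out_file"
}

# Usage (commented out; the paths are placeholders):
#   snapshot_env ~/py2_env ~/py2_env-before-update.txt
#   conda update -p ~/py2_env --all
```

If the update causes problems, a listing produced this way can be passed to conda create with the --file option to build a fresh environment with the recorded packages.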
If you wish to switch from one environment to another, first deactivate your existing environment, then activate the one you wish to use. The conda deactivate command will gracefully shut down your active environment and return you to a standard terminal prompt. Alternatively, you may use conda activate with no arguments to return to the base environment, then switch to the environment you wish to use.
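To confirm which environment is active before and after switching, the shell's environment can be inspected. This is a small sketch under one assumption: that Conda exports the CONDA_PREFIX variable while an environment is active. The helper name active_env is hypothetical.

```shell
# Hypothetical helper: print the path of the active environment, or "none".
# Assumes Conda exports CONDA_PREFIX while an environment is active.
active_env() {
    echo "${CONDA_PREFIX:-none}"
}

# Usage (commented out; replace the path with your own environment):
#   active_env
#   conda deactivate
#   conda activate ~/py2_env
#   active_env
```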
To completely remove an environment from your storage space, you should first verify you are removing the correct one. Execute the conda info --envs command to determine which environments belong to you. Because all Conda environments on the cluster should be created with the -p option, you will only see the pathnames of the environments. Figure 4.6 shows an example of two environments that only display pathnames.
(base) [user-x@acf-login5 ~]$ conda info --envs
# conda environments:
#
/nics/b/home/user-x/cython_env
/nics/b/home/user-x/py2_env
Figure 4.6 – Output of conda info --envs
Identify the environment you wish to remove, then execute the command displayed in Figure 4.7. Every package in the environment will be removed, then the environment itself will be deleted. The directory in which the environment resided will also be deleted.
conda remove -p <path-to-env> --all
Figure 4.7 – Deleting an Anaconda Environment
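As an extra check before deleting, you can confirm that a path really is a Conda environment. The following is a sketch under one assumption: Conda keeps its package records in a conda-meta directory inside each environment, so the presence of that directory is used as the test. The function name is_conda_env is hypothetical.

```shell
# Hypothetical helper: succeed only if the given path looks like a Conda
# environment, judged by the presence of its conda-meta directory.
is_conda_env() {
    [ -d "$1/conda-meta" ]
}

# Usage (commented out; replace the path with your own environment):
#   is_conda_env ~/py2_env && conda remove -p ~/py2_env --all
```

With this guard, a mistyped path stops the removal from being attempted.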
Anaconda Troubleshooting
Problems with Conda generally involve environments and packages, and they can usually be remedied without assistance. Always double-check the spelling, capitalization, and punctuation of any names and paths. It is also important to review the documentation for the package(s) you use to understand what they require. Other issues may require additional investigation; in these situations, use the relevant conda command with the --help option. For example, to review the options available to the conda install command, execute conda install --help. If you are unable to remedy the issue using the steps outlined in this guide, please contact the OIT Help Desk (see https://help.utk.edu).
Loading the base Environment
In some cases, Conda may load the global base environment when you first log in to the cluster. Unless you switch to another environment with the conda activate command, you will be unable to install, remove, or modify packages. Please note that if you choose not to automatically initialize Conda when you log in, it is unlikely you’ll encounter this situation.
If you encounter this issue, open your ~/.bashrc file in your preferred text editor. Search for the line that reads “conda activate” and insert a hash sign (#) at the start of this line to comment it out, then save the file. The next time you log in to the cluster, Conda should be available to you without the base environment being automatically loaded.
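The same change can be made from the command line. This is a minimal sketch rather than a recommended procedure: comment_out_activate is a hypothetical helper, and it relies on the -i and -E options of sed as found on common Linux systems. It keeps a backup of the original file with a .bak suffix.

```shell
# Hypothetical helper: comment out every line of FILE that begins with
# "conda activate", keeping a backup copy as FILE.bak.
comment_out_activate() {
    sed -i.bak -E 's/^([[:space:]]*)(conda activate.*)$/\1# \2/' "$1"
}

# Usage (commented out so that nothing changes until you choose to run it):
#   comment_out_activate ~/.bashrc
```

Lines that are already commented out are left alone, so running the helper twice does no harm.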
Moving Environments
If you need to move an environment from one location to another, Conda provides the --clone option for the conda create command. Do not attempt to use the mv or cp commands on a Conda environment. If you do, it will no longer be recognized by Conda. To use the --clone option, specify the environment you wish to copy. For instance, to clone an existing environment named matplot_env in your home directory and place it in Lustre scratch space, use the command displayed in Figure 5.1. Executing this would copy the environment from your home directory and place it in your Lustre scratch space. It would then appear in the output of conda info --envs.
conda create -p $SCRATCHDIR/matplot_env --clone ~/matplot_env
Figure 5.1 – Cloning an Anaconda Environment
Using pip and Conda
Conda and pip both serve the same purpose: package and dependency management. Therefore, using these two tools together can create conflicts. It is best to use these tools separately and not combine them. Please consult our guide on pip and virtualenv for more information. If you do use Conda and pip together, install as many packages as possible with Conda before using pip. Additionally, if you need to make changes to the environment, it is best practice to recreate it entirely. For more information on using pip and Conda together, consult Anaconda’s official documentation on the relationship between these tools. Please be aware that some of the features mentioned by Anaconda’s official documentation may not be available on the cluster.