The EECS department currently hosts one general-purpose research cluster: cuda. The cuda cluster was purchased in 2009, and includes 32 processor cores on 4 compute nodes. It also includes four nVidia Tesla GPGPU cards, and an Infiniband network.
The EECS cuda cluster is part of the standard EECS account infrastructure; once granted access users may log in using their EECS usernames and passwords. To request access to a research cluster, send a message to
The cuda cluster runs CentOS Linux (a binary-compatible clone of RedHat Linux), while most desktop and lab machines run Ubuntu Linux. Since Ubuntu and CentOS are not binary-compatible, any applications users wish to run on the clusters must be compiled on a CentOS system. In general, users should compile their code on the designated compile node, or the head node if no compile node has been assigned.
In addition, software installed on the CentOS clusters will likely be in different locations compared to Ubuntu. To alleviate this problem, users should take advantage of the Modules system, which allows easy configuration of one's shell environment.
To use Modules, you must first load the initialization file for your shell. These files are located in /usr/local/modules/init/. zsh users should run source /usr/local/modules/init/zsh, bash users should run source /usr/local/modules/init/bash, and so on for other shells. Next, load the modulefiles that support your particular software. To list all available modulefiles, run module avail. To load a modulefile, run module load MODULENAME. Users who wish to submit jobs should run module load torque, while module load cuda sets up the environment to compile and run code that depends on the CUDA libraries.
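To avoid repeating these steps at every login, the same commands can go in a shell startup file. The following is a minimal sketch for bash, assuming it is placed in ~/.bashrc. The helper name load_cluster_modules is hypothetical, and the guard makes it a no-op on machines where the Modules init file is absent.

```shell
# Hypothetical helper for ~/.bashrc; the function name is illustrative.
MODULES_INIT=/usr/local/modules/init/bash

load_cluster_modules() {
    # Do nothing on hosts that lack the Modules init file.
    [ -r "$MODULES_INIT" ] || return 0
    . "$MODULES_INIT"
    # Load each modulefile named as an argument, e.g. torque and cuda.
    for m in "$@"; do
        module load "$m"
    done
}

load_cluster_modules torque cuda
```

zsh users would point MODULES_INIT at /usr/local/modules/init/zsh and place the snippet in ~/.zshrc instead.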
Access to the cluster compute nodes is controlled through Torque/Maui, an open-source resource manager / job scheduler combo. In order to use Torque on cuda, users must load the torque modulefile as described in the “CentOS environment” section above.
showq can be used to view the current state of the cluster. It displays the number of online compute nodes and CPU cores, as well as the current usage. It also displays any active, queued, or blocked jobs. showq must be run from a terminal on the head node of the cluster being used.
JOBNAME   USERNAME   STATE     PROC   REMAINING   STARTTIME
9         steadmon   Running   16     00:59:30    Fri Sep 11 11:21:31

1 Active Job    16 of 32 Processors Active (50.00%)
                2 of 4 Nodes Active (50.00%)

JOBNAME   USERNAME   STATE     PROC   WCLIMIT     QUEUETIME
10        steadmon   Idle      24     1:00:00     Fri Sep 11 11:21:58

1 Idle Job

JOBNAME   USERNAME   STATE     PROC   WCLIMIT     QUEUETIME

Total Jobs: 2   Active Jobs: 1   Idle Jobs: 1   Blocked Jobs: 0
showstart JOB_ID shows an estimated start time for a queued job.
cudahead:~> showstart 10
job 10 requires 24 procs for 1:00:00
Earliest start in 00:59:24 on Fri Sep 11 12:21:31
Earliest completion in 1:59:24 on Fri Sep 11 13:21:31
Best Partition: DEFAULT
qsub may be used to schedule new interactive or batch jobs. Interactive jobs start a shell session on a single compute node, while batch jobs run a user-provided script on a single compute node. It is important to remember that you are responsible for starting your code on the additional assigned compute nodes; this may be done with ssh, MPI programs such as mpirun, or some other method.
qsub requires you to specify a resource list, either on the command line (for interactive jobs) or in the submit script (for batch jobs). There are a variety of resource types that may be requested (see the Cluster Resources website for a complete list). The two most important resources are walltime and nodes.
walltime places a hard limit on the running time of the job as a whole. Torque will kill any job that exceeds its requested walltime, so it is safer to over-estimate the time your job needs. However, Maui gives priority to jobs with shorter walltime requests, so an excessive estimate may delay your job's start. As an example, walltime=8:05:02 requests a run time of 8 hours, 5 minutes, and 2 seconds. There is currently no limit on the maximum walltime you may request for your job, but this may change in the future.
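When comparing a measured run time against a walltime request, it can help to convert the H:MM:SS form to seconds. The following is a small sketch of that arithmetic; the function name walltime_to_seconds is hypothetical and not part of Torque.

```shell
# Convert an H:MM:SS walltime string to a number of seconds.
# (Hypothetical helper; not a Torque command.)
walltime_to_seconds() {
    # Split on ':' and weight hours, minutes, and seconds.
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# 8 hours, 5 minutes, 2 seconds = 28800 + 300 + 2
walltime_to_seconds 8:05:02   # prints 29102
```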
nodes allows you to specify the number of processors required for your job. The full format of the nodes resource is nodes=#NODES:ppn=PROCESSORS_PER_NODE. Therefore, nodes=3:ppn=4 specifies that the job will need 12 total processors, spread across 3 compute nodes, with 4 processors per node. The cuda cluster is made up of 4 compute nodes, with 8 processors per node, so nodes=4:ppn=8 is the maximum requirement possible on this cluster.
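The processor arithmetic for a nodes request can be checked with a short helper. This is only a sketch; the function name total_procs is hypothetical, and it assumes the request is written in the full nodes=N:ppn=P form.

```shell
# Print the total processor count for a "nodes=N:ppn=P" request.
# (Hypothetical helper; assumes both N and P are given.)
total_procs() {
    # Splitting on '=' and ':' yields: nodes, N, ppn, P.
    printf '%s\n' "$1" | awk -F'[=:]' '{ print $2 * $4 }'
}

total_procs nodes=3:ppn=4   # prints 12
total_procs nodes=4:ppn=8   # prints 32, the whole cuda cluster
```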
To start an interactive job, run qsub -I -l RESOURCE_LIST_1 -l RESOURCE_LIST_2. The following example starts an interactive job with a 2-hour runtime on 2 compute nodes with 8 processors per node:
cudahead% qsub -I -l nodes=2:ppn=8 -l walltime=2:00:00
qsub: waiting for job 740.cudahead.eecs.utk.edu to start
qsub: job 740.cudahead.eecs.utk.edu ready
To schedule a batch job, you must first create a submit script. The submit script contains all the resource specifications for the job, as well as the actual commands to run. The following example script requests a job with a 3-hour runtime on 4 compute nodes with 1 processor per node. It then uses SSH to log in to each assigned compute node and display its hostname:
#!/bin/sh
#PBS -l walltime=3:00:00
#PBS -l nodes=4:ppn=1
# Print the hostname of every compute node assigned to this job.
for node in $( cat "$PBS_NODEFILE" ) ; do
    ssh "$node" 'hostname'
done
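With ppn greater than 1, Torque lists each node in $PBS_NODEFILE once per allocated processor, so a loop like the one in this script would contact the same node several times. The sketch below shows one way to visit each node only once. The helper name unique_nodes and the hostnames node1 and node2 are made up for illustration, and the stand-in node file only simulates what Torque provides inside a real job.

```shell
# Print each distinct hostname from a node file exactly once.
# (Hypothetical helper; sort -u collapses repeated hostnames.)
unique_nodes() {
    sort -u "$1"
}

# Inside a real job this would be: unique_nodes "$PBS_NODEFILE"
# Demonstration with a stand-in node file (hostnames are made up):
nodefile=$(mktemp)
printf '%s\n' node1 node1 node2 node2 > "$nodefile"
for node in $(unique_nodes "$nodefile"); do
    echo "would run: ssh $node hostname"
done
rm -f "$nodefile"
```

The echo stands in for the ssh call so the example can be tried outside the cluster.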
canceljob JOB_ID may be used to cancel running or queued batch jobs:
cudahead% canceljob 1662
job '1662' cancelled
To cancel a running interactive job, simply exit the shell on the compute node. To cancel a queued interactive job, press Control-C.
Compilers, debuggers, and profilers
The cuda head node has gcc-4.1, the Intel Compiler Suite v11.1, and the Intel Vtune Performance Analyzer v9.1. Both Intel products are installed in /opt/intel. Before using the Intel tools, you must load a shell configuration file. Users with bash/zsh should run
source /opt/intel/Compiler/11.1/064/bin/iccvars.sh intel64;
source /opt/intel/Compiler/11.1/064/bin/ifortvars.sh intel64
csh/tcsh users should run
source /opt/intel/Compiler/11.1/064/bin/iccvars.csh intel64;
source /opt/intel/Compiler/11.1/064/bin/ifortvars.csh intel64
Two versions of OpenMPI are installed on the cuda cluster. To configure your environment for RedHat's OpenMPI v1.2.7, run module load openmpi-rhel. To use the locally compiled OpenMPI v1.3.3, run module load openmpi-eecs.
The nVidia CUDA toolkit is installed on the cuda cluster. To configure your environment for CUDA, run module load cuda. For information on running CUDA applications on enabled EECS Ubuntu systems, please see the EECS CUDA documentation.
The Matlab installation on cuda has access to 32 Distributed Computing Server worker licenses. This allows you to submit parallel and/or distributed Matlab jobs to the cuda cluster from cudahead. Note that the Matlab DCS relies on Torque and Maui, so your environment must be set up for them first, as described in the CentOS Environment and Scheduling Software sections above.
Next, log in to cudahead with X11 forwarding enabled (see our remote access documentation for Windows, Unix and OS X for more details). Start Matlab, then click “Parallel > Manage Configurations” on the menu.
In the Configurations Manager window, click “File > Import”. Select the cuda.mat file and click “Open”. Finally, mark the cuda.eecs.utk.edu configuration as the default configuration, and exit the Configurations Manager.
You should now be able to run both parallel and distributed Matlab code from within Matlab on cudahead. For distributed jobs, Matlab will automatically create Torque requests with the appropriate number of processors. For parallel jobs, you must explicitly set the “ResourceTemplate” property of the Matlab scheduler object.
The cuda cluster is equipped with a 10-Gbps Infiniband network. Both versions of OpenMPI take advantage of the Infiniband hardware transparently, with no code modifications required. However, depending on your user environment, you may be required to specify btl=openib in your OpenMPI MCA configuration.
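Open MPI accepts MCA parameters on the mpirun command line, through environment variables prefixed with OMPI_MCA_, and from a per-user file at $HOME/.openmpi/mca-params.conf. A minimal sketch using the environment variable form follows. Listing sm and self alongside openib is an assumption about what most jobs need (shared-memory and loopback transport within a node), not something this documentation requires, and my_mpi_program is a placeholder name.

```shell
# Select the InfiniBand transport for Open MPI via an MCA environment variable.
# "sm" (shared memory) and "self" (loopback) are added as an assumption;
# the text above only mentions openib.
export OMPI_MCA_btl=openib,sm,self

# Equivalent per-run form (my_mpi_program is a placeholder):
#   mpirun --mca btl openib,sm,self ./my_mpi_program
```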
The cudahead node has 4 TB of RAID-5 storage available for use as a temporary data store for compute job input and output files. To request a portion of this storage, send a request to
with the following information:
- The list of users/groups who need access to the storage
- The faculty sponsor for the storage
- A short justification for the use of the storage
- The desired storage size