Department of Electrical Engineering and Computer Science

EECS Clusters

The EECS department currently hosts one general-purpose research cluster: cuda. The cuda cluster was purchased in 2009 and includes 32 processor cores across 4 compute nodes, four nVidia Tesla GPGPU cards, and an Infiniband network.
Cluster Accounts
The EECS cuda cluster is part of the standard EECS account infrastructure; once granted access users may log in using their EECS usernames and passwords. To request access to a research cluster, send a message to ithelp(at)
CentOS Environment

The cuda cluster runs CentOS Linux (a binary-compatible clone of RedHat Linux), while most desktop and lab machines run Ubuntu Linux. Since Ubuntu and CentOS are not binary-compatible, any applications users wish to run on the clusters must be compiled on a CentOS system. In general, users should compile their code on the designated compile node, or the head node if no compile node has been assigned.
In addition, software installed on the CentOS clusters will likely be in different locations compared to Ubuntu. To alleviate this problem, users should take advantage of the Modules system, which allows easy configuration of one's shell environment.

To use Modules, you must first load the appropriate initialization file, located in /usr/local/modules/init/. ZSH users should run source /usr/local/modules/init/zsh, BASH users source /usr/local/modules/init/bash, and so on. Next, load the modulefiles that support your particular software. A list of all available modulefiles may be obtained by running module avail, and a modulefile may be loaded by running module load MODULENAME. Users who wish to submit jobs should run module load torque, while running module load cuda will allow you to compile and run code that depends on the CUDA libraries.
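For bash users, the setup described above can be collected into a few lines in a shell startup file (such as ~/.bashrc on the cluster) so the environment is configured at every login. The module names are the ones listed in this section:

```shell
# Initialize the Modules system for bash (zsh users would source
# /usr/local/modules/init/zsh instead).
source /usr/local/modules/init/bash

# Load the job scheduler and CUDA toolkit modulefiles described above.
module load torque
module load cuda
```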

Scheduling Software

Access to the cluster compute nodes is controlled through Torque/Maui, an open-source resource manager / job scheduler combo. In order to use Torque on cuda, users must load the torque modulefile as described in the “CentOS environment” section above. 

Informational Commands

showq can be used to view the current state of the cluster. It displays the number of online compute nodes and CPU cores, as well as the current usage. It also displays any active, queued, or blocked jobs. showq must be run from a terminal on the head node of the cluster being used.
cudahead% showq
ACTIVE JOBS--------------------
9                  steadmon    Running    16    00:59:30  Fri Sep 11 11:21:31
     1 Active Job       16 of   32 Processors Active (50.00%)
                         2 of    4 Nodes Active      (50.00%)

IDLE JOBS----------------------
10                 steadmon       Idle    24     1:00:00  Fri Sep 11 11:21:58
1 Idle Job 

BLOCKED JOBS----------------

Total Jobs: 2   Active Jobs: 1   Idle Jobs: 1   Blocked Jobs: 0

showstart JOB_ID shows an estimated start time for a queued job.

cudahead:~> showstart 10
job 10 requires 24 procs for 1:00:00
Earliest start in         00:59:24 on Fri Sep 11 12:21:31
Earliest completion in     1:59:24 on Fri Sep 11 13:21:31
Best Partition: DEFAULT

Scheduling Commands

qsub may be used to schedule new interactive or batch jobs. An interactive job starts a shell session on a compute node, while a batch job runs a user-provided script. In either case, the shell or script runs only on the first assigned compute node; you are responsible for starting your code on any additional assigned compute nodes, whether via ssh, an MPI launcher such as mpirun, or some other method.
qsub requires you to specify a resource list, either on the command line (for interactive jobs) or in the submit script (for batch jobs). There are a variety of resource types that may be requested (see the Cluster Resources website for a complete list). The two most important resources are walltime and nodes.
walltime places a hard limit on the total running time of the job. Torque will kill any job that exceeds its requested walltime, so it is advantageous to over-estimate the time your job needs; note, however, that Maui grants priority to jobs with shorter walltime requests. As an example, walltime=8:05:02 requests a run time of 8 hours, 5 minutes, and 2 seconds. There is currently no limit on the maximum walltime you may request for your job, but this may change in the future.
nodes allows you to specify the number of processors required for your job. The full format of the nodes resource is nodes=#NODES:ppn=PROCESSORS_PER_NODE. Therefore, nodes=3:ppn=4 specifies that the job will need 12 total processors, spread across 3 compute nodes, with 4 processors per node. The cuda cluster is made up of 4 compute nodes, with 8 processors per node, so nodes=4:ppn=8 is the maximum requirement possible on this cluster.
To start an interactive job, run qsub -I -l RESOURCE_LIST_1 -l RESOURCE_LIST_2. The following example starts an interactive job with a 2 hour runtime on 2 compute nodes with 8 processors per node:
cudahead% qsub -I -l nodes=2:ppn=8 -l walltime=2:00:00
qsub: waiting for job to start
qsub: job ready


To schedule a batch job, you must first create a submit script. The submit script contains all the resource specifications for the job, as well as the actual commands to run. The following example script requests a job with a 3 hour runtime on 4 compute nodes with 1 processor per node. It then uses SSH to log in to each assigned compute node and displays its hostname:


#!/bin/sh
#PBS -l walltime=3:00:00
#PBS -l nodes=4:ppn=1

# Log in to each assigned node in turn and print its hostname.
for node in $( cat $PBS_NODEFILE ) ; do
   ssh $node 'hostname'
done
canceljob JOB_ID may be used to cancel running or queued batch jobs:

cudahead% canceljob 1662

job '1662' cancelled

To cancel a running interactive job, simply exit the shell on the compute node. To cancel a queued interactive job, press Control-C. 

Other Software

Compilers, debuggers, and profilers
The cuda head node has gcc-4.1, the Intel Compiler Suite v11.1, and the Intel VTune Performance Analyzer v9.1. Both Intel products are installed in /opt/intel. Before using the Intel tools, you must load a shell configuration file. Users with bash/zsh should run
source /opt/intel/Compiler/11.1/064/bin/iccvars.sh intel64;
source /opt/intel/Compiler/11.1/064/bin/ifortvars.sh intel64

csh/tcsh users should run 
source /opt/intel/Compiler/11.1/064/bin/iccvars.csh intel64;
source /opt/intel/Compiler/11.1/064/bin/ifortvars.csh intel64

Two versions of OpenMPI are installed on the cuda cluster. To configure your environment for RedHat's OpenMPI v1.2.7, run module load openmpi-rhel. To use the locally compiled OpenMPI v1.3.3, run module load openmpi-eecs.
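As an illustration, a typical compile-and-run cycle with the locally compiled OpenMPI might look like the following. The source file name and process count are placeholders; mpicc and mpirun are the standard OpenMPI wrapper commands:

```shell
# Select the locally compiled OpenMPI v1.3.3 described above.
module load openmpi-eecs

# Compile an MPI program with the OpenMPI compiler wrapper
# (hello.c is an illustrative filename).
mpicc -O2 -o hello hello.c

# Launch 8 processes. Inside a Torque job, an OpenMPI built with
# Torque support picks up the assigned node list automatically.
mpirun -np 8 ./hello
```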


The nVidia CUDA toolkit is installed on the cuda cluster. To configure your environment for CUDA, run module load cuda. For information on running CUDA applications on enabled EECS Ubuntu systems, please see the EECS CUDA documentation.
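After loading the cuda modulefile, compilation follows the usual CUDA toolchain; a minimal sketch, assuming a source file named kernel.cu (the filename is illustrative; nvcc is the standard CUDA compiler driver):

```shell
# Make the CUDA toolkit available in this shell.
module load cuda

# Compile a CUDA source file into an executable.
nvcc -o kernel kernel.cu
```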

The Matlab installation on cuda has access to 32 Distributed Computing Server worker licenses. This allows you to submit parallel and/or distributed Matlab jobs to the cuda cluster from cudahead. Please note that the Matlab DCS makes use of Torque and Maui, so you should already have your environment set up to support this. Please see the CentOS Environment and Scheduling Software sections above.

To set up your Matlab environment, first download the cluster definition file (cuda.mat) to your EECS home area.

Next, log in to cudahead with X11 forwarding enabled (see our remote access documentation for Windows, Unix and OS X for more details). Start Matlab, then click “Parallel > Manage Configurations” on the menu.

In the Configurations Manager window, click “File > Import”. Select the cuda.mat file and click “Open”. Finally, mark the configuration as the default configuration, and exit the Configurations Manager.

You should now be able to run both parallel and distributed Matlab code from within Matlab on cudahead. For distributed jobs, Matlab will automatically create Torque requests with the appropriate number of processors. For parallel jobs, you must explicitly set the “ResourceTemplate” property of the Matlab scheduler object.

For more information on writing parallel and/or distributed Matlab code, please see Mathworks' Parallel Computing Toolbox Users' Guide


The cuda cluster is equipped with a 10-Gbps Infiniband network. Both versions of OpenMPI take advantage of the Infiniband hardware transparently, with no code modifications required. However, depending on your user environment, you may be required to specify btl=openib in your OpenMPI MCA configuration. 
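One way to set this persistently is a per-user MCA parameter file; $HOME/.openmpi/mca-params.conf is OpenMPI's standard per-user location, and the self loopback transport is listed alongside openib because OpenMPI requires it for processes to message themselves:

```shell
# Create a per-user OpenMPI MCA parameter file selecting the
# Infiniband (openib) transport plus the required self loopback.
mkdir -p "$HOME/.openmpi"
cat >> "$HOME/.openmpi/mca-params.conf" <<'EOF'
btl = openib,self
EOF
```

The same setting can be passed on the command line instead, e.g. mpirun --mca btl openib,self.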


The cudahead node has 4 TB of RAID-5 storage available for use as a temporary data store for compute job input and output files. To request a portion of this storage, send a request to ithelp(at) with the following information:
  1. The list of users/groups who need access to the storage
  2. The faculty sponsor for the storage
  3. A short justification for the use of the storage
  4. The desired storage size


The University of Tennessee, Knoxville