This page outlines the GPU infrastructure available on Grace. The cluster currently has a single GPU node with two Nvidia Tesla C2050 cards, and each card is restricted to a single job.

GPU programming model - Compute Unified Device Architecture (CUDA)

Nvidia have provided the CUDA parallel computing architecture as an interface to their GPU cards. A quick introduction to GPUs and GPU programming can be found here, and a CUDA tutorial is available here.

To load the CUDA environment, type:

module load cuda/3.2.16

Nvidia have provided the nvcc compiler for CUDA programs.
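
As a rough sketch of a typical compile step, the shell function below wraps nvcc and stops early if the compiler is not on the PATH, for example because no cuda module has been loaded. The function name cuda_compile and the source file name in the usage line are illustrative assumptions, not part of the Grace setup.

```shell
# Compile a CUDA source file with nvcc (hypothetical helper, not a Grace tool).
# Usage: cuda_compile SOURCE.cu [OUTPUT]
cuda_compile() {
    src=$1
    # Default output name: the source name without its .cu suffix.
    out=${2:-${src%.cu}}
    if ! command -v nvcc >/dev/null 2>&1; then
        echo "nvcc not found: load a cuda module first" >&2
        return 1
    fi
    nvcc -o "$out" "$src"
}
```

For a source file called gpu_code.cu (an assumed name), running cuda_compile gpu_code.cu would produce an executable called gpu_code in the current directory.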

Submitting GPU jobs with Platform LSF

All GPU jobs must be submitted to the gpu LSF queue, which currently contains a single node (cn169) with two Nvidia Tesla GPU cards. The command bhosts -l cn169 will reveal the number of available GPU cards:

[ ... ]
               ngpus gpushared gpuexclusive gpuprohibited gpumode0 gputemp0
Total           2.0       2.0          0.0           0.0      0.0     70.0
Reserved        0.0       0.0          0.0           0.0      0.0      0.0

               gpuecc0 gpumode1 gputemp1 gpuecc1 gpumode2 gputemp2 gpuecc2
Total             0.0      0.0     57.0     0.0        -        -       -
Reserved          0.0      0.0      0.0     0.0      0.0      0.0     0.0
[ ... ]

The field to note is gpushared: its value in the Total row shows the number of available GPU cards, and its value in the Reserved row shows the number of GPU cards currently in use. The gputempN and gpueccN fields show the temperature and the number of correctable memory errors for card N, respectively. All other fields can be ignored.

The gpu queue is also restricted to two running user jobs at any one time, so please submit only legitimate GPU jobs to this queue. If you submit more than two jobs, the extra jobs will remain in a pending state until the running jobs complete.
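
Reading the gpushared value by eye can be replaced with a small filter. The following is a minimal sketch that assumes the bhosts -l output has the layout shown above, with each header row followed by its Total row. The function name bhosts_total is hypothetical.

```shell
# Print the Total value of one resource column from `bhosts -l` output on stdin.
# Usage: bhosts -l cn169 | bhosts_total gpushared
bhosts_total() {
    awk -v col="$1" '
        # Header rows have no Total/Reserved label: remember where the column is.
        $1 != "Total" && $1 != "Reserved" {
            pos = 0
            for (i = 1; i <= NF; i++) if ($i == col) pos = i
            next
        }
        # The Total row carries a leading label, so values are shifted by one.
        $1 == "Total" && pos > 0 { print $(pos + 1); exit }
    '
}
```

The same function can read other fields, for example bhosts -l cn169 | bhosts_total gputemp1 for the temperature of the second card.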

To connect to the GPU node for an interactive session, use:

Xinteractive -q gpu -R "rusage[gpushared=1]"

 
The -q gpu option ensures that you are allocated the appropriate node. The -R rusage option reserves one of the GPU cards for your task.

 

An example of an LSF submission script for a GPU job:

#!/bin/sh
#BSUB -q gpu
#BSUB -n 1
#BSUB -R "rusage[gpushared=1]"
#BSUB -oo gpu-%J.log
#BSUB -eo gpu-%J.log
#BSUB -J "gpu"
#BSUB -cwd "$HOME/gpu"
. /etc/profile
module load cuda/4.0.8
$HOME/gpu/gpu_code

The above script requests a single CPU core (#BSUB -n 1). This can be changed to any value up to 12, which is the maximum number of available CPU cores on the node.
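
To illustrate changing the core count, here is a minimal sketch of a shell function that prints the example script with a chosen value for -n and rejects values outside 1 to 12. The function name make_gpu_job and the output file name in the usage note are assumptions for illustration. The queue, paths and module version are copied from the example above.

```shell
# Print an LSF GPU job script requesting NCORES CPU cores (1 to 12).
# Usage: make_gpu_job NCORES
make_gpu_job() {
    ncores=${1:-1}
    # Reject anything that is not a plain non-negative integer.
    case $ncores in
        ''|*[!0-9]*) echo "core count must be an integer" >&2; return 1 ;;
    esac
    if [ "$ncores" -lt 1 ] || [ "$ncores" -gt 12 ]; then
        echo "core count must be between 1 and 12" >&2
        return 1
    fi
    # \$HOME is escaped so that it is expanded when the job runs, not here.
    cat <<EOF
#!/bin/sh
#BSUB -q gpu
#BSUB -n $ncores
#BSUB -R "rusage[gpushared=1]"
#BSUB -oo gpu-%J.log
#BSUB -eo gpu-%J.log
#BSUB -J "gpu"
#BSUB -cwd "\$HOME/gpu"
. /etc/profile
module load cuda/4.0.8
\$HOME/gpu/gpu_code
EOF
}
```

For example, make_gpu_job 4 > gpu.lsf writes a four-core script to gpu.lsf (an assumed file name), which can then be submitted with bsub < gpu.lsf so that LSF reads the #BSUB lines.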

GPU Libraries

Library     Module        Documentation
cublas      cuda/4.0.8    CUDA BLAS
cufft       cuda/4.0.8    CUDA FFT
cusparse    cuda/4.0.8    CUDA Sparse Solvers
curand      cuda/4.0.8    CUDA Random Number Generator
npp         cuda/4.0.8    CUDA Performance Primitives
magma       magma/1.2.0   Linear Algebra on GPUs
magmablas   magma/1.2.0   Linear Algebra on GPUs

 

To find the library location, run the command below and look at the LD_LIBRARY_PATH entry:

module show cuda/4.0.8

During the link stage, either pass the library directory and name to the linker:

-L/path/to/library -lname

or give the full path to the shared or static library:

/path/to/library/libname.so or /path/to/library/libname.a
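
Once the module is loaded, the -L and -l arguments can be derived from LD_LIBRARY_PATH rather than typed by hand. The sketch below assumes that the module adds the library directory to LD_LIBRARY_PATH. The function name link_flags is hypothetical.

```shell
# Print "-L<dir> -l<name>" for the first LD_LIBRARY_PATH directory that
# contains lib<name>.so or lib<name>.a; exit non-zero if none is found.
# The body runs in a subshell so the IFS change does not leak to the caller.
link_flags() (
    name=$1
    IFS=:
    for dir in ${LD_LIBRARY_PATH:-}; do
        # Skip empty entries produced by leading, trailing or doubled colons.
        [ -n "$dir" ] || continue
        if [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]; then
            printf '%s\n' "-L$dir -l$name"
            exit 0
        fi
    done
    exit 1
)
```

A link line could then look like nvcc -o gpu_code gpu_code.cu $(link_flags cublas), where gpu_code.cu is an assumed source file name.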

Open Accelerators (OpenACC)

OpenACC provides a simple interface to GPUs using compiler directives. A presentation is available from the following link.