This page outlines the GPU infrastructure available as part of Grace. The cluster currently has a single GPU node containing two Nvidia Tesla C2050 cards; each card is restricted to a single job.
GPU programming model - Compute Unified Device Architecture (CUDA)
Nvidia provide the CUDA parallel computing architecture as an interface to their GPU cards. A quick introduction to GPUs and GPU programming is available here, and a CUDA tutorial is available here.
To load the CUDA environment, type:
module load cuda/3.2.16
Nvidia also provide the nvcc compiler for building CUDA programs.
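As a minimal sketch of a CUDA program that nvcc can build, the following (hypothetical) vector_add.cu adds two arrays on the GPU; the file name, kernel name, and launch configuration are illustrative, not cluster-specific:

```cuda
// Hypothetical example: minimal CUDA vector addition (vector_add.cu).
#include <cstdio>

// Kernel: each thread adds one pair of elements.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 256;
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = i; hb[i] = 2.0f * i; }

    // Allocate device buffers and copy the inputs over.
    float *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));
    cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch enough 128-thread blocks to cover n elements.
    add<<<(n + 127) / 128, 128>>>(da, db, dc, n);

    // Copy the result back and print one element (10 + 20 = 30).
    cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("hc[10] = %.1f\n", hc[10]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```

After loading the CUDA module, this would be compiled with `nvcc vector_add.cu -o vector_add`; the resulting binary must be run on the GPU node, not the login node.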
Submitting GPU jobs with Platform LSF
All GPU jobs must be submitted to the gpu LSF queue, which currently contains a single node (cn169) with two Nvidia Tesla GPU cards. The command bhosts -l cn169 will reveal the number of available GPU cards:
[ ... ]
           ngpus gpushared gpuexclusive gpuprohibited gpumode0 gputemp0
 Total       2.0       2.0          0.0           0.0      0.0     70.0
 Reserved    0.0       0.0          0.0           0.0      0.0      0.0

          gpuecc0 gpumode1 gputemp1 gpuecc1 gpumode2 gputemp2 gpuecc2
 Total        0.0      0.0     57.0     0.0        -        -       -
 Reserved     0.0      0.0      0.0     0.0      0.0      0.0     0.0
[ ... ]
The field to note is gpushared: the Total row shows the number of GPU cards on the node, and the Reserved row shows the number of cards currently in use. The gputemp* and gpuecc* fields show each card's temperature and count of correctable memory errors, respectively; all other fields can be ignored. The gpu queue is also restricted to two running user jobs at any one time, so please submit only legitimate GPU jobs to this queue. If you submit more than two jobs, the extra jobs will remain in a pending state until the running jobs complete.
To connect to the GPU for an interactive session use:
Xinteractive -q gpu -R "rusage[gpushared=1]"
An example of an LSF submission script for a GPU job:
#!/bin/bash
#BSUB -q gpu
#BSUB -n 1
#BSUB -R "rusage[gpushared=1]"
#BSUB -oo gpu-%J.log
#BSUB -eo gpu-%J.log
#BSUB -J "gpu"
#BSUB -cwd "$HOME/gpu"

module load cuda/4.0.8

# Run your CUDA executable here, e.g.:
# ./my_gpu_program
The above script requests a single CPU core (#BSUB -n 1); this can be increased to any value up to 12, the number of CPU cores available on the node.
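Assuming the script above is saved as gpu_job.lsf (a hypothetical file name), it would be submitted and monitored with the standard LSF commands:

```shell
# Submit the job script; bsub reads the #BSUB directives from the file.
bsub < gpu_job.lsf

# Check the status of your jobs (PEND, RUN, etc.).
bjobs

# Inspect the output once the job completes (%J is replaced by the job ID).
cat gpu-<jobid>.log
```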
The following GPU libraries are available:

|Library||Module||Description|
|cusparse||cuda/4.0.8||CUDA Sparse Solvers|
|curand||cuda/4.0.8||CUDA Random Number Generator|
|npp||cuda/4.0.8||CUDA Performance Primitives|
|magma||magma/1.2.0||Linear Algebra on GPUs|
|magmablas||magma/1.2.0||Linear Algebra on GPUs|
To find the library location, run the command below and note the LD_LIBRARY_PATH entry in the output:
module show cuda/4.0.8
During the link stage, pass the full path to either the shared or the static library:
/path/to/library/libname.so (shared) or /path/to/library/libname.a (static)
Alternatively, use the -L/path/to/library and -lname flags.
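For example, linking against cuSPARSE might look like the following; the environment variable CUDA_HOME and the lib64 path are assumptions here, so check the actual paths with `module show cuda/4.0.8`:

```shell
# Hypothetical example: link against cuSPARSE after `module load cuda/4.0.8`.
# CUDA_HOME and lib64 are assumptions -- verify with `module show cuda/4.0.8`.
nvcc myprog.cu -o myprog -L$CUDA_HOME/lib64 -lcusparse

# Or link the shared library by full path:
nvcc myprog.cu -o myprog $CUDA_HOME/lib64/libcusparse.so
```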
Open Acceleration (OpenACC)
OpenACC provides a simple interface to GPUs using compiler directives. A presentation is available from the following link.