What is a GPU?

According to Wikipedia, GPU stands for Graphics Processing Unit - a specialised electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. Their highly parallel structure makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel.

GPUs were initially used to accelerate the memory-intensive work of texture mapping and rendering polygons, later adding units to accelerate geometric calculations such as the rotation and translation of vertices into different coordinate systems.

Recent developments in GPUs include support for programmable shaders which can manipulate vertices and textures with many of the same operations supported by CPUs, oversampling and interpolation techniques to reduce aliasing, and very high-precision colour spaces. Because most of these computations involve matrix and vector operations, engineers and scientists have increasingly studied the use of GPUs for non-graphical calculations; they are especially suited to other embarrassingly parallel problems.

With the emergence of deep learning, the importance of GPUs has increased. Research by Indigo found that GPUs can be up to 250 times faster than CPUs when training deep neural networks. The explosive growth of deep learning in recent years has been attributed in part to the emergence of general-purpose GPUs.

Compute Unified Device Architecture (CUDA)

Nvidia have provided the CUDA parallel computing architecture as an interface to their GPU cards. It allows the use of a CUDA-enabled GPU for general-purpose processing, an approach known as GPGPU (General-Purpose computing on Graphics Processing Units). The CUDA platform is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels.

The CUDA platform is designed to work with programming languages such as C, C++, and Fortran, which makes it easier to use GPU resources from existing code. There are many CUDA tutorials online; a minimal compile-and-run workflow is sketched below.
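As a quick sketch of the typical workflow (vecadd.cu is a hypothetical source file; the cuda/10.2 module is described later on this page):

module load cuda/10.2      # make the CUDA toolkit (nvcc and libraries) available
nvcc -o vecadd vecadd.cu   # compile the CUDA source with Nvidia's nvcc compiler
./vecadd                   # run the resulting executable on a GPU node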

GPU Nodes

We currently have four GPU nodes (g01-g04), each with Intel Xeon Silver 4116 2.1GHz processors (24 CPU cores), two NVIDIA Quadro P5000 cards, and 384GB of DDR4 RAM.
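To check the current state of these nodes, you can query the gpu partition with Slurm's sinfo (a quick sketch; the partition name gpu matches the queue used below):

sinfo -p gpu               # list the gpu-partition nodes and their availability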

Submitting GPU jobs

GPU jobs should be submitted to the gpu queue:

  • To start an interactive session on the gpu queue (for example, to test jobs), use:

interactive-gpu
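Once connected, you can confirm that a GPU is visible in your session with:

nvidia-smi                 # report the GPUs available on the node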

  • To submit a batch job, you need to select the gpu queue in your submission script, as in the example below.

An example Slurm submission script for a GPU job:

#!/bin/bash
#SBATCH --mail-type=ALL                      # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<username>@uea.ac.uk     # Where to send mail
#SBATCH --nodes=1                            # Limit to one node
#SBATCH -p gpu                               # Which queue (partition) to use
#SBATCH --gres=gpu:1                         # Number of GPUs (per node)
#SBATCH --mem=4G                             # Memory (per node)
#SBATCH --time=0-03:00                       # Time limit (DD-HH:MM)
#SBATCH --job-name=gpu-test_job              # Job name
#SBATCH -o gpu-test-%j.out                   # Standard output log
#SBATCH -e gpu-test-%j.err                   # Standard error log

# Set up the environment
module load cuda/10.2

# Run the application
/gpfs/home/s154/gpu_code

The above script requests a single GPU (#SBATCH --gres=gpu:1); change this to gpu:2 if you need to use both GPUs on a node.
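Assuming the script above is saved as gpu-test.sh (a hypothetical name), it can be submitted and then monitored with:

sbatch gpu-test.sh         # submit the job to the gpu queue
squeue -u $USER            # check the state of your queued and running jobs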

To load the CUDA environment:

module load cuda/10.2

Nvidia have provided the nvcc compiler for CUDA programs. There is extensive documentation for the CUDA environment.
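As a minimal sketch, once the module is loaded you can verify the compiler and build a program (my_gpu_code.cu is a hypothetical source file; the -arch flag is an assumption targeting the Quadro P5000's compute capability, 6.1):

nvcc --version                                   # confirm the CUDA toolkit is on your PATH
nvcc -arch=sm_61 -o my_gpu_code my_gpu_code.cu   # compile for the P5000 cards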

GPU Libraries

Library    Module      Documentation
cublas     cuda/10.2   CUDA Basic Linear Algebra Subprograms (cuBLAS)
cufft      cuda/10.2   CUDA Fast Fourier Transform library (cuFFT)
cusparse   cuda/10.2   CUDA sparse matrix library (cuSPARSE)
curand     cuda/10.2   CUDA random number generation (cuRAND)
npp        cuda/10.2   NVIDIA Performance Primitives (NPP)

To find the library location (the directory added to LD_LIBRARY_PATH):

module show cuda/10.2

During the link stage, use one of the following forms (a concrete example follows the list):

  • -L/path/to/library -lname
  • /path/to/library/libname.so
  • /path/to/library/libname.a
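For example, to link against cuFFT with nvcc after loading the module (fft_example.cu is a hypothetical source file; nvcc adds the CUDA library path automatically, so -lcufft alone is usually sufficient):

module load cuda/10.2
nvcc -o fft_example fft_example.cu -lcufft   # compile and link against the cuFFT library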