Grace Technical Specifications

Grace consists of 334 computational nodes providing 4784 cores for analytical tasks. This gives a theoretical peak performance of 79 TFlops.

CPU Specification

 

There are 32 computational nodes, each with dual Intel Ivy Bridge 10-core E5-2670 v2 CPUs.

There are 132 computational nodes, each with dual Intel Sandy Bridge 8-core E5-2670 2.6 GHz CPUs.

There are 168 computational nodes, each with dual Intel Westmere 6-core X5650 2.66 GHz CPUs.
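The core counts and peak figure can be cross-checked from the node list above. The following is a rough sketch, not the method used to produce the headline numbers. It assumes a 2.5 GHz base clock for the E5-2670 v2 (not stated on this page) and 8 double-precision flops per cycle per core for Sandy Bridge and Ivy Bridge (AVX) versus 4 for Westmere (SSE).

```python
# Node inventory from the CPU list above:
# (label, nodes, cores per node, clock in GHz, DP flops per cycle per core).
# Assumptions (not from this page): the 2.5 GHz clock for the E5-2670 v2,
# and the flops-per-cycle values (8 with AVX, 4 with SSE).
NODE_TYPES = [
    ("Ivy Bridge E5-2670 v2", 32, 2 * 10, 2.5, 8),
    ("Sandy Bridge E5-2670", 132, 2 * 8, 2.6, 8),
    ("Westmere X5650", 168, 2 * 6, 2.66, 4),
]


def summarise(node_types):
    """Return (total nodes, total cores, theoretical peak in TFlops)."""
    total_nodes = 0
    total_cores = 0
    peak_gflops = 0.0
    for _label, nodes, cores_per_node, ghz, flops_per_cycle in node_types:
        cores = nodes * cores_per_node
        total_nodes += nodes
        total_cores += cores
        # Peak = cores x clock (GHz) x flops per cycle, in GFlops.
        peak_gflops += cores * ghz * flops_per_cycle
    return total_nodes, total_cores, peak_gflops / 1000.0


nodes, cores, tflops = summarise(NODE_TYPES)
print(f"{nodes} nodes, {cores} cores, ~{tflops:.1f} TFlops peak")
```

Under these assumptions the three listed node types account for 332 nodes, 4768 cores and roughly 78 TFlops, slightly below the headline totals of 334 nodes, 4784 cores and 79 TFlops. This page does not itemise the remainder.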

As the codes running on the Grace cluster are mainly large MPI parallel codes rather than multi-threaded codes (using OpenMP or POSIX threads), Intel Hyper-Threading has been switched off. Note that all the CPUs implement a 64-bit architecture, so all codes should be built in a 64-bit environment to ensure compatibility.

Memory Specification

The computational nodes use DDR3 RAM running at 1066 MHz. There is currently over 11.5 TB of memory available for use on Grace.

Power Supply Specification

Each chassis within a server rack houses four computational nodes, which share two redundant power supply units (PSUs). Sharing PSUs saves energy whilst maintaining redundancy.

High Performance Interconnect Network

Grace currently has 160 computational nodes which use the 40 Gb/s InfiniBand (IB) network fabric. This high-speed, low-latency network is used for parallel jobs that require inter-process MPI communication. The IB network consists of multiple IB switches connected to the IB compute nodes. See below for an Ethernet/IB comparison.

             Ethernet           InfiniBand (IB)
Latency      35 microseconds    1.1 microseconds
Bandwidth    1 Gb/s             40 Gb/s
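To give a feel for what these figures mean for MPI messages, the sketch below applies a simple latency-plus-bandwidth model (time = latency + size / bandwidth) to the nominal values in the comparison above. The model and the names `ETHERNET`, `INFINIBAND` and `transfer_time` are illustrative assumptions. Real transfers also pay protocol and encoding overheads, so treat the results as rough lower bounds.

```python
# Nominal figures from the Ethernet/IB comparison above.
ETHERNET = {"latency_s": 35e-6, "bandwidth_bps": 1e9}
INFINIBAND = {"latency_s": 1.1e-6, "bandwidth_bps": 40e9}


def transfer_time(message_bytes, link):
    """Estimate one-way transfer time in seconds: latency + size / bandwidth."""
    return link["latency_s"] + (message_bytes * 8) / link["bandwidth_bps"]


# Compare a tiny, a medium and a large message on both networks.
for size in (8, 1024, 1024 * 1024):
    eth = transfer_time(size, ETHERNET)
    ib = transfer_time(size, INFINIBAND)
    print(f"{size:>8} bytes: Ethernet {eth * 1e6:10.2f} us, "
          f"IB {ib * 1e6:8.2f} us, ratio {eth / ib:5.1f}x")
```

In this model small messages are dominated by latency and large messages by bandwidth, which is why tightly coupled MPI codes that exchange many small messages benefit most from the IB network.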

 

GPU Accelerator Cards

The Graphics Processing Unit (GPU) node has two NVIDIA Tesla Fermi C2050 cards for accelerating scientific codes. GPU architectures include a large number of lightweight cores, which are useful for single precision codes that are highly parallelisable. Double precision codes can also run on the C2050 cards, but achieve half the performance of single precision codes. Below is a brief specification of a C2050 card:

  • 448 cores operating at 1.15 GHz;
  • 3 GB of GDDR5 graphics memory (2.6 GB available, as the remainder is used for ECC);
  • 515 GFLOPs of double precision performance;
  • 1030 GFLOPs of single precision performance;
  • memory frequency of 1.5 GHz (operating faster than the GPU cores);
  • memory bandwidth of 144 GB/s.
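The performance and bandwidth figures in this list follow from the core count and clock rates. The sketch below reproduces them under stated assumptions that are not given on this page: each core completes one fused multiply-add (2 flops) per cycle in single precision, the memory transfers data twice per clock, and the memory bus is 384 bits wide.

```python
# Values from the C2050 specification list above.
CORES = 448
CORE_CLOCK_GHZ = 1.15
MEMORY_CLOCK_GHZ = 1.5

# Assumptions (not stated on this page).
FLOPS_PER_CYCLE_SP = 2    # one fused multiply-add per core per cycle
TRANSFERS_PER_CLOCK = 2   # two data transfers per memory clock
BUS_WIDTH_BITS = 384      # memory bus width


def peak_gflops_single():
    """Single precision peak: cores x clock (GHz) x flops per cycle."""
    return CORES * CORE_CLOCK_GHZ * FLOPS_PER_CYCLE_SP


def peak_gflops_double():
    """Double precision runs at half the single precision rate on this card."""
    return peak_gflops_single() / 2


def memory_bandwidth_gbs():
    """Bandwidth in GB/s: clock (GHz) x transfers per clock x bus width in bytes."""
    return MEMORY_CLOCK_GHZ * TRANSFERS_PER_CLOCK * BUS_WIDTH_BITS / 8


print(f"Single precision: {peak_gflops_single():.1f} GFLOPs")
print(f"Double precision: {peak_gflops_double():.1f} GFLOPs")
print(f"Memory bandwidth: {memory_bandwidth_gbs():.1f} GB/s")
```

These are theoretical peaks. Sustained performance of real codes depends on how well they keep all cores busy and how much data they move to and from GPU memory.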