Hardware
Grace Technical Specifications
Grace consists of 334 computational nodes with a combined total of 4784 cores available for analytical tasks, giving a theoretical peak performance of 79 TFLOPS.
CPU Specification
There are 32 computational nodes, each with two 10-core Intel Ivy Bridge E5-2670 v2 CPUs.
There are 132 computational nodes, each with two 8-core Intel Sandy Bridge E5-2670 2.6 GHz CPUs.
There are 168 computational nodes, each with two six-core Intel Westmere X5650 2.66 GHz CPUs.
As the codes running on the Grace cluster are mainly large MPI-parallel codes rather than multi-threaded ones (using OpenMP or POSIX threads), Intel Hyper-Threading has been switched off. All the CPUs implement the 64-bit architecture, so all codes should be built in a 64-bit environment to ensure compatibility.
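The headline core count and peak performance can be cross-checked against the three node types listed above, since theoretical peak is simply cores × clock × floating-point operations per cycle. The sketch below is a rough reconstruction, not the official calculation: the 2.5 GHz clock for the E5-2670 v2 and the double-precision FLOPs per cycle (8 for Sandy Bridge and Ivy Bridge with AVX, 4 for Westmere with SSE) are assumptions not stated on this page.

```python
from dataclasses import dataclass

# ASSUMPTIONS (not stated on this page): E5-2670 v2 clock of 2.5 GHz;
# 8 double-precision FLOPs/cycle for Sandy/Ivy Bridge (AVX), 4 for Westmere (SSE).

@dataclass
class NodeType:
    name: str
    nodes: int
    sockets: int
    cores_per_socket: int
    clock_ghz: float
    dp_flops_per_cycle: int

    @property
    def cores(self) -> int:
        return self.nodes * self.sockets * self.cores_per_socket

    @property
    def peak_gflops(self) -> float:
        # Theoretical peak = cores x clock (GHz) x FLOPs per cycle
        return self.cores * self.clock_ghz * self.dp_flops_per_cycle


NODE_TYPES = [
    NodeType("Ivy Bridge E5-2670 v2", 32, 2, 10, 2.5, 8),
    NodeType("Sandy Bridge E5-2670", 132, 2, 8, 2.6, 8),
    NodeType("Westmere X5650", 168, 2, 6, 2.66, 4),
]

total_nodes = sum(t.nodes for t in NODE_TYPES)
total_cores = sum(t.cores for t in NODE_TYPES)
peak_tflops = sum(t.peak_gflops for t in NODE_TYPES) / 1000

print(f"{total_nodes} nodes, {total_cores} cores, ~{peak_tflops:.1f} TFLOPS")
```

Under these assumptions the listed node types account for 332 nodes, 4768 cores and roughly 78.2 TFLOPS, close to the quoted 79 TFLOPS; the remaining 2 nodes and 16 cores in the headline figures are not broken down on this page.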
Memory Specification
The computational nodes use DDR3 RAM running at 1066 MHz. In total, over 11.5 TB of memory is currently available on Grace.
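To put the memory speed in bandwidth terms, the peak rate of a single memory channel is transfers per second × bytes per transfer. This is a minimal sketch that assumes "1066 MHz" refers to the usual DDR3-1066 rating (about 1066 million transfers per second) on a standard 64-bit (8-byte) channel; the number of channels per socket is not given here and varies between the CPU generations above.

```python
# ASSUMPTION: "1066 MHz" is read as the DDR3-1066 rating, i.e. roughly
# 1066 million transfers per second on a 64-bit (8-byte) wide channel.
TRANSFERS_PER_SECOND = 1066e6
BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel


def channel_bandwidth_gbs(transfers_per_s: float, bytes_per_transfer: int) -> float:
    """Peak bandwidth of one memory channel in GB/s (10^9 bytes per second)."""
    return transfers_per_s * bytes_per_transfer / 1e9


print(f"~{channel_bandwidth_gbs(TRANSFERS_PER_SECOND, BYTES_PER_TRANSFER):.1f} GB/s per channel")
```

This gives roughly 8.5 GB/s per channel; the per-socket peak is that figure multiplied by the number of populated channels, and sustained bandwidth in practice is lower.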
Power Supply Specification
Each chassis within a server rack houses four computational nodes, which share two redundant power supply units (PSUs). Sharing PSUs between nodes saves energy whilst maintaining redundancy.
High Performance Interconnect Network
Grace currently has 160 computational nodes that utilise the 40 Gb/s InfiniBand (IB) network fabric. This high-speed, low-latency network is used for parallel jobs that require inter-process MPI communication. The IB network consists of multiple IB switches connected to the IB compute nodes. The table below compares Ethernet and IB.
| Metric | Ethernet | InfiniBand (IB) |
|---|---|---|
| Latency | 35 microseconds | 1.1 microseconds |
| Bandwidth | 1 Gb/s | 40 Gb/s |
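A common first-order way to read these latency and bandwidth figures together is the Hockney (alpha-beta) model, in which sending an n-byte message takes roughly latency + n / bandwidth. The sketch below plugs in the nominal values from the table; it ignores MPI software overhead, switch hops and contention, so its estimates are optimistic rather than measured. The names `LINKS` and `transfer_time` are illustrative only.

```python
# Hockney (alpha-beta) model: time = latency + message size / bandwidth.
# Nominal figures from the Ethernet/IB table; real MPI timings will be worse.
LINKS = {
    # name: (latency in seconds, bandwidth in bits per second)
    "Ethernet": (35e-6, 1e9),
    "InfiniBand": (1.1e-6, 40e9),
}


def transfer_time(size_bytes: int, latency_s: float, bandwidth_bps: float) -> float:
    """Estimated point-to-point time in seconds for one message of size_bytes."""
    return latency_s + (size_bytes * 8) / bandwidth_bps


for size in (8, 1_000_000):
    for name, (lat, bw) in LINKS.items():
        t_us = transfer_time(size, lat, bw) * 1e6
        print(f"{name:10s} {size:>9d} B: {t_us:10.2f} us")
```

For an 8-byte message the estimate is about 35 µs over Ethernet versus about 1.1 µs over IB, so small messages are dominated by latency; for a 1 MB message it is about 8 ms versus about 0.2 ms, where bandwidth dominates. MPI codes that exchange many small messages therefore benefit most from the IB nodes.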
GPU accelerator cards
The Graphics Processing Unit (GPU) node has two NVIDIA Tesla C2050 (Fermi architecture) cards for accelerating scientific codes. GPU architectures contain a large number of lightweight cores, which suit highly parallelisable single-precision codes. Double-precision codes can also run on the C2050 cards, but at half the peak floating-point performance of single precision. Below is a brief specification of a C2050 card:
- 448 cores operating at 1.15 GHz;
- 3 GB of GDDR5 memory (2.6 GB available, as the remainder is used for ECC parity);
- 515 GFLOPS of peak double-precision performance;
- 1030 GFLOPS of peak single-precision performance;
- memory frequency of 1.5 GHz (faster than the 1.15 GHz GPU core clock);
- memory bandwidth of 144 GB/s.
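The peak figures in the list follow from the core count and clocks. This sketch reproduces them under assumptions not stated on this page: each core completes one fused multiply-add (2 FLOPs) per cycle in single precision, double precision runs at half that rate, and the memory interface is 384 bits wide with two data transfers per quoted memory clock.

```python
# ASSUMPTIONS (not stated on this page): 1 single-precision FMA (= 2 FLOPs)
# per core per cycle; double precision at half rate; 384-bit memory bus with
# two transfers per quoted memory clock.
CORES = 448
CORE_CLOCK_GHZ = 1.15
SP_FLOPS_PER_CORE_PER_CYCLE = 2  # one FMA = one multiply + one add
MEMORY_CLOCK_GHZ = 1.5
BUS_WIDTH_BITS = 384
TRANSFERS_PER_CLOCK = 2

sp_gflops = CORES * CORE_CLOCK_GHZ * SP_FLOPS_PER_CORE_PER_CYCLE
dp_gflops = sp_gflops / 2  # double precision runs at half the SP rate
bandwidth_gbs = (BUS_WIDTH_BITS / 8) * MEMORY_CLOCK_GHZ * TRANSFERS_PER_CLOCK

print(f"SP {sp_gflops:.0f} GFLOPS, DP {dp_gflops:.0f} GFLOPS, {bandwidth_gbs:.0f} GB/s")
```

Dividing peak FLOPS by bandwidth (roughly 7 single-precision FLOPs per byte here) gives a rough sense of how much arithmetic a kernel must do per byte loaded before it stops being limited by memory bandwidth.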