Grace - 2010 to 2018

Building on the successful provision of High Performance Computing resources to the research community at UEA over a number of years, Research Computing Services tendered for a new High Performance Computing cluster to be installed in 2010, and looked to develop an ongoing partnership with an HPC provider. This meant meeting a number of challenges:

  • Providing an effective and reliable HPC resource that fits the research community's requirements
  • Sustainable HPC
  • Making HPC more accessible

A partnership was formed with Viglen Ltd, who shared our goals for developing High Performance Computing and were keen to engage in a true collaborative partnership to take Research Computing at UEA into the future.

The new cluster was funded by ISD and was a major advance on the existing resource, providing a significant increase in both core count and performance.

A competition was launched to name the cluster. Toby Richmond's (MAC) suggestion of 'Grace' was selected from almost 800 entries. Toby said: "I thought it would be a good idea to recognise the contribution of female IT pioneers like Grace Hopper". The judges also noted that the name provides a relevant and appropriate acronym in 'Greener Research Computing Environment'.

Grace (in its first iteration) ran the Red Hat-compatible CentOS 5.5 with the powerful Platform LSF workload manager, and consisted of:

  •     168 dual-processor, six-core Intel Xeon X5650 2.66GHz systems
  •     Each system with 24GB of RAM (2GB per core)
  •     Quad Data Rate InfiniBand on 56 nodes – 672 parallel cores
  •     Total of 2016 cores
  •     Theoretical peak performance of 21.45 TFlops
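
As a rough cross-check, the quoted peak is consistent with a standard estimate, assuming the four double-precision floating-point operations per clock cycle delivered by that generation of Xeon:

  2016 cores x 2.66 GHz x 4 FLOPs per cycle ≈ 21.45 TFlops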
     

Grace used the existing GPFS storage cluster:

  •     2 Dell PowerEdge 1950 Intel Harpertown dual quad-core 2.0GHz 8GB RAM – SAN-attached nodes
  •     2 HP ProLiant DL380 G5 Intel dual quad-core 2.0GHz 16GB RAM – SAN-attached nodes
  •     2 HP ProLiant DL380 G5 Intel dual dual-core 2.0GHz 16GB RAM – SAN-attached nodes
  •     SVC-attached IBM Enterprise SAN storage, current capacity approx. 32TB
  •     Dell MD3000/MD1000 storage array, current capacity approx. 20TB
  •     HSM tape archive, current capacity 40TB
     

Over the following years Grace's capacity grew steadily, encompassing an increased core count for both sequential and parallel computing, faster processors, and more capacity for large-memory computing. Storage capacity (backed-up, scratch, and archive) grew substantially to over 100TB.

In 2012, phase 2 added a further 64 nodes, providing an additional 1024 cores on Intel Xeon E5-2670 (Sandy Bridge) 2.6GHz CPUs.

In 2013, phase 3 increased the large-memory machine count and added 68 compute nodes, providing a further 1088 Sandy Bridge cores. Once again these nodes were based on the 8-core Intel Xeon E5-2670 2.60GHz processor, giving each node 16 computation slots and 32GB of RAM. Additional InfiniBand infrastructure was included, with the nodes divided as follows:

  • 48 InfiniBand nodes providing an additional 768 parallel slots
  • 20 standard Ethernet nodes providing an additional 320 sequential slots
  • Additions to the large-memory resource:
    • 18 nodes with 48GB of memory
    • 8 nodes with 64GB of memory
    • 1 node with 128GB of memory


The upgrade took Grace to more than 300 computational nodes and 4148 cores, with a theoretical peak performance of nearly 65 TFlops. The new hardware was installed in UEA Data Centre 1, separate from the majority of the existing Grace hardware in UEA Data Centre 2, helping to improve service availability and resilience.

In 2014, phase 4 added a further 640 InfiniBand-connected cores and a storage upgrade. The new Ivy Bridge-based compute nodes each had 2 x 10-core CPUs (20 cores) running at 2.5GHz, with 64GB of memory.