HPC - 2020
In 2015 we tendered for a new HPC partner, after our previous agreement reached the end of its four-year term. We entered into a partnership with OCF, specialists in High Performance Computing in HEI and research institutes in the UK, along with Fujitsu as a hardware technology partner.
The first phase created the new hpc.uea.ac.uk cluster with an additional 98 nodes and 1760 computational cores. This included standard nodes, Infiniband parallel nodes and further GPU resource. Based on the same operating system and Platform LSF scheduler as Grace, the new cluster had a refreshed software stack with newer versions of the OS and scheduler. It provided a familiar working environment, with the intention of moving existing Grace hardware to this new cluster over time.
- upgraded the login node to a newer operating system and kernel, providing many benefits from a security and performance perspective
- introduced a secondary login node to provide more HPC login resilience
- upgraded the current (150) servers/compute nodes, to a newer operating system and kernel
- upgraded 4 of the 8 GPU servers/nodes with a newer operating system and kernel
- installed an additional 104 nodes/servers into the HPC environment, split in the following way:
- 60 x Ethernet servers/nodes running on Broadwell CPU architecture with 64GB of DDR4 on each node and 16 CPU cores
- 36 x Infiniband servers/nodes running on Broadwell CPU architecture with 128GB of DDR4 memory on each node and 24 CPU cores – These IB nodes used the latest IB interconnect (FDR and associated fabric) providing 56Gb/s IB performance
- Installed a new Infiniband 56Gb/s FDR network fabric to provide high speed interconnect/networking to the IB nodes
- 2 x huge memory servers/nodes running on Broadwell CPU architecture with 512GB of DDR4 memory and 16 CPU cores on each node
- 2 x huge memory servers/nodes running on Broadwell CPU architecture with 512GB of DDR4 memory and 16 CPU cores on each node, for a group of HPC/Bio users who invested in HPC equipment for their own dedicated usage
- This upgrade provided an additional 1680 CPU cores
In March 2018 we added an additional 68 Broadwell compute nodes with 64GB RAM, 2 new GPU nodes (24C-Skylake, 384GB, GPU-V100), and 2 new huge memory nodes (24C-Skylake, 768GB).
In early 2019 we installed 28 Skylake Infiniband nodes (24 cores, 96GB RAM), 16 Skylake Ethernet nodes (24 cores, 96GB RAM) and 2 huge memory nodes (24 cores, 770GB RAM). This took our total HPC core count from approx. 7000 to 8312 cores.
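As a rough check on the early 2019 figures, the short Python sketch below tallies the cores added by that installation and works back from the stated total of 8312. The names used (`EARLY_2019_NODES`, `added_cores`) are illustrative only and are not part of any cluster tooling.

```python
# Node groups installed in early 2019, as (description, node_count, cores_per_node).
# Figures are taken from the paragraph above; the names are illustrative.
EARLY_2019_NODES = [
    ("Skylake Infiniband", 28, 24),
    ("Skylake Ethernet", 16, 24),
    ("huge memory", 2, 24),
]

TOTAL_CORES_AFTER = 8312  # total core count stated after the installation


def added_cores(groups):
    """Sum node_count * cores_per_node over all node groups."""
    return sum(count * cores for _, count, cores in groups)


added = added_cores(EARLY_2019_NODES)
implied_before = TOTAL_CORES_AFTER - added

print(f"cores added in early 2019: {added}")
print(f"implied core count before: {implied_before}")
```

The implied earlier count is in line with the "approx. 7000" figure quoted above.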