Parallel Jobs

Again, LSF requires a number of standard configuration settings, such as the queue (for parallel tasks, this would normally be one of the InfiniBand queues, defined as short-ib, medium-ib and long-ib) and the number of slots. Please note that the HPC cluster has a mix of 16-, 24- and 28-core nodes.

#!/bin/sh
#BSUB -q mellanox-ib
#BSUB -n 96
#BSUB -R 'span[ptile=28]'
#BSUB -R 'cu[maxcus=1]'
#BSUB -oo vig_hpl_ibgpfs-%J.log
#BSUB -eo vig_hpl_ibgpfs-%J.log
#BSUB -J "vig_hplIB"
. /etc/profile
module add mpi/openmpi/3.1.3/gcc/mellanox
mpirun MyParallelBinary

#BSUB -q mellanox-ib   Submits to the mellanox-ib InfiniBand queue

#BSUB -n 96  Requests 96 parallel job slots

#BSUB -R 'span[ptile=28]'  Places up to 28 slots on each node, so that all slots available on a 28-core node are used

#BSUB -R 'cu[maxcus=1]'   Attempts to use nodes in the same compute unit (i.e. nodes on the same IB switch)
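
When choosing -n and ptile, it can help to work out how many nodes the job will occupy. The following is a minimal sketch only; nodes_needed is a hypothetical helper written for this illustration and is not part of LSF. It assumes the scheduler fills each node up to the ptile value before moving on to the next.

```shell
#!/bin/sh
# Hypothetical helper (not an LSF command): estimate how many nodes a job
# occupies when "-n SLOTS" is combined with "span[ptile=PTILE]".
nodes_needed() {
    slots=$1
    ptile=$2
    # Integer ceiling division: full nodes plus one partial node if needed.
    echo $(( (slots + ptile - 1) / ptile ))
}

# Values from the example script: 96 slots at 28 slots per node.
nodes_needed 96 28
```

With the example values, three nodes are filled completely (84 slots) and the remaining 12 slots go to a fourth node, so the job spans four nodes.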

InfiniBand Networks

There are two generations of Mellanox nodes, which serve the mellanox-ib and sky-ib queues.

Smaller jobs perform best when they do not span multiple networks, which is why -R 'cu[maxcus=1]' is set in the example script.
 

It is recommended you use the mpi/openmpi modules for MPI capabilities.

The older QLogic nodes serve the short-ib and long-ib queues and are no longer recommended for parallel jobs.

 

Submitting Jobs

bsub < JobScriptName.bsub
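
A submission is usually followed by a status check. The sketch below shows one possible way to do this; extract_job_id is a hypothetical helper, and it assumes bsub prints its usual confirmation line of the form "Job <12345> is submitted to queue <mellanox-ib>.". The scheduler commands only run if bsub is found on the PATH and the job script exists.

```shell
#!/bin/sh
# Hypothetical helper: pull the numeric job ID out of bsub's confirmation
# line, assumed to look like: Job <12345> is submitted to queue <name>.
extract_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Only contact the scheduler when LSF is available and the script exists.
if command -v bsub >/dev/null 2>&1 && [ -f JobScriptName.bsub ]; then
    job_id=$(bsub < JobScriptName.bsub | extract_job_id)
    echo "Submitted job ${job_id}"
    bjobs "${job_id}"      # show the job's state (PEND, RUN, DONE, ...)
    # bpeek "${job_id}"    # view the output of a running job so far
    # bkill "${job_id}"    # cancel the job if something is wrong
fi
```

The bpeek and bkill lines are left commented out so that the sketch only submits the job and reports its state.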