Parallel Jobs
Again, LSF requires a number of standard configuration settings, such as the queue (for parallel tasks, this would normally be one of the InfiniBand queues, defined as short-ib, medium-ib and long-ib) and the number of slots. Please note that the HPC cluster has a mix of 16-, 24- and 28-core nodes.
#!/bin/sh
#BSUB -q mellanox-ib
#BSUB -n 96
#BSUB -R 'span[ptile=28]'
#BSUB -R 'cu[maxcus=1]'
#BSUB -oo vig_hpl_ibgpfs-%J.log
#BSUB -eo vig_hpl_ibgpfs-%J.log
#BSUB -J "vig_hplIB"
. /etc/profile
module add mpi/openmpi/3.1.3/gcc/mellanox
mpirun MyParallelBinary
#BSUB -q mellanox-ib: submits to the mellanox InfiniBand queue.
#BSUB -n 96: requests 96 parallel job slots.
#BSUB -R 'span[ptile=28]': places up to 28 slots on each node, so that all cores of a 28-core node are used.
#BSUB -R 'cu[maxcus=1]': attempts to keep the job on a single compute unit (i.e. nodes on the same IB switch).
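The slot arithmetic behind -n and ptile can be checked before submitting. The following is a minimal sketch, assuming LSF fills each node up to the ptile value before moving on to the next one; nodes_needed and last_node_slots are hypothetical helper names, not LSF commands. It prints how many nodes a 96-slot job would occupy for each of the node sizes on the cluster.

```shell
#!/bin/sh
# Sketch only: nodes_needed and last_node_slots are hypothetical helpers,
# not part of LSF. They assume every node is filled to the ptile value first.

# Ceiling division: nodes required for SLOTS slots at PTILE slots per node.
nodes_needed() {
    slots=$1
    ptile=$2
    echo $(( (slots + ptile - 1) / ptile ))
}

# Slots that land on the final node (equal to ptile when the job divides evenly).
last_node_slots() {
    slots=$1
    ptile=$2
    rem=$(( slots % ptile ))
    if [ "$rem" -eq 0 ]; then
        echo "$ptile"
    else
        echo "$rem"
    fi
}

# Compare the three node sizes for a 96-slot job.
for ptile in 16 24 28; do
    echo "ptile=$ptile nodes=$(nodes_needed 96 "$ptile") last=$(last_node_slots 96 "$ptile")"
done
# prints:
# ptile=16 nodes=6 last=16
# ptile=24 nodes=4 last=24
# ptile=28 nodes=4 last=12
```

Under that assumption, the example script's 96 slots with ptile=28 would occupy three full nodes plus 12 slots on a fourth, so a slot count that is a multiple of the node size avoids a partly filled node.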
InfiniBand Networks
The Mellanox nodes come in two generations, which serve the mellanox-ib and sky-ib queues.
Smaller jobs perform best when they do not span multiple networks, so -R 'cu[maxcus=1]' is set to keep the job within a single compute unit.
It is recommended you use the mpi/openmpi modules for MPI capabilities.
The older QLogic nodes serve the short-ib and long-ib queues and are no longer recommended for running parallel jobs.
Submitting Jobs
bsub < JobScriptName.bsub
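After submission it is often useful to keep the job ID so the job can be queried later. The sketch below assumes bsub prints its usual confirmation line of the form "Job <12345> is submitted to queue <mellanox-ib>."; parse_job_id is a hypothetical helper, and the submission only runs where bsub and the job script are actually present.

```shell
#!/bin/sh
# Sketch only: parse_job_id is a hypothetical helper, not an LSF command.
# It assumes bsub's confirmation line looks like:
#   Job <12345> is submitted to queue <mellanox-ib>.

# Read bsub output on stdin and print just the numeric job ID.
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Only attempt a real submission where LSF and the job script are available.
if command -v bsub >/dev/null 2>&1 && [ -f JobScriptName.bsub ]; then
    job_id=$(bsub < JobScriptName.bsub | parse_job_id)
    echo "Submitted job $job_id"
    bjobs "$job_id"       # show the job's current state (PEND, RUN, DONE, ...)
    # bpeek "$job_id"     # view the output of a running job
    # bkill "$job_id"     # cancel the job
fi
```

The bjobs, bpeek and bkill commands are standard LSF tools; the last two are left commented out so the sketch does not act on the job unless that is wanted.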