Parallel jobs run across more than one node, passing messages between the processes on the nodes using OpenMPI.
SLURM requires a number of standard configuration settings, such as the queue (partition) and the number of slots (tasks).
#!/bin/bash
#SBATCH --mail-type=END,FAIL              # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<username>@uea.ac.uk  # Where to send mail
#SBATCH -p compute                        # Submit to the compute queue
#SBATCH -t 36:00:00                       # Set time limit to 36 hours
#SBATCH --ntasks=96                       # Set number of slots required
#SBATCH --job-name=parallel_job           # Set job name
#SBATCH -o parallel-test-%j.out           # Output to parallel-test-(job_number).out
#SBATCH -e parallel-test-%j.err           # Error to parallel-test-(job_number).err
# Set up the environment
module add mpi/openmpi/4.0.2/gcc/eth gcc/9.2.0
# Run the application
echo Hosts are
srun -l hostname
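Once saved, the script can be submitted and monitored in the usual way (the filename parallel_job.sub below is an assumption; use whatever you named your script):

```shell
# Submit the batch script to SLURM (filename is an example)
sbatch parallel_job.sub

# Check the state of your jobs in the queue
squeue -u <username>
```

Output and error files will appear in the submission directory, with %j replaced by the job number.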
If you need to run a parallel test interactively, you will need to specify the number of slots you need:
srun -n4 -p compute -J interactive --time=36:00:00 --mem=30G --pty bash
Then, on the allocated node, run your MPI command.
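For example, loading the same OpenMPI module as in the batch script and launching an executable over the four slots requested above (the executable name my_app is a placeholder for your own program):

```shell
# Load the same OpenMPI module used in the batch script
module add mpi/openmpi/4.0.2/gcc/eth gcc/9.2.0

# Launch your program (my_app is a placeholder) over the 4 allocated slots
mpirun -np 4 ./my_app
```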
No IB queue
On hpc.uea.ac.uk we have special IB nodes with a fast InfiniBand interconnect for running parallel jobs, which are served by the ib queues.
On ada.uea.ac.uk we have chosen to test running MPI jobs over the 10G Ethernet connection (ten times faster than that on the standard compute nodes on hpc.uea.ac.uk). This means parallel jobs run in the same compute queue on ADA.
We intend to migrate the newer IB nodes over from hpc.uea.ac.uk to provide an IB resource in the next stage of ADA's development.