Parallel Jobs

Parallel jobs run across more than one node, passing messages between the processes on the nodes using OpenMPI.

SLURM requires a number of standard configuration settings, such as the queue (partition) and the number of slots (tasks).

#!/bin/bash
#SBATCH --mail-type=END,FAIL    # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<username>@uea.ac.uk    # Where to send mail
#SBATCH -p compute     # Submit to the compute queue
#SBATCH -t 36:00:00    # Set time limit to 36 hours
#SBATCH --ntasks=96    # Set number of slots required
#SBATCH --job-name=parallel_job    # Set job name
#SBATCH -o parallel-test-%j.out    # Output to parallel-test-(job number).out
#SBATCH -e parallel-test-%j.err    # Errors to parallel-test-(job number).err
# Set up the environment
. /etc/profile
module add mpi/openmpi/4.0.2/gcc/eth gcc/9.2.0
# Run the application
echo "Hosts are:"
srun -l hostname
mpirun MyParallelBinary

Submitting Jobs

Submit the job script using sbatch:
sbatch JobScriptName.sub
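A typical submit-and-monitor sequence is sketched below; the job ID shown by sbatch and the squeue output will vary (JobScriptName.sub stands in for your own script name):

```shell
# Submit the job script; sbatch prints the assigned job ID
sbatch JobScriptName.sub

# List your own jobs and their state (PD = pending, R = running)
squeue -u $USER

# Cancel a job if needed, using the job ID reported by sbatch
scancel <job_id>
```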

If you need to run a parallel test interactively, you will need to specify the number of slots you require.

srun -n4 -p compute -J interactive --time=36:00:00 --mem=30G --pty bash

Then, on the allocated node, run the MPI command.
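Once the interactive shell opens on the node, a minimal sequence might look like this (MyParallelBinary is a placeholder for your own executable, as in the job script above):

```shell
# Load the same MPI environment used in batch jobs
module add mpi/openmpi/4.0.2/gcc/eth gcc/9.2.0

# Run over the 4 slots requested with srun -n4
mpirun -np 4 ./MyParallelBinary
```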

No IB queue

On hpc.uea.ac.uk we have dedicated IB nodes with a fast InfiniBand interconnect for running parallel jobs; these are served by the ib queues.

On ada.uea.ac.uk we have chosen to test running MPI jobs over the 10G Ethernet connection (ten times faster than the connection on standard compute nodes on hpc.uea.ac.uk). This means parallel jobs run in the same compute queue on ADA.

We intend to migrate the newer IB nodes from hpc.uea.ac.uk to ADA to provide an IB resource in the next stage of ADA's development.