At the heart of the cluster is the software that manages the workload. The cluster uses Slurm, an open-source queueing system. Essentially it is a program that balances use across the available resources by allocating jobs to appropriate compute nodes.

Note that in Slurm, a "queue" is called a "partition".  We have used the word queue to avoid confusion for users migrating from HPC.

A training video on using Slurm is provided by OCF and the HPC team at UEA.

Basic SLURM commands

Command Description
sbatch <job.sub> Submit your batch job to the queue
squeue -u <userid> Shows a list of your jobs
squeue -u <userid> -l Lists your jobs in more detail
squeue -u <userid> -t RUNNING Lists your running jobs
squeue -u <userid> -t PENDING Lists your pending jobs
squeue -u <userid> --start Shows estimated start time of your jobs
squeue -j <JOBID> Shows detailed information about your job (JOBID = job number)
scancel <JOBID> Kill your job.  You can kill multiple jobs by supplying a comma-separated list of job IDs
scancel -t PENDING -u <userid> Kill all your pending jobs
scancel -u <userid> Kill all your jobs
sstat -j <JOBID> Statistics about a running job.  By default this gives a lot of information, but it can be limited to the fields you want,
eg sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID --jobs=<JOBID>

sacct -e Find what options are available for job statistics
sacct -j <JOBID> Information about a past job from the current day
sacct -j <JOBID> --starttime=2019-03-12 Information about a past job from an earlier date; the output fields can also be chosen,
eg sacct --jobs=<JOBID> --starttime=2019-03-12 --format=jobname,nnodes,ncpus,maxrss,elapsed
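Putting the commands above together, a typical session might look like the following sketch. The script name job.sub and the job ID 123456 are placeholders; these commands only work on a machine with Slurm installed.

```shell
# Submit a batch script; Slurm replies with the new job ID
sbatch job.sub

# Check all of your queued and running jobs
squeue -u $USER

# Watch live statistics for a running job (123456 is a placeholder job ID)
sstat --format=AveCPU,AveRSS,JobID --jobs=123456

# After the job finishes, review its accounting record
sacct -j 123456 --format=jobname,elapsed,maxrss

# Cancel the job early if something is wrong
scancel 123456
```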


SBATCH commands

The #SBATCH directives listed below should be added to your submission scripts. If you are in doubt as to which directives to use, please email the HPC support team.

Command Description
#SBATCH -o output.log Write output to output.log (use %j to include the JOBID, eg output-%j.log); will append to output.log if it already exists
#SBATCH -e output.err Write errors to output.err (use %j to include the JOBID, eg output-%j.err)
#SBATCH --exclusive Grants your job exclusive access to the node (check with HPC admin prior to using)
#SBATCH --ntasks=16 Request a number of slots (16 in this case)
#SBATCH -p compute Set queue to use
#SBATCH -t 36:00:00 Set the job time limit to 36 hours
#SBATCH --job-name=test_job Job Name
#SBATCH -J test_job[%A_%a] Job name of array job
#SBATCH --array=1-10 Set an array of 10 jobs
#SBATCH --mem=8G Request a specific amount of memory for your job (8 GB in this case)
#SBATCH --mail-user=<username> Where to send job notification emails
#SBATCH --mail-type=BEGIN Email when job starts
#SBATCH --mail-type=END Email when job finishes (if --mail-user is not set, mail goes to the submitting user)
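As an illustration, the directives above can be combined into a complete submission script. This is a sketch only: the module name and program are hypothetical placeholders, and the partition, time, task and memory values should be adjusted to what your job actually needs.

```shell
#!/bin/bash
#SBATCH --job-name=test_job        # name shown in squeue
#SBATCH -p compute                 # queue (partition) to use
#SBATCH -t 36:00:00                # time limit of 36 hours
#SBATCH --ntasks=16                # request 16 slots
#SBATCH --mem=8G                   # request 8 GB of memory
#SBATCH -o output-%j.log           # %j is replaced by the job ID
#SBATCH -e output-%j.err           # errors go to a separate file
#SBATCH --mail-type=END            # email when the job finishes

# Placeholder commands: load your software and run it
module load myapp                  # hypothetical module name
srun ./my_program                  # hypothetical program
```

Save this as, for example, job.sub and submit it with sbatch job.sub.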