Note

You need to specify the time for your job - this is different from hpc.uea.ac.uk.

If you don't, you will be allocated the default job length of 24 hours. 

The maximum job length is 7 days (168 hours).  Jobs exceeding this will be killed automatically.

 

Sequential Job Scripts

The primary way to run a job on the cluster is to submit it as a batch job (no interaction with it once the job is submitted) to the relevant queue, using a submit script.  For clarity, and to help us with any support you might require, submit scripts should end in a .sub extension.

Batch jobs can be run against any valid queue, as defined in the batch script itself, and are submitted to the cluster using the Slurm command sbatch, so the cluster batch job scripts are known as submit scripts.
A batch submit script defines everything about your job, including what, how and where you want it to run. Examples of what might be configured in a submit script include:

  • Which queue you want to use
  • Whether you want to share a node, or get exclusive access to a node in the queue
  • How long the job should run
  • The name of the job
  • Where you want output to go (output and error streams etc)
  • How much RAM the job will need (especially if different from the default 4 GB per job)
  • At what amount of RAM usage the job can be automatically stopped by the scheduler (to protect the cluster if something goes wrong)
  • Which modules you need to load
  • Which applications you need to run and the commands you want to run against them

Slurm job scripts can be configured to request particular cluster resources and job settings, e.g. the queue to run the job on or the name of the job. The Slurm directives come first in your job submission script; after them, the required modules are loaded, and finally the command is executed.

Set a run time

You must set a time limit for your job - otherwise it will be allocated the default 24 hours.  If you don't know how long it will take to run, set a long time for your first run.  Once you have a feel for the run time, reduce your time limit to slightly more than you expect to need.  Jobs with shorter times are more likely to find an appropriate resource, and therefore be run sooner.
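As a rough illustration of "slightly more than you expect to need", the sketch below pads a measured run time by a margin and prints it in the HH:MM:SS form used by the -t option. The function name padded_time_limit and the 20% default margin are assumptions made for this example, not site policy.

```shell
#!/bin/bash
# Hypothetical helper: turn a measured run time (in seconds) into a
# Slurm time limit with a safety margin, formatted as HH:MM:SS.
padded_time_limit() {
    local elapsed_seconds=$1
    local margin_percent=${2:-20}   # assumed default margin; adjust to suit your job
    local padded=$(( elapsed_seconds + elapsed_seconds * margin_percent / 100 ))
    printf '%02d:%02d:%02d\n' \
        $(( padded / 3600 )) $(( padded % 3600 / 60 )) $(( padded % 60 ))
}

# A first run that took 5 hours (18000 s) gets a 6 hour limit.
padded_time_limit 18000

# The result can go in the script as "#SBATCH -t 06:00:00", or be given
# on the command line, which takes precedence over the script directive:
#   sbatch -t "$(padded_time_limit 18000)" JobScriptName.sub
```

Whatever margin you choose, the result must stay within the 7 day (168:00:00) maximum.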

Slurm options

By default both standard output and standard error are directed to the same file.

slurm option                               description
#SBATCH -t 36:00:00                        Set job time limit to 36 hours
#SBATCH -p compute                         Set queue to use
#SBATCH --output=test_%j.log               Set combined standard output and error log
#SBATCH -o test-%j.out                     Set standard output file
#SBATCH -e test-%j.err                     Set standard error file
#SBATCH --exclusive                        Set exclusive use of node
#SBATCH --mail-type=ALL                    Set when to send mail events (options: NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<username>@uea.ac.uk   Where to send mail
#SBATCH --mem 2G                           Set memory limit (different units can be specified using the suffix [K|M|G|T])
#SBATCH --array=1-10                       Set an array of 10 jobs
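
The --array option is not used in the example scripts on this page, so here is a minimal sketch of an array job. Slurm sets the SLURM_ARRAY_TASK_ID environment variable for each task, and %A and %a in file names expand to the array job ID and the task index. The job name, the 1 hour limit and the input_N.txt naming scheme are assumptions for illustration only.

```shell
#!/bin/bash
#SBATCH -p compute                      # Select compute queue
#SBATCH -t 01:00:00                     # Time limit for each array task (illustrative)
#SBATCH --job-name=array-test_job       # Set job name
#SBATCH --array=1-10                    # Run 10 tasks, numbered 1 to 10
#SBATCH -o array-test-%A_%a.out         # %A = array job ID, %a = task index
#SBATCH -e array-test-%A_%a.err         # One error log per task

# Hypothetical naming scheme: task N reads input_N.txt.
input_for_task() {
    echo "input_$1.txt"
}

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; default to 1 so the
# script can also be tried outside the scheduler.
task_id=${SLURM_ARRAY_TASK_ID:-1}
echo "Task ${task_id} will process $(input_for_task "${task_id}")"
```

Each task is scheduled separately, so the time limit applies to every task individually rather than to the array as a whole.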


Long running jobs

Jobs which must run for a long period of time achieve the best throughput when composed of many small jobs using checkpoint/restart chained together.  This means writing out data that is required for starting the next stage of the job, and then submitting the second job once the first is finished.  This method of working also means large jobs can be easily restarted, and means less potential for lost work if a job crashes.
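
One way to chain stages is with job dependencies. The sketch below submits each script so that it starts only after the previous one has completed successfully, using the sbatch options --parsable (print just the job ID) and --dependency=afterok. The function name submit_chain, the SBATCH_CMD override for dry runs and the stage script names are assumptions for this example; writing and reading the checkpoint data remains the job of your own application.

```shell
#!/bin/bash
# Submit a chain of submit scripts; each stage waits for the previous
# stage to finish successfully before it becomes eligible to run.
# SBATCH_CMD is a hypothetical override, useful for dry runs and testing.
submit_chain() {
    local previous_job_id=""
    local job_id script
    for script in "$@"; do
        if [ -z "$previous_job_id" ]; then
            # First stage: no dependency.
            job_id=$("${SBATCH_CMD:-sbatch}" --parsable "$script") || return 1
        else
            # Later stages: run only if the previous job exited with code 0.
            job_id=$("${SBATCH_CMD:-sbatch}" --parsable \
                --dependency="afterok:${previous_job_id}" "$script") || return 1
        fi
        echo "Submitted ${script} as job ${job_id}"
        previous_job_id=$job_id
    done
}

# Usage (hypothetical script names):
#   submit_chain stage1.sub stage2.sub stage3.sub
```

This assumes a single cluster, where --parsable prints only the job ID. If a stage fails, the later stages are not started, so you can fix the problem and resubmit from the last checkpoint.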

Setting the environment

The line ". /etc/profile" (used in HPC scripts) is no longer required in ADA scripts.  If you leave it in, it can cause problems with modules if you have default ones added in your .bashrc.  Please see ada-modules for advice on how to set up your .bashrc for ADA.

Submitting jobs

sbatch JobScriptName.sub

Example job submit scripts

  • An example of a Matlab job script :

#!/bin/bash
#SBATCH --mail-type=END,FAIL    # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<username>@uea.ac.uk     # Where to send mail
#SBATCH -p compute             # Select compute queue
#SBATCH -t 36:00:00             # Set time limit to 36 hours
#SBATCH --job-name=matlab-test_job      # Set job name
#SBATCH -o matlab-test-%j.out               # Write job output to matlab-test-(job_number).out
#SBATCH -e matlab-test-%j.err               # Write job error to matlab-test-(job_number).err
#set up environment
module add matlab/2018a
#run the application

matlab -nodisplay -nojvm -nodesktop -nosplash -r my_matlab_m_file

  • An example of a Stata15 multicore job script :

#!/bin/bash
#SBATCH --mail-type=ALL           #Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<username>@uea.ac.uk    # Where to send mail
#SBATCH -p compute                  #Which queue to use
#SBATCH -t 36:00:00                               # Set time limit to 36 hours
#SBATCH --exclusive                      # set exclusive use of node
#SBATCH --job-name=stata-test_job #Job name
#SBATCH -o stata-test-%j.out       #Standard output log
#SBATCH -e stata-test-%j.err       #Standard error log
#set up environment
module add stata/15
#run the application
stata-mp -b do stata15_example.do

  • An example of an R job :

#!/bin/bash
#SBATCH --mail-type=ALL                    #Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<username>@uea.ac.uk           # Where to send mail
#SBATCH -p compute                            #Which queue to use
#SBATCH -t 36:00:00                               # Set time limit to 36 hours
#SBATCH --job-name=R-test_job        #Job name
#SBATCH -o R-test-%j.out                    #Standard output log
#SBATCH -e R-test-%j.err                     #Standard error log
#set up environment

module add R/3.6.2
#run the application
R CMD BATCH TestRFile.R