A word of warning.

Submitting large numbers of jobs to the cluster can have disastrous consequences if not done correctly, as one can overload the scheduler, bringing the cluster to a grinding halt.

Array jobs allow you to create and submit a single job script, but have it run multiple times with different input datasets, with the scheduler treating each run as a separate task.  This is useful for ‘high throughput’ work, for example where you want to repeat a simulation with different driving data.

Taking a simple R submission script as an example:

#!/bin/bash
#SBATCH --mail-type=ALL     #Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=<username>@uea.ac.uk    # Where to send mail
#SBATCH -p compute   #Which queue to use
#SBATCH --job-name=R-test_job     #Job name
#SBATCH -o R-test-%j.out    #Standard output log
#SBATCH -e R-test-%j.err     #Standard error log
#SBATCH -t 0-20:00 # Running time of 20 hours
. /etc/profile
module add R/3.6.1/container
Rscript TestRFile.R dataset1.csv

If you wanted to run the simulation TestRFile.R with inputs dataset2.csv through to dataset10.csv as well, you could create and submit a separate job script for each dataset.  However, by setting up an array job, you can do the same with a single script.

The corresponding array script for the above example would look something like:

#!/bin/bash
#SBATCH --mail-type=ALL     #Mail events (NONE, BEGIN, END, FAIL, ALL, ARRAY_TASKS)
#SBATCH --mail-user=<username>@uea.ac.uk    # Where to send mail
#SBATCH -p compute   #Which queue to use
#SBATCH --job-name=R-test_job     #Job name
#SBATCH -o R-test-%A-%a.out    #Standard output log
#SBATCH -e R-test-%A-%a.err     #Standard error log
#SBATCH --array=1-10  #Array range
#SBATCH -t 0-20:00 # Running time of 20 hours

. /etc/profile
module add R/3.6.1/container
echo "SLURM_ARRAY_TASK_ID"
R CMD BATCH TestRFile.R dataset1.csv
Rscript TestRFile.R dataset$SLURM_ARRAY_TASK_ID.csv

  • The array is created by the #SBATCH --array=1-10 directive, giving one task for each of our 10 datasets
  • The output and error file names include %A (the parent job ID) and %a (the task index), so each task writes its own logs
  • The R command is updated to use the variable $SLURM_ARRAY_TASK_ID, which holds the index of the current task (see the sketch of the related environment variables below)
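
Slurm also sets a handful of related environment variables inside each task.  A minimal sketch of some extra echo lines that could be added to the script body to record them (variable names as documented for sbatch; the values only exist at run time):

echo "Array job ID: $SLURM_ARRAY_JOB_ID"
echo "Task index:   $SLURM_ARRAY_TASK_ID"
echo "Index range:  $SLURM_ARRAY_TASK_MIN to $SLURM_ARRAY_TASK_MAX"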


When the job is submitted, Slurm will create 10 tasks under a single job ID.  Within each task, %A expands to that job ID, while %a and $SLURM_ARRAY_TASK_ID give the task's index.
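
As an illustration, if the array were assigned job ID 2494256 (as in the submission below), the -o and -e directives above would produce one pair of log files per task, named along these lines:

R-test-2494256-1.out    R-test-2494256-1.err
R-test-2494256-2.out    R-test-2494256-2.err
...
R-test-2494256-10.out   R-test-2494256-10.err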

The job is submitted in the same way as a normal job:

[abc@login02 ~]$ sbatch Rarray.sub
Submitted batch job 2494256
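
If you need the numeric job ID for use in your own scripts (for example to pass to scancel or sacct later), sbatch's --parsable option prints just the ID rather than the full message (on federated setups it also appends the cluster name):

[abc@login02 ~]$ sbatch --parsable Rarray.sub
2494256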

To limit how many array tasks run at the same time, add a % throttle to the array range when submitting:

[abc@login02 ~]$ sbatch --array=1-10%5 Rarray.sub
Submitted batch job 2494256
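
The throttle does not have to be given at submission time; the same % limit can be written into the --array directive in the job script itself:

#SBATCH --array=1-10%5  #Array range, running at most 5 tasks at once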

If you use squeue to list your active jobs, you will see the 10 tasks sharing the same job ID.  The tasks can be distinguished by the index appended to the job ID in the JOBID column (JOBID_INDEX):

[abc@login02 ~]$ squeue -u abc
      JOBID  PARTITION  NAME        USER  ST  TIME   NODES  NODELIST(REASON)
  2494256_1  compute    R-test_job  abc   R   12:27  1      rack01
  2494256_2  compute    R-test_job  abc   R   12:27  1      rack01
  2494256_3  compute    R-test_job  abc   R   12:27  1      rack01
  2494256_4  compute    R-test_job  abc   R   12:27  1      rack01
  2494256_5  compute    R-test_job  abc   R   12:27  1      rack01
  2494256_6  compute    R-test_job  abc   R   12:27  1      rack01
  2494256_7  compute    R-test_job  abc   R   12:27  1      rack01
  2494256_8  compute    R-test_job  abc   R   12:27  1      rack01
  2494256_9  compute    R-test_job  abc   R   12:27  1      rack01
 2494256_10  compute    R-test_job  abc   R   12:27  1      rack01

If you use scancel JOBID to terminate the job, all tasks within the array will be terminated.  If you wish to terminate only an individual task, you need to use scancel JOBID_INDEX (scancel prints nothing when the cancellation succeeds):

[abc@login02 ~]$ scancel 2494256_6

Similarly, if you wish to examine a particular task, you need to use the same JOBID_INDEX syntax, e.g. scontrol show job JOBID_INDEX or sacct -j JOBID_INDEX.
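
For example, to check on task 6 of the array above (the --format fields here are just one common selection; the fields available depend on the site's accounting configuration):

[abc@login02 ~]$ sacct -j 2494256_6 --format=JobID,JobName,State,Elapsed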

The array range can be specified in a number of ways; the same syntax can also be reused on the sbatch command line, as shown after this list:

  • #SBATCH --array=1-10 will run tasks 1, 2, 3 and so on up to 10
  • #SBATCH --array=1-10:2 will run tasks between 1 and 10 incrementing in steps of 2, i.e. 1, 3, 5, 7, 9
  • #SBATCH --array=1,2,5,10 will run the tasks in the list: 1, 2, 5 and 10
  • #SBATCH --array=1,2,5,7-10 will run tasks 1, 2, 5, 7, 8, 9, 10
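
The same list syntax can be given to sbatch directly, which is handy if a few tasks fail and you only want to rerun those; the indices below are purely illustrative:

[abc@login02 ~]$ sbatch --array=3,7 Rarray.sub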

With a bit of work it is possible to create more complex variations, for example running a similar job, this time with a single dataset but with different start and end values for each task:

#!/bin/bash
#SBATCH -p compute   #Which queue to use
#SBATCH --job-name=R_job     #Job name
#SBATCH -o R-%A-%a.out    #Standard output log
#SBATCH -e R-%A-%a.err     #Standard error log
#SBATCH --array=1-2  #Array range

. /etc/profile
module add R/3.6.1/container

let "INDEX_START = (SLURM_ARRAY_TASK_ID - 1) * 500"
let "INDEX_END = (SLURM_ARRAY_TASK_ID * 500) - 1"

Rscript TestRFile.R dataset1 $INDEX_START $INDEX_END

This would result in the following being run:

  • Rscript TestRFile.R dataset1 0 499
  • Rscript TestRFile.R dataset1 500 999
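
If you want to sanity-check the index arithmetic before submitting, the same calculation can be run directly in the shell, outside of Slurm (a quick local check only, not part of the job script):

for i in 1 2; do
    echo "task $i: start=$(( (i - 1) * 500 )) end=$(( i * 500 - 1 ))"
done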