What does pending mean?

A job is put into pending by the queueing system when there aren't currently enough resources free to meet the job requirements.

This may be because the nodes on the queue you asked for are very busy. It may be that you have asked for a lot of memory, a lot of slots, or a long time limit. If other jobs with smaller requirements are also pending in the queue, they may be slotted in ahead of your job.

What can I do to get my job running?

  • Be patient - the system will allocate it as soon as resources are available.
  • Review what you are asking for - do you need as much memory, as many slots (tasks), or as much time? If you reduce the requirements, the job is likely to run sooner.
  • Change the queue you are asking for. There are a variety of queues with differing amounts of resource. Generally, the more specialised the hardware, the fewer slots are available (because such nodes cost significantly more than standard compute nodes).
  • Check that the resources you have requested exist on the queue you have chosen. Sometimes a job will pend indefinitely because they don't.
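
As an illustration of asking for less, the sketch below writes a small job script with modest requests. The file name small_job.sh and every value in it are assumptions chosen for the example, not recommendations for your workload; only the partition name is taken from the examples on this page.

```shell
# Write a small job script with modest requests (all values are illustrative).
cat > small_job.sh <<'EOF'
#!/bin/bash
# Queue (partition) to submit to
#SBATCH --partition=compute-24-96
# Number of slots (tasks)
#SBATCH --ntasks=1
# Memory per node
#SBATCH --mem=8G
# Wall-clock time limit (hh:mm:ss)
#SBATCH --time=02:00:00

# Placeholder for the real workload
echo "job started on $(uname -n)"
EOF

# On the cluster, the script could be checked without submitting it:
#   sbatch --test-only small_job.sh
```

With --test-only, sbatch validates the script and reports an estimate of when the job would start, without submitting it, which can help spot a request the chosen queue is unlikely to satisfy.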

Why is my job pending?

You can use a variety of commands to work out why your job is pending:

  • See a list of your jobs

squeue -u abc19weu

  • See the reason why your job is pending (Priority just means that other jobs were submitted first and are ahead of your job in the queue)

scontrol show job 624651 | grep Reason

   JobState=PENDING Reason=Priority Dependency=(null)

  • See which queue (partition) it is pending on

scontrol show job 624651 | grep Partition

   Partition=compute-24-96 AllocNode:Sid=c0067:2351

  • See how many CPUs and how much memory you are requesting

scontrol show job 624651 | grep cpu

   TRES=cpu=24,mem=80G,node=1,billing=24

  • Total number of jobs pending on a given queue

squeue --partition=compute-24-96 --states=PENDING --noheader | wc -l

  • Total number of jobs pending for you on a given queue

squeue -u abc19weu --partition=compute-24-96 --states=PENDING --noheader | wc -l

  • Total number of jobs pending for you on a given queue due to priority (other users who submitted before you are still waiting for resources):

squeue -u abc19weu --partition=compute-24-96 --states=PENDING --noheader | grep Priority | wc -l

  • See how many nodes have at least 32 GB of memory available

lshosts | grep compute-24-96 | awk '$5>=32{print}' | wc -l
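
The separate scontrol and squeue checks above can be gathered into two small helpers. This is only a sketch: the function names job_summary and pending_reasons are made up for the example, and the sample text is copied from the output shown above so the helpers can be tried away from the cluster.

```shell
# Print the Reason, Partition and TRES fields from `scontrol show job` output.
# Reads the scontrol text on standard input.
job_summary() {
    # Put every key=value pair on its own line, then keep the three we want.
    tr -s ' \t' '\n' | grep -E '^(Reason|Partition|TRES)='
}

# Tally pending reasons, one reason per input line, most common first.
pending_reasons() {
    awk 'NF { n[$1]++ } END { for (r in n) print n[r], r }' | sort -rn
}

# Sample text taken from the example output on this page.
sample='   JobState=PENDING Reason=Priority Dependency=(null)
   Partition=compute-24-96 AllocNode:Sid=c0067:2351
   TRES=cpu=24,mem=80G,node=1,billing=24'

printf '%s\n' "$sample" | job_summary

# On the cluster the helpers would be fed real output, for example:
#   scontrol show job 624651 | job_summary
#   squeue -u abc19weu --states=PENDING --noheader --format=%r | pending_reasons
```

The squeue format code %r prints only the reason column, so a job name that happens to contain "Priority" cannot be miscounted.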

Is there resource on other queues?

  • See how many nodes on a given partition are completely free

lshosts | grep compute-24-96 | grep IDLE | wc -l

  • See if there is resource available on other partitions

lshosts | grep compute-24-128 | grep IDLE | wc -l

lshosts | grep compute-16-64 | grep IDLE | wc -l
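
To compare several partitions in one go, the idle-node counts could also be tallied with a small helper. This sketch uses the standard Slurm sinfo command rather than lshosts, which is an assumption about what is available to you; the function name idle_per_partition and the sample input are hypothetical.

```shell
# Count idle nodes per partition from lines of the form "<partition> <state>".
# Only nodes whose state is exactly "idle" are counted.
idle_per_partition() {
    awk '$2 == "idle" { n[$1]++ }
         END { for (p in n) print p, n[p] }' | sort
}

# Hypothetical sample input, for trying the function away from the cluster.
sample='compute-24-96 idle
compute-24-96 alloc
compute-24-128 idle
compute-24-128 idle
compute-16-64 mix'

printf '%s\n' "$sample" | idle_per_partition

# On the cluster, one line per node could be produced with:
#   sinfo --Node --noheader --format='%P %t' | idle_per_partition
```

Partitions with no idle nodes are simply not listed, and sinfo marks the default partition with a trailing "*" in the %P column.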