The best practice when submitting a job is to request the amount of memory it requires. With a memory resource request in place, the job scheduler can allocate your job to a compute node that has enough free memory and ensure it does not compete with other jobs for that memory.
To request memory in a job submission script, add a line similar to the one below, which allocates 8 GB of RAM to your job.
#SBATCH --mem 8G
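For context, a complete submission script built around that directive might look like the sketch below. The job name, partition, task count and time limit are illustrative assumptions, not site requirements; only the memory request comes from the text above. SLURM_MEM_PER_NODE is the environment variable Slurm sets for batch jobs submitted with --mem.

```shell
#!/bin/bash
# hypothetical job name
#SBATCH --job-name=mem-example
# partition name used elsewhere on this page; adjust as needed
#SBATCH --partition=compute
# assumption: a single task
#SBATCH --ntasks=1
# assumption: one-hour time limit
#SBATCH --time=01:00:00
# request 8 GB of RAM on the node
#SBATCH --mem=8G

# Slurm exports SLURM_MEM_PER_NODE (in MB) when --mem is used;
# fall back to "unknown" when the script is run outside Slurm.
requested_mb="${SLURM_MEM_PER_NODE:-unknown}"
echo "Memory requested per node (MB): ${requested_mb}"
```

Saved as, say, job.sh (a placeholder name), the script would be submitted with sbatch job.sh.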
Requesting additional memory for an interactive job
The following example starts an interactive session with 6 GB of memory and a runtime of 12 hours:
srun -n1 -p compute -J interactive --time=12:00:00 --mem-per-cpu=6G --pty bash
You can select the gpu or hmem queue by changing the -p option.
Requesting too much memory
It is important to make sure your memory resource request is appropriate; otherwise your job may never be scheduled. For example, requesting too much memory (e.g. more than is available on any host) will leave your job pending for a long period, because no node can satisfy the request. If you have included a memory resource request and your submitted job remains pending, check the pending reason with squeue -j <JobID>. The following message indicates that the resource requirement cannot currently be met by any host.
srun: error: Memory specification can not be satisfied
srun: error: Unable to allocate resources: Requested node configuration is not available
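To see why a job is still pending, the reason field can be queried directly. The sketch below wraps squeue in a small helper; the function name pending_reason is hypothetical, while -j (select a job), -h (no header) and the %r format specifier (reason) are standard squeue options.

```shell
#!/bin/bash
# Hypothetical helper: print the scheduler's reason for a job's current state.
pending_reason() {
    local job_id="$1"
    if ! command -v squeue >/dev/null 2>&1; then
        echo "squeue not found; run this on a cluster login node" >&2
        return 1
    fi
    # -j selects the job, -h suppresses the header, %r prints the reason field
    squeue -j "$job_id" -h -o "%r"
}
```

Calling pending_reason 123456 (a placeholder job ID) prints a reason such as Resources or Priority for a job that is waiting. Comparing the request against the per-node memory reported by sinfo -N -o "%N %m" can help confirm whether any node is large enough.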
In a parallel job, the memory resource request and memory limit apply to the job as a whole rather than to each task. For example, a parallel job with 8 tasks submitted with a memory limit of 4G will be terminated when the combined memory usage of its tasks exceeds 4G (e.g. roughly 500MB per task).
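The arithmetic behind that example can be checked with a few lines of shell. This is only a sketch of the even-split case; it assumes Slurm's convention that 4G means 4096 MB, and the variable names are illustrative.

```shell
#!/bin/bash
# Job-wide memory limit from the example above, in MB (assuming 4G = 4096 MB).
total_mb=4096
# Number of tasks in the parallel job.
ntasks=8

# If memory use is spread evenly, each task can use this much before the
# combined usage reaches the limit.
per_task_mb=$(( total_mb / ntasks ))
echo "Even share per task: ${per_task_mb} MB"
```

This gives 512 MB per task, in line with the 500MB figure quoted above. The split need not be even: one task may use more, provided the combined total stays under the limit.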
In an array job, each element of the array is treated independently, so the memory resource request and memory limit apply to each element separately. For example, if an array job of 10 elements is submitted with a memory limit of 4G, each element will continue running while its memory usage stays below 4G. Any element that exceeds 4G will be terminated, leaving the other elements to continue running.
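An array job matching that description could be sketched as follows. The job name, time limit and echo line are assumptions for illustration; --array and the SLURM_ARRAY_TASK_ID variable are standard Slurm features.

```shell
#!/bin/bash
# hypothetical job name
#SBATCH --job-name=array-mem-example
# ten independent array elements, numbered 1 to 10
#SBATCH --array=1-10
# assumption: one-hour time limit per element
#SBATCH --time=01:00:00
# 4 GB memory limit, applied to each element separately
#SBATCH --mem=4G

# Slurm sets SLURM_ARRAY_TASK_ID for each element; default to 0 outside Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
echo "Array element ${task_id} running with its own 4G limit"
```

Each of the ten elements runs this script once with a different SLURM_ARRAY_TASK_ID, so the script can use that value to pick its own input.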