How much memory does my job need?

This will vary from job to job. Your first instinct may be to answer "How long is a piece of string?" or "How do I know how much it needs? I have no idea!"

We strongly recommend that you run a benchmarking job to establish approximate memory usage before submitting jobs that use a new method or dataset. Then use that figure as a guide for memory resource requests when submitting your jobs. To start with, ask for a bit more than your test job required. Review usage once jobs have run. If you are consistently using less memory than requested, adjust your submit script accordingly. This will also mean your job is likely to run sooner.
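As a rough illustration of "ask for a bit more than your test job required", the sketch below adds a margin to a measured peak. The function name suggest_mem_mb and the 20% default margin are assumptions made for this example, not site policy.

```shell
# Suggest a memory request (whole MB) from a measured peak plus a safety margin.
# Usage: suggest_mem_mb PEAK_MB [MARGIN_PERCENT]
# The 20% default margin is an illustrative assumption, not a site rule.
suggest_mem_mb() {
    peak_mb=$1
    margin_pct=${2:-20}
    # Integer arithmetic, rounded up to the next whole MB.
    echo $(( (peak_mb * (100 + margin_pct) + 99) / 100 ))
}

suggest_mem_mb 3200    # prints 3840
```

For a benchmark that peaked at 3200 MB this suggests 3840 MB, which could be requested as #SBATCH --mem 3840M.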

Large datasets

If you are using a large dataset that will be held in memory all at once (i.e. you read it all in at the start), a rule of thumb is that you will need at least as much memory as the size of the data file. If you have very large datasets, consider splitting the data into smaller portions and running multiple jobs rather than trying to process everything in one big job. This is likely to be more efficient, and your jobs will probably run sooner. For example, if you have data for a year but only want to deal with August, extract the August data first and then use that.
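The August example could be sketched as follows. The file layout is an assumption for illustration only: a CSV with a header row and an ISO date (YYYY-MM-DD) in the first column. The function names extract_month and file_size_bytes and the file names in the usage comments are likewise hypothetical.

```shell
# Keep the header row plus the rows whose date (column 1) falls in the given month.
# Usage: extract_month MM INPUT.csv   (output goes to stdout)
extract_month() {
    awk -F, -v month="$1" 'NR == 1 || substr($1, 6, 2) == month' "$2"
}

# Size of a file in bytes: a rough lower bound on the memory needed
# if the whole file is read in at the start.
file_size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Example usage (file names are placeholders):
#   extract_month 08 year.csv > august.csv
#   file_size_bytes august.csv
```

Checking the size of the extracted file before submitting gives a first estimate for the memory request of the smaller job.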

Understand what your job is doing if using large data arrays.

Profiling Tools

For specific software applications, there are some useful profiling tools.

First time runs

When running a job for the first time, if you think your task may need more than 4G of memory but you don't have a more accurate estimate (from calculating expected requirements, or from comparison with a similar task, for example one run on your own desktop PC), try running your task in one of the following ways:

  • Include a conservative memory resource request, #SBATCH --mem 10G, in your job submission script
  • Run in an hmem interactive session: interactive-hmem
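
A first-run submission script along these lines might look like the sketch below. Only the --mem line comes from the text above; the job name, the time limit and the echo placeholders are hypothetical and should be replaced with your own values.

```shell
#!/bin/bash
# Generous memory request for a first run, as suggested above.
#SBATCH --mem 10G
# Hypothetical job name and time limit; adjust for your own work.
#SBATCH --job-name=first-run-test
#SBATCH --time=01:00:00

# SLURM_MEM_PER_NODE is set by Slurm inside the job when --mem is used.
echo "Memory requested per node: ${SLURM_MEM_PER_NODE:-not running under Slurm}"

# Placeholder: replace this line with the command that runs your task.
echo "Task would run here"
```

Once the job has finished, compare the memory it actually used with the 10G requested and lower the request for later runs if appropriate.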

Reviewing memory requirements

One way of identifying the memory requirements of a task is to look at how much memory similar tasks have required.

  • sacct --format="CPUTime,MaxRSS" --jobs=<job id>

[s154@login02 ~]$ sacct --format="CPUTime,MaxRSS" --jobs=200250         
   CPUTime     MaxRSS
---------- ----------
    00:01:00      3572K

  • Job output logs include details about resource usage, including the maximum memory used during the job run.
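
To compare a MaxRSS figure such as the 3572K above with a request written in megabytes or gigabytes, a small conversion helper can be handy. This is only a sketch: the function name maxrss_to_mb is made up for the example, and it assumes that the K, M, G and T suffixes are multiples of 1024 and that a value with no suffix is in bytes.

```shell
# Convert a sacct MaxRSS value (e.g. 3572K, 512M, 2G) to megabytes, one decimal place.
# Assumes binary multiples (1K = 1024 bytes); a bare number is treated as bytes.
maxrss_to_mb() {
    printf '%s\n' "$1" | awk '{
        n = $1 + 0                       # leading numeric part of the value
        u = substr($1, length($1), 1)    # last character, i.e. the unit suffix if present
        if (u == "K")      mb = n / 1024
        else if (u == "M") mb = n
        else if (u == "G") mb = n * 1024
        else if (u == "T") mb = n * 1024 * 1024
        else               mb = n / (1024 * 1024)
        printf "%.1f\n", mb
    }'
}

maxrss_to_mb 3572K    # prints 3.5

# Example usage against a real job (requires Slurm):
#   sacct --noheader --parsable2 --format=MaxRSS --jobs=<job id>
```

Applied to the sample output above, 3572K is about 3.5 MB, so that job used far less than a 10G request would reserve.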