HPC Data Storage
There are various options for storing data on the HPC clusters, depending on the nature of the data. There are three main factors to consider when thinking about data storage:
- Is the data actually needed for running an HPC task?
- How important is the data? Can it be reproduced or downloaded again?
- How big is the data?
General Information about HPC Storage
Please consider the following when using HPC storage:
- You initially land in your cluster home directory: /gpfs/home/<username>
- You also have a separate ‘scratch’ space at /gpfs/scratch/<username> (which you can access from your home directory through the shortcut /gpfs/home/<username>/scratch, or from any directory with ~/scratch). If you are using scratch in a script, it is preferable to use the full path /gpfs/scratch/<username>
- Your ‘home’ and ‘scratch’ areas are high-performance storage, suitable for compute-intensive work and heavy I/O.
- Filesystem backups are designed for disaster recovery only, so please take care with file and data management.
- HPC data storage is only for HPC data. Please do not use HPC data storage as a backup for your desktop/laptop, documents or music collections; such files will be deleted.
- If you exceed your quota it can cause issues, so you will receive an email asking you to deal with it as soon as possible. If you don't deal with it within 7 days, you will not be able to run jobs until you are back within quota.
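As a minimal sketch of the advice above about scripts, the snippet below builds the full scratch path from the username instead of relying on the ~/scratch shortcut. The function name scratch_dir_for and the job name my_job are hypothetical and used only for illustration.

```shell
#!/bin/bash
# Sketch: build the full scratch path rather than relying on the ~/scratch shortcut.
# scratch_dir_for is a hypothetical helper name, not a cluster-provided command.
scratch_dir_for() {
    # $1: username; prints the full scratch path for that user
    printf '/gpfs/scratch/%s\n' "$1"
}

# Hypothetical job name, used only for illustration.
job_name="my_job"
out_dir="$(scratch_dir_for "${USER:-$(id -un)}")/${job_name}"
echo "Job output would be written to: ${out_dir}"

# On the cluster, create the directory before the job writes anything:
# mkdir -p "${out_dir}"
```

The mkdir line is left commented out so the sketch can be read and run anywhere; uncomment it in a real job script.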
An overview of the available storage is shown in the table below:
|Name||Path||Use for HPC Task||Backed Up||Default Quota|
You have a (local) home directory on the cluster, /gpfs/home/<username>, which is backed up to tape nightly and is located on the Storage Area Network (SAN).
The likely uses for this storage area are analysed output, model code, and data which can't be easily replicated.
You also have a scratch area "/gpfs/scratch/username" (e.g. /gpfs/scratch/abc14xyz) which is located on a high availability storage array (protecting against individual hard drive failures), but IS NOT BACKED UP.
The likely uses for this area are data for driving models and model output prior to analysis. In essence, it is the area where you should store job output and any data which can be recreated or downloaded again.
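One way to follow this split is to leave raw job output in scratch and copy only the analysed results into home, where they are backed up. The sketch below assumes a hypothetical helper called keep_results; the directory names in the usage line are placeholders rather than real paths on the cluster.

```shell
# Sketch: copy a directory of analysed results from scratch into backed-up home storage.
# keep_results is a hypothetical helper name.
# $1: source directory (e.g. somewhere under /gpfs/scratch/<username>)
# $2: destination parent directory (e.g. somewhere under /gpfs/home/<username>)
keep_results() {
    kr_src="$1"
    kr_dest="$2"
    [ -d "$kr_src" ] || { echo "no such directory: $kr_src" >&2; return 1; }
    mkdir -p "$kr_dest" || return 1
    # Report the size first so it can be compared with the remaining home quota.
    du -sh "$kr_src"
    # -a preserves timestamps and permissions; the source is left in place.
    cp -a "$kr_src" "$kr_dest/"
}
```

A call might look like keep_results /gpfs/scratch/<username>/run1/analysed /gpfs/home/<username>/results, with your own username and directory names substituted.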
There is a default quota set for each user on both HPC home and scratch filesystems. You can view your usage and corresponding quota on each filesystem, updated hourly, by running the quotacheck command:
[cc@login00 ~]$ quotacheck
HPC Storage usage at 17:00 on Feb 17
Filesystem Usage (GB) Quota (GB)
scratch 95.45 100.00
home 25.82 30.00
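Since exceeding a quota can stop jobs from running, it may help to flag filesystems that are close to their limit. The following is a rough sketch that assumes quotacheck always prints rows in the three-column layout of the example above (name, usage in GB, quota in GB); the function name quota_warnings and the 90% default threshold are illustrative choices, not part of the cluster tooling.

```shell
# Sketch: list filesystems whose usage is at or above a percentage of their quota.
# Reads quotacheck-style output on stdin; $1 is the threshold in percent (default 90).
# quota_warnings is a hypothetical helper name.
quota_warnings() {
    awk -v limit="${1:-90}" '
        # Data rows have exactly three fields, with numeric usage and quota columns;
        # the two header lines do not match and are skipped.
        NF == 3 && $2 ~ /^[0-9.]+$/ && $3 ~ /^[0-9.]+$/ && $3 + 0 > 0 {
            pct = 100 * $2 / $3
            if (pct >= limit + 0) printf "%s %.0f%%\n", $1, pct
        }'
}
```

On the cluster this could be run as quotacheck | quota_warnings 90. Because usage figures are only updated hourly, the result may lag behind recent writes.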
If your HPC storage requirements are likely to exceed the default quotas, please contact email@example.com and we will be happy to discuss your needs and look at the options available.
Additional storage may also be available. It is suitable for archiving data and for sharing access within a group if required, but not for high I/O workloads. Contact firstname.lastname@example.org for more information.
A guide to uploading and downloading files to and from the cluster can be found under data transfer.