HSM involves the file server automatically moving data from disk to tape and back again as required. It enables us to provide a cheaper and larger tier of storage space for data that isn't in regular daily use. An archive area is available for data files that aren't regularly used. After a certain period of time (dependent on a number of factors) the data files are written to tape, leaving a 'stub' in their place. When the files are needed again, copying this stub back to your home directory or scratch space will retrieve the file from the tape repository without the need for a third-party backup or archiving tool.

At the moment we cannot predict exact retrieval times. In testing, the copy process took ~10 seconds to move a 200MB file back to local disk. The time to retrieve depends on the size of the files being recalled, and response times will lengthen as the system becomes more heavily used. Bringing back 100GB or more will take several hours, a delay that should be factored into your work.
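As a rough guide only, the test figure above can be turned into a lower-bound estimate. The sketch below assumes that the single test result (200MB in ~10 seconds, about 20MB/s) scales linearly, which it will not once the system is under load. The function name and the use of decimal units are illustrative assumptions, not part of the service.

```python
# Rough lower-bound estimate of recall time, assuming the single test
# result quoted above (200MB in ~10 seconds) scales linearly.
# This ignores queueing, system load and any tape handling time.

TEST_SIZE_MB = 200   # size of the file used in testing
TEST_SECONDS = 10    # approximate time observed in testing


def estimate_recall_seconds(size_mb: float) -> float:
    """Return a naive recall-time estimate in seconds for size_mb megabytes."""
    throughput_mb_per_s = TEST_SIZE_MB / TEST_SECONDS  # about 20 MB/s
    return size_mb / throughput_mb_per_s


if __name__ == "__main__":
    size_mb = 100 * 1000  # 100GB expressed in MB (decimal units assumed)
    seconds = estimate_recall_seconds(size_mb)
    print(f"100GB: at least {seconds / 3600:.1f} hours")
```

At the test rate, 100GB works out to roughly 1.4 hours of transfer time alone, so treat this figure as a minimum and allow several hours in practice.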
When using the archive area, please remember the following:
- Only put data here that is not in regular use. YOU SHOULD NOT WORK DIRECTLY FROM THIS AREA. When you wish to work with archived data, please ensure you move the files back to home or scratch first
- It is much more efficient to put lots of small files into a tarball and then put that into the archive area, rather than moving lots of small files directly into the archive. For example, if you have a directory with hundreds of 10MB files in it, it would be better to use the tar tool to create a single tar file containing all the files
- There will be some delay when retrieving files from tape archive, so please plan ahead when working with data that has been archived
- You may find that files you have put into archive are not pushed to tape straight away (i.e. they still take up space on the esarchive disk, see step 3 in the example below). The archive process is only carried out under certain conditions, so it could be hours or days before data is actually written to tape
- Your archived data is secure: before it is written to archive tape it is backed up to two separate tapes (onsite and offsite copies), and two archive tapes are then made to protect against tape failure
- If you delete stub files from your archive space you are effectively deleting the data in the archive. The file server will reconcile the list of files on disk with those in the archive and purge from the archive any files that are no longer on disk
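
The tarball advice above can be sketched in Python using the standard `tarfile` module (the `tar` command-line tool does the same job). This is a minimal sketch, not a supported procedure: the function name, the `results` directory and the `results.tar.gz` file name are hypothetical placeholders, and the path of your archive area is deliberately not shown.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir: str, tar_path: str) -> int:
    """Pack everything under src_dir into one gzip-compressed tar file.

    Returns the number of regular files stored, so the result can be
    checked before the originals are removed.
    """
    src = Path(src_dir)
    with tarfile.open(tar_path, "w:gz") as tar:
        # arcname keeps paths inside the tarball relative to the directory name
        tar.add(src, arcname=src.name)

    # Re-open the tarball and count the files as a basic sanity check
    with tarfile.open(tar_path, "r:gz") as tar:
        return sum(1 for member in tar.getmembers() if member.isfile())


if __name__ == "__main__":
    # "results" and "results.tar.gz" are hypothetical example names
    if Path("results").is_dir():
        count = bundle_directory("results", "results.tar.gz")
        print(f"Stored {count} files in results.tar.gz")
```

The single resulting file can then be placed in the archive area instead of the individual files. Comparing the returned count with the number of files you expected is a cheap check to make before deleting anything.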