Laboratory for Scientific Computing

Scratch data storage

Your home directory is limited in size and cannot hold the many gigabytes of data that your simulations will generate.

For large amounts of simulation data, or other data that is easy to regenerate, each server and desktop has some "scratch space" under /local/data/public; the amount varies from machine to machine. If you want to store data there, create a subdirectory named after your username inside it.
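
Creating that subdirectory could look like the sketch below. The helper name make_scratch_dir is hypothetical (it is not a command installed on the machines), and the optional argument exists only so the function can be tried against a different root.

```shell
# Create a per-user directory inside the shared scratch area and print its path.
# make_scratch_dir is a hypothetical helper, not an installed command.
make_scratch_dir() (
    root="${1:-/local/data/public}"   # scratch root named on this page
    user="$(id -un)"                  # current login name
    mkdir -p "$root/$user" && printf '%s\n' "$root/$user"
)

# Usage: make_scratch_dir      # creates /local/data/public/<your-username>
```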

From any machine in the group, you can reach another machine's scratch space at /data/<machine-name>. It is mounted automatically the first time you access it, so don't be surprised if a listing of /data/ does not show every machine name. See the list of machines for full details of available data disks.
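
Because the mount only appears on access, a small check like the following can confirm that a machine's data disk is reachable before you point a job at it. This is a sketch: remote_scratch and the DATA_ROOT override are hypothetical names introduced here, and hydra01 in the usage line is only an example machine name.

```shell
# Resolve another machine's scratch disk through the automounted /data tree.
# remote_scratch and DATA_ROOT are hypothetical; DATA_ROOT only allows testing elsewhere.
remote_scratch() (
    machine="$1"
    root="${DATA_ROOT:-/data}"
    # Accessing the path is what asks the automounter to mount it.
    if ls -d "$root/$machine/" >/dev/null 2>&1; then
        printf '%s\n' "$root/$machine"
    else
        echo "no data disk found for '$machine'" >&2
        exit 1
    fi
)

# Usage: remote_scratch hydra01
```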

Note that the scratch space is not backed up, so use it only for data that can easily be recreated, such as simulation output. The scratch space is on the local machine's own disk, so access is substantially faster than to your home directory. Writing output to scratch space is therefore usually quicker, and it does not depend on the network, so a long-running simulation can continue through a network problem.

To see how much scratch space is available, run df -h and look at the line for the filesystem mounted at /local/data, or at / if the machine has no separate /local/data mount.
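
Passing a path to df limits the output to the filesystem that holds it, which saves scanning the full table. A minimal sketch, with scratch_free as a hypothetical wrapper name:

```shell
# Show free space for the filesystem holding the scratch area.
# scratch_free is a hypothetical wrapper around df.
scratch_free() (
    path="${1:-/local/data}"
    # Fall back to / when the given path does not exist on this machine.
    [ -e "$path" ] || path=/
    df -h "$path"
)

# Usage: scratch_free            # checks /local/data, or / as a fallback
#        scratch_free /local/data2
```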

Note that some machines have more than one data disk, with the extra ones labelled /local/data2 locally and /data/hydra01-2 (for example) globally. For a full list, run cat /etc/auto.data.lsc or cat /etc/auto.data.csc-mphil on any machine; the first column shows the folder name.
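
To print just the folder names, the first column can be extracted with awk. The sketch below assumes the map files are whitespace-separated text with optional '#' comment lines, which is typical for automount maps but not confirmed here; list_data_disks is a hypothetical name.

```shell
# List the folder names (first column) from the data-disk map files.
# list_data_disks is a hypothetical helper; the file layout is an assumption.
list_data_disks() (
    # Default to the two map files named on this page.
    [ "$#" -gt 0 ] || set -- /etc/auto.data.lsc /etc/auto.data.csc-mphil
    for map in "$@"; do
        [ -r "$map" ] || continue     # skip maps that are absent on this machine
        # Print column one, ignoring blank lines and '#' comments.
        awk 'NF && $1 !~ /^#/ { print $1 }' "$map"
    done | sort -u
)

# Usage: list_data_disks
```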

Some machines also have a /scratch partition, which is usually on the same disk as the main OS and is simply space that was left over when the machine was installed. It is not backed up either, and it is more vulnerable than the data space if the OS needs to be reinstalled. For that reason it is not made globally available.

Consideration for other users

Please consider other users when using /data space. As it is far quicker to output to the same machine that a simulation is running on, it is helpful if there is disk-space available when and where it is needed. You should therefore try to reduce the amount of data output as far as possible, or move it out of the way, or delete it as soon as possible.

In order to see the /data disk-usage on the current machine, run quota-local, which will list, for each user, the amount of disk space being used. If it becomes necessary, you may wish to encourage the users of the largest amounts of disk space to move their data elsewhere.

You can check how much space you are using on a per-directory basis by typing du -hc -d 1 ./ in any directory.
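
Sorting that output makes the largest subdirectories easy to spot. A sketch, with biggest_dirs as a hypothetical name; it uses kilobyte units so that a plain numeric sort works:

```shell
# Show the largest immediate subdirectories of a directory, biggest first.
# biggest_dirs is a hypothetical helper built on du and sort.
biggest_dirs() (
    dir="${1:-.}"
    count="${2:-10}"
    # -k reports sizes in kilobytes; -d 1 limits the depth to immediate children.
    du -k -d 1 "$dir" | sort -rn | head -n "$count"
)

# Usage: biggest_dirs /local/data/public/<your-username> 5
```

The total for the directory itself appears in the listing as well, normally on the first line.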

Hints on reducing disk usage

  1. Tidy up your files. Delete files that are no longer needed. If you have generated a long series of output files, consider deleting every other one. For example: rm *[02468].hdf will delete all HDF files whose names end in an even digit before the extension. Do not quote the pattern, or the shell will not expand it.
  2. Move files to the disks of machines that are not generally used for computation. Writing output during a simulation to a machine other than the one you are running on is often slow, but moving files once they have been generated is much quicker.
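
Since rm with a wildcard is unforgiving, it can help to preview the matches before deleting. The sketch below applies the even-digit pattern from the first hint; thin_series and its "delete" argument are hypothetical names chosen for this example.

```shell
# Preview, or delete, every HDF file whose name ends in an even digit.
# thin_series is a hypothetical helper; pass "delete" as the second argument to remove files.
thin_series() (
    dir="$1"
    mode="${2:-preview}"
    cd "$dir" || exit 1
    for f in *[02468].hdf; do
        [ -e "$f" ] || continue       # the pattern matched nothing
        if [ "$mode" = "delete" ]; then
            rm -- "$f"
        else
            printf 'would delete %s\n' "$f"
        fi
    done
)

# Usage: thin_series /local/data/public/<your-username>/run1          # preview only
#        thin_series /local/data/public/<your-username>/run1 delete   # actually delete
```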

For users of AMReX

  1. Delete checkpoint files you no longer need.
  2. You can use VisIt in client-server mode to access files that are on remote machines without having to transfer them to your local machine. See Visualisation for details.
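
Removing old checkpoints can be scripted along the following lines. This is a sketch under stated assumptions: checkpoints are directories whose names start with chk (the AMReX default prefix, which your inputs file may override) followed by a zero-padded step number, so that sorted order matches step order. prune_checkpoints is a hypothetical helper name.

```shell
# Keep only the newest N checkpoint directories in a run directory.
# prune_checkpoints is a hypothetical helper; the chk prefix and zero-padded
# step numbers are assumptions about how the run was configured.
prune_checkpoints() (
    dir="$1"
    keep="${2:-2}"
    cd "$dir" || exit 1
    set -- chk*/                      # glob results arrive in sorted order
    [ -d "$1" ] || exit 0             # no checkpoint directories found
    drop=$(( $# - keep ))
    while [ "$drop" -gt 0 ]; do
        printf 'removing %s\n' "$1"
        rm -rf -- "$1"
        shift
        drop=$(( drop - 1 ))
    done
)

# Usage: prune_checkpoints /local/data/public/<your-username>/run1 2
```

Check that a run can restart from the checkpoints you keep before relying on this.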