CSC Computing
CSC Compute Cluster
The CSC network has an experimental cluster set-up, described below. It uses SLURM to coordinate a job queue and ensure fair usage of compute resources. For SLURM documentation, see the SLURM Quick-Start Guide or the SLURM overview.
Login-node
The head-node for the SLURM cluster is athena, which you can ssh into. However, it only has 6 cores (hyper-threaded), so it is of little use for long-running jobs.
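For example, assuming your username is pmb39 (the example account used in the scripts below), you would log in with:
ssh pmb39@athena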
Compute nodes
There are currently 5 CPU compute nodes; each of these has 248GB RAM and 48 cores.
Compute nodes for csc-mphil partition:
- phy-cerberus4
- phy-cerberus5
- phy-cerberus6
Compute nodes for lsc partition:
- phy-cerberus7
- phy-cerberus8
Additionally, there are 2 GPU compute nodes; each has 2 NVIDIA RTX5090 GPUs with 32GB of memory each, plus 32 CPU cores and 128GB RAM.
Compute nodes for csc-mphil-gpu partition:
- phy-thetis
- phy-damysus
SLURM scripting
A sample SLURM script for CPU jobs can be found at /lsc/opt/slurm/slurm_lsc.sh:
#!/usr/bin/bash
#SBATCH --partition=lsc
#SBATCH --time=00:02:00
#SBATCH --output=hostname-%A.out
#SBATCH --mem=1GB
#SBATCH --ntasks=8
#SBATCH --mail-type=ALL
#SBATCH --account=pmb39
#SBATCH --clusters=CSC
hostname
date
echo "SLURM Job ID = $SLURM_JOB_ID"
echo "Procs = $SLURM_JOB_CPUS_PER_NODE"
echo "Submission host = $SLURM_SUBMIT_HOST"
echo "Submission dir = $SLURM_SUBMIT_DIR"
sleep 10
date
To schedule this to run:
pmb39@athena $ sbatch /lsc/opt/slurm/slurm_lsc.sh
You should receive emails when the job starts and ends (due to the --mail-type=ALL line), and the output will go into the file hostname-N.out in your current directory, where N is the job ID.
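While the job is queued or running, you can check on it with the standard SLURM squeue command; for example (substitute your own username, and note the --clusters flag matching the script):
pmb39@athena $ squeue --clusters=CSC -u pmb39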
To start an interactive session:
pmb39@athena $ srun --time=10:00 --partition=csc-mphil --clusters=CSC --pty /bin/bash
For GPU jobs, you need to add the --gpus=N option. See /lsc/opt/slurm/slurm_gpu.sh, which demonstrates how to check the number of GPUs you have access to:
#!/usr/bin/bash
#SBATCH --partition=csc-mphil-gpu
#SBATCH --time=00:02:00
#SBATCH --output=hostname-%A.out
#SBATCH --mem=1GB
#SBATCH --ntasks=1
#SBATCH --gpus=2
##SBATCH --mail-type=ALL
#SBATCH --account=pmb39
#SBATCH --clusters=CSC
echo "HELLO FROM CSC-MPHIL-GPU PARTITION!"
hostname
date
# Detect GPUs on system;
# will always output details for all GPUs on system
nvidia-smi
# Job information
echo "SLURM Job ID = $SLURM_JOB_ID"
echo "Procs = $SLURM_JOB_CPUS_PER_NODE"
echo "Submission host = $SLURM_SUBMIT_HOST"
echo "Submission dir = $SLURM_SUBMIT_DIR"
# Test that CUDA is functional: Only connects to one GPU
/lsc/opt/cuda-12.9/extras/demo_suite/bandwidthTest
# Use API to check for devices:
# Will detect either 1 or 2 GPUs depending on --gpus=N line above
/lsc/opt/cuda-12.9/extras/demo_suite/deviceQuery
sleep 20
date
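To submit the GPU script, and (if needed) to start an interactive session with a GPU attached, the same pattern as the CPU examples above should work; the interactive line is a sketch, so adjust the GPU count and time limit as required:
pmb39@athena $ sbatch /lsc/opt/slurm/slurm_gpu.sh
pmb39@athena $ srun --time=10:00 --partition=csc-mphil-gpu --clusters=CSC --gpus=1 --pty /bin/bash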
Limits
These limits are experimental; they will be subject to tuning based on usage.
- LSC Cluster: max cores: 48; max run-time: 36 hours
- CSC Cluster: max cores: 48; max run-time: 6 hours
- CSC-GPU Cluster: max GPUs: 2; max run-time: 6 hours
Nodes are non-exclusive, so jobs from different users may share a node, up to four separate jobs per node. If you need exclusive access to a node, request enough of its resources to fill it, e.g. --ntasks=48 or --mem=248GB, as shown below.
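For example, based on the CPU node sizes above, either of these directives on its own should be enough to claim a whole node for your job:
#SBATCH --ntasks=48
#SBATCH --mem=248GB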
File-systems
Your home directory, /home/raid/pmb39 for example, is shared between the head node and all compute nodes, as it is for other machines on the CSC network. In addition, /data/athena/ is visible on all compute nodes.
For other output, each node has its own local disk. These disks are accessible to all machines on the CSC network via the node's name, e.g. /data/phy-cerberus4. So, in general, you should pick a node and write your output to that specific /data partition.
This is not an ideal set-up, as writing output from phy-cerberus4 to a disk on phy-cerberus5 is not efficient. If performance is a problem, let pmb39 know.
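As a sketch, a batch script could direct its output to one of these node-local disks like this; the per-user subdirectory and my_program are placeholders, and you should pick whichever /data/phy-cerberus* partition suits your workflow:
#!/usr/bin/bash
#SBATCH --partition=lsc
#SBATCH --time=00:10:00
#SBATCH --clusters=CSC
# Assumed per-user directory on one node's local disk
OUTDIR=/data/phy-cerberus7/$USER
mkdir -p "$OUTDIR"
# Write results to the node-local disk rather than the shared home directory
./my_program > "$OUTDIR/results.txt"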
