SLURM
Slurm (originally the Simple Linux Utility for Resource Management) is an open-source workload manager and job scheduler for high-performance computing clusters. It is widely used in research, academia, and industry to manage and allocate computing resources such as CPUs, GPUs, memory, and storage for jobs. Slurm helps optimize resource utilization, minimizes conflicts between jobs, and provides a flexible framework for distributing workloads across a cluster of machines. Its features include job prioritization, fair sharing of resources, job dependencies, and real-time monitoring.
Availability
Software | Module Load Command |
---|---|
slurm | module load wulver |
Please note that the module `wulver` is already loaded when a user logs in to the cluster. If you use the `module purge` command, make sure to run `module load wulver` in the SLURM script to load SLURM again.
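As a minimal sketch of the note above, the snippet below writes a small batch script that purges all modules and then reloads `wulver`. The file name `example_job.sh`, the job name, the resource requests, and the `srun hostname` payload are illustrative assumptions, not site defaults; the cluster may also require further options such as an account or partition (see Running Jobs).

```shell
# Write a minimal batch script to a file.
# Assumptions: the file name, job name, and resource values are placeholders.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example.%j.out
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Start from a clean environment, then reload the module that provides SLURM.
module purge
module load wulver

# Placeholder workload: print the name of the allocated node.
srun hostname
EOF
```

The script could then be submitted with `sbatch example_job.sh`; `%j` in the output file name is replaced by the job ID.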
Application Information, Documentation
SLURM documentation is available in the SLURM manual.
Managing and Monitoring Jobs
SLURM has numerous tools for monitoring jobs. Below are a few to get started. More documentation is available on the SLURM website.
The most common commands are:
- List all current jobs: `squeue`
- Cancel a job: `scancel [job_id]`
- Submit a batch job: `sbatch [submit script]`
- Run a command: `srun <slurm options> <command name>`
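The commands above are often chained: submit a script, capture its job ID, then query that job. The following is a rough sketch of that flow, not a site-provided tool. The helper names `job_id_from_sbatch_output` and `submit_and_show` are hypothetical, and the parsing assumes `sbatch` prints its usual confirmation line, `Submitted batch job <id>`.

```shell
# Pull the job ID out of sbatch's confirmation line,
# which normally reads "Submitted batch job <id>".
job_id_from_sbatch_output() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Hypothetical helper: submit a script and show its queue entry.
submit_and_show() {
    script="$1"
    job_id=$(sbatch "$script" | job_id_from_sbatch_output)
    # An empty ID means sbatch did not print the expected line.
    [ -n "$job_id" ] || { echo "submission failed" >&2; return 1; }
    echo "Submitted job $job_id"
    squeue -j "$job_id"
}
```

For example, `submit_and_show my_job.sh` (where `my_job.sh` stands in for your own script) submits the job and lists it; the job can later be cancelled with `scancel` and the printed job ID.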
SLURM User Commands
Task | Command |
---|---|
Job submission: | sbatch [script_file] |
Job deletion: | scancel [job_id] |
Job status by job: | squeue -j [job_id] |
Job status by user: | squeue -u [user_name] |
Job hold: | scontrol hold [job_id] |
Job release: | scontrol release [job_id] |
List enqueued jobs: | squeue |
List nodes: | sinfo -N OR scontrol show nodes |
Cluster status: | sinfo |
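The status commands in the table can be combined with standard shell tools. As a small sketch, the snippet below counts your jobs per state. The helper name `count_states` is hypothetical; the `squeue` call uses `-h` to suppress the header and `-o '%T'` to print only the job state, and it is skipped on machines where `squeue` is not installed.

```shell
# Hypothetical helper: read one job state per line on stdin
# and print "<STATE> <count>" for each distinct state.
count_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Only query the scheduler if squeue is available on this machine.
if command -v squeue >/dev/null 2>&1; then
    squeue -h -u "${USER:-$(id -un)}" -o '%T' | count_states
fi
```

With two running jobs and one pending job, this would print one line for `PENDING` and one for `RUNNING`, each followed by its count.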
For more details, check Running Jobs.