SLURM

Slurm (Simple Linux Utility for Resource Management) is an open-source workload manager and job scheduler for high-performance computing clusters. It is widely used in research, academia, and industry to allocate computing resources such as CPUs, GPUs, memory, and storage to jobs and tasks. Slurm optimizes resource utilization, minimizes job conflicts, and distributes workloads across the machines in a cluster. It supports job prioritization, fair sharing of resources, job dependencies, and real-time monitoring, which makes it well suited to orchestrating complex computational workflows.

Availability

Software: slurm
Module load command: module load wulver

Please note that the wulver module is already loaded when a user logs in to the cluster. If you use the module purge command, make sure to add module load wulver to your Slurm script afterwards so that SLURM is loaded again.
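As a minimal sketch of a job script that resets its modules, the example below purges the environment and then reloads wulver. The job name, time limit, and output pattern are illustrative assumptions, and any options the site may require (such as an account or partition) are left out. The command -v module guard is only there so the sketch also runs harmlessly on a machine without the module system.

```shell
#!/bin/bash
#SBATCH --job-name=example        # hypothetical job name
#SBATCH --output=%x.%j.out        # %x = job name, %j = job ID
#SBATCH --ntasks=1
#SBATCH --time=00:05:00           # illustrative time limit

# Reset the environment, then reload wulver so SLURM is available again.
# The guard skips both steps where the module command is not defined.
if command -v module >/dev/null 2>&1; then
    module purge
    module load wulver
fi

# SLURM_JOB_ID is set by Slurm inside a job; fall back when run elsewhere.
job_id="${SLURM_JOB_ID:-unknown}"
echo "Job ${job_id} started"
```

Saved under a name of your choice, the script would be submitted with sbatch followed by that file name.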

Application Information, Documentation

The SLURM documentation is available in the SLURM manual.

Managing and Monitoring Jobs

SLURM has numerous tools for monitoring jobs. Below are a few to get started. More documentation is available on the SLURM website.

The most common commands are:

  • List all current jobs: squeue
  • Cancel a job: scancel [job_id]
  • Submit a batch job: sbatch [submit_script]
  • Run a command: srun [slurm_options] [command]
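The commands above can be combined into a short submit, check, and cancel sequence. The sketch below assumes a submit script named job.sh, which is a hypothetical name, and uses a small illustrative helper, parse_job_id, to extract the job ID from the output of sbatch --parsable. It prints a notice instead of failing when the Slurm client tools are not installed.

```shell
#!/bin/bash
# Hypothetical submit script name; replace it with your own.
submit_script="job.sh"

# parse_job_id: strip the optional ";cluster" suffix that
# `sbatch --parsable` appends on multi-cluster setups.
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

if ! command -v sbatch >/dev/null 2>&1; then
    echo "sbatch not found; run this on a cluster login node."
elif raw_id=$(sbatch --parsable "$submit_script"); then
    job_id=$(parse_job_id "$raw_id")
    echo "Submitted job ${job_id}"
    squeue -j "$job_id"    # show the job's current state
    scancel "$job_id"      # cancel it once it is no longer needed
fi
```

Capturing the job ID at submission avoids having to look it up in the squeue listing afterwards.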

SLURM User Commands

Task: Command
Job submission: sbatch [script_file]
Job deletion: scancel [job_id]
Job status by job: squeue -j [job_id]
Job status by user: squeue -u [user_name]
Job hold: scontrol hold [job_id]
Job release: scontrol release [job_id]
List enqueued jobs: squeue
List nodes: sinfo -N OR scontrol show nodes
Cluster status: sinfo
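As an illustration of the hold and release commands in the table, the following sketch wraps them in a small function. The function name hold_then_release is hypothetical, and the input check accepts only plain numeric job IDs. A hold affects only jobs that are still pending.

```shell
#!/bin/bash
# hold_then_release JOB_ID: hold a pending job, show its status, release it.
# Hypothetical helper; accepts only a plain numeric job ID.
hold_then_release() {
    job_id=$1
    case "$job_id" in
        ''|*[!0-9]*)
            echo "usage: hold_then_release <job_id>" >&2
            return 2
            ;;
    esac
    if ! command -v scontrol >/dev/null 2>&1; then
        echo "scontrol not found; run this on a cluster login node." >&2
        return 127
    fi
    scontrol hold "$job_id" || return   # keep the pending job from starting
    squeue -j "$job_id"                 # show the held job's status
    scontrol release "$job_id"          # make the job eligible to run again
}
```

On a login node, a call would look like hold_then_release followed by a job ID taken from squeue -u [user_name].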

For more details, check Running Jobs.