Slurm Job Scheduling System

Common Slurm Commands

Slurm comes with a range of commands for administering, using, and monitoring a Slurm configuration. A number of tutorials detail their use, but to be complete, I will look at a few of the most commonly used commands.


The all-purpose command sinfo lets users discover how Slurm is configured:

$ sinfo -s
PARTITION AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
p100         up   infinite         4/9/3/16  node[212-213,215-218,220-229]

This example lists the availability, time limit, node counts (allocated/idle/other/total), and node list of the p100 partition.
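The NODES(A/I/O/T) field packs four counts into a single token, so scripts that read sinfo output usually split it apart. A minimal sketch with awk, using the sample line above as stand-in input so it runs without a cluster:

```shell
# Extract the allocated/idle/other/total node counts from a `sinfo -s` line.
# The sample line from above stands in for live output.
line="p100     up   infinite         4/9/3/16  node[212-213,215-218,220-229]"

# Field 4 is NODES(A/I/O/T); split it on "/" into its four counts.
echo "$line" | awk '{split($4, n, "/");
    printf "allocated=%s idle=%s other=%s total=%s\n", n[1], n[2], n[3], n[4]}'
# prints: allocated=4 idle=9 other=3 total=16
```

On a live system, the same awk program can be fed directly with `sinfo -s -h` (the -h option drops the header line).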


To submit a batch serial job to Slurm, pass a job script to the sbatch command (runscript.sh here is just a placeholder name):

$ sbatch runscript.sh

For batch jobs, sbatch is one of the most important commands, made powerful by its large number of options.
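A typical job script pairs #SBATCH directives, which Slurm reads as sbatch options, with the commands to run. A minimal sketch (the partition name p100 and the resource values are assumptions; adjust them for your cluster):

```shell
#!/bin/bash
#SBATCH --job-name=serial_test     # name shown by squeue
#SBATCH --partition=p100           # assumed partition; match your site
#SBATCH --ntasks=1                 # a serial job needs one task
#SBATCH --time=00:10:00            # wall-clock limit, HH:MM:SS
#SBATCH --mem=1000                 # memory per node in megabytes
#SBATCH --output=job_%j.out        # output file; %j expands to the job ID

# The commands below run on the allocated node once the job starts.
echo "Job running on $(hostname)"
```

Because #SBATCH lines are ordinary shell comments, the same file also runs as a plain script outside of Slurm.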


To run parallel jobs, use srun:

$ srun -p test -t 10 --mem 1000 [script or app]

The similar command

$ srun --pty -p test -t 10 --mem 1000 /bin/bash

launches an interactive Bash shell on a compute node; the --pty option attaches a pseudoterminal so the session behaves like a normal login shell.


The scancel command allows you to cancel a specific job; for example,

$ scancel 999999

cancels job 999999. You can find the ID of your job with the squeue command.


To print a list of jobs in the job queue or for a particular user, use squeue. For example,

$ squeue -u akitzmiller

lists only that user's jobs.
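squeue and scancel also combine naturally: squeue can emit bare job IDs that xargs feeds to scancel, which is handy for clearing out all of your jobs at once. A sketch, with printf standing in for squeue and echo guarding scancel so the pipeline can be run and inspected without a Slurm installation:

```shell
# On a real cluster, cancel all of one user's jobs with:
#   squeue -u "$USER" -h -o %i | xargs scancel
# (-h drops the header line; -o %i prints only the job IDs.)
# Below, printf supplies fake job IDs in place of squeue, and echo is
# prepended to scancel so nothing is actually cancelled.
printf '100001\n100002\n' | xargs echo scancel
# prints: scancel 100001 100002
```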


The sacct command displays the accounting data for all jobs and job steps in the Slurm job accounting log or Slurm database. You can also run it against a specific job number:

$ sacct -j 999999


A resource manager is one of the most critical pieces of software in HPC. It allows systems and their resources to be shared efficiently, and it is remarkably flexible, allowing the creation of multiple queues according to resource types or generic resources (e.g., GPUs in this article). Slurm also supports job accounting.

The Slurm resource manager is one of the most common job schedulers in use today for very good reasons, some of which I covered here. Prepare to be “Slurmed.”