Lead Image © Vladislav Kochelaevs, fotolia.com

Resource Management with Slurm

Slurm Job Scheduling System

Article from ADMIN 48/2018
By
One way to share HPC systems among several users is to use a software tool called a resource manager. Slurm, probably the most common job scheduler in use today, is open source, scalable, and easy to install and customize.

In previous articles, I examined some fundamental tools for HPC systems, including pdsh [1] (parallel shells), Lmod environment modules [2], and shared storage with NFS and SSHFS [3]. One remaining, virtually indispensable tool is a job scheduler.

One of the most critical pieces of software on a shared cluster is the resource manager, commonly called a job scheduler, which allows users to share the system in a very efficient and cost-effective way. The idea is fairly simple: Users write small scripts, commonly called "jobs," that define what they want to run and the resources required, and then submit them to the resource manager. When the resources are available, the resource manager executes the job script on behalf of the user. Typically this approach is for batch jobs (i.e., jobs that are not interactive), but it can also be used for interactive jobs, for which the resource manager gives you a shell prompt on the node that is running your job.
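To make the idea of a job script concrete, here is a minimal sketch of a Slurm batch job. The #SBATCH lines are ordinary comments to the shell but are read by Slurm as resource requests. The job name, the limits, the output file name, and the describe_job function are illustrative assumptions, not values from a real cluster.

```shell
#!/bin/bash
# Resource requests: read by sbatch, ignored by bash as plain comments.
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello-%j.out

# Hypothetical payload; replace with the real work.
# SLURM_JOB_ID is set by Slurm inside a running job; outside a job
# the fallback value "none" is printed instead.
describe_job() {
    echo "job=${SLURM_JOB_ID:-none} host=$(uname -n)"
}

describe_job
```

Saved under a name such as hello.sh (hypothetical), the script would typically be submitted with sbatch hello.sh and watched with squeue. For an interactive session, a command along the lines of srun --pty bash asks Slurm for a shell on an allocated node.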

Some resource managers are commercially supported and some are open source, either with or without a support option. The list of candidates is fairly long, but the one I talk about in this article is Slurm [4].

Slurm

Slurm has been around for a while. I remember using it at Linux Networx in the early 2000s. Over the years, it has been developed by Lawrence Livermore National Laboratory, SchedMD [5], Linux Networx, Hewlett-Packard, and Groupe Bull [6]. According to the website, Slurm provides three functions [7]:

  • "… it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work."
  • "… it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes."
  • "… it arbitrates contention for resources by managing a queue of pending work."

These three points are the classic functions of a resource manager (job scheduler), and Slurm does them well.

Slurm is very extensible, with more than 100 optional plugins to cover everything from accounting, to various job reservation approaches, to backfill scheduling, to topology-aware resource selection, to job arrays, to resource limits by user or bank account and other job priority tools. It can even schedule resources and jobs according to the energy consumption of the job itself.

Architecture

The Slurm architecture is very similar to other job schedulers. Each node in the cluster has a daemon running, which in this case is named slurmd. The resources are referred to as nodes. The daemons can communicate in a hierarchical fashion that accommodates fault tolerance. On the Slurm master node, the daemon is slurmctld, which also has failover capability.

The compute resources (nodes) can be divided into partitions that can overlap, allowing partitions to spill over into other partitions according to resource needs. Partitions can be considered job queues, each with its own constraints, such as limits on job size and job time, restrictions on which users can use the partition, and so on.
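As a rough illustration of overlapping partitions with different limits, the sketch below writes a fragment in slurm.conf syntax to a scratch file so it can be reviewed before anything is merged into a real configuration. The node names, CPU count, time limits, the group name hpcusers, and the file name are all assumptions made for this example.

```shell
#!/bin/bash
# Write an illustrative partition fragment to a scratch file for review.
# Node names, sizes, limits, and the group name are all hypothetical.
cat > partitions.conf.example <<'EOF'
NodeName=node[01-04] CPUs=16 State=UNKNOWN
# Short test jobs: two nodes, 30-minute limit, default partition.
PartitionName=debug Nodes=node[01-02] MaxTime=00:30:00 MaxNodes=2 Default=YES State=UP
# Longer jobs: all four nodes (overlaps debug), one-day limit, one group only.
PartitionName=batch Nodes=node[01-04] MaxTime=24:00:00 AllowGroups=hpcusers State=UP
EOF
```

Here node01 and node02 belong to both partitions, so the two queues share hardware while enforcing different job size, job time, and access limits.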

Installing Slurm

The Slurm community builds Ubuntu binaries for download. For other distributions, you will probably have to build the binary yourself, which is not that difficult, although you will need a few dependencies. A good example of installing Slurm binaries on Ubuntu 16.04 is discussed on GitHub [8], and it even has very useful example configuration files for building a Slurm master (controller) node and one compute (client) node.

The following tips for building and installing Slurm are generally independent of the distribution used.

  1. Synchronize clocks across the cluster.
  2. Make sure passwordless SSH is working between the control node and all compute nodes, and make sure to do this as a user and not as root.
  3. To make life easier, use shared storage between the controller and the compute nodes.
  4. Make sure the UIDs and GIDs are consistent throughout the cluster.
  5. The general installation flow on the control node is:
     • Install the dependencies.
     • Install MUNGE [9], which is an authentication service for creating and validating credentials. Make sure all nodes in your cluster have the same munge.key and the MUNGE daemon, munged, is running before you start the Slurm daemons.
     • Install MariaDB (it is good to have a database) and start the daemon:
       systemctl enable mysql
       systemctl start mysql
     • Build and install Slurm.
     • Enable the Slurm daemons so that they start at boot (e.g., run the following commands as root); a matching systemctl start launches each one immediately:
       systemctl enable slurmctld
       systemctl enable slurmdbd   # enable the database
       systemctl enable slurmd     # compute node
     • Create the initial Slurm cluster, account, and user (performed by root):
       sacctmgr add cluster compute-cluster
       sacctmgr add account compute-account description="Compute accounts" Organization=OurOrg
       sacctmgr create user myuser account=compute-account adminlevel=None
  6. Install Slurm on the compute nodes:
     • Install/test MUNGE on the compute node:
       systemctl enable munge
       systemctl restart munge
     • Install Slurm.
  7. Set up cgroups (if needed).
  8. Optional: Enable Slurm PAM SSH control.

Installation might look difficult, but it's not. Notice that in the first part you install and enable Slurm on both the master node (control node) and the compute nodes.

If you don't want to build and install Slurm on every compute node, you can build RPMs for distributions that use that format, or you can use the Ubuntu files. Slurm is popular enough that you might be able to find RPMs built for the distribution you use.
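Tips 2 and 4 in the list above (passwordless SSH and consistent UIDs and GIDs) can be spot-checked before installing anything. The following is only a sketch: the function names check_ids and remote_cmd are hypothetical helpers, and the check assumes that passwordless SSH from the control node is already expected to work for the user running it.

```shell
#!/bin/bash
# Hypothetical helper: run a command on a node over SSH.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so a broken passwordless setup shows up as an empty answer.
remote_cmd() {
    ssh -o BatchMode=yes "$1" "$2"
}

# Compare the UID and GID of one user across a list of nodes.
# The first node that answers is taken as the reference.
# Returns 0 if all nodes agree, 1 otherwise.
check_ids() {
    user=$1
    shift
    ref=""
    status=0
    for node in "$@"; do
        # Join the two output lines (UID, GID) into "uid:gid".
        ids=$(remote_cmd "$node" "id -u $user; id -g $user" | paste -sd: -)
        if [ -z "$ids" ]; then
            echo "no answer from $node"
            status=1
            continue
        fi
        if [ -z "$ref" ]; then
            ref=$ids
        fi
        if [ "$ids" != "$ref" ]; then
            echo "MISMATCH on $node: $ids (expected $ref)"
            status=1
        fi
    done
    return $status
}
```

A call might look like check_ids myuser node01 node02, where the node names are placeholders. Silence and a zero exit status mean every node reported the same UID and GID for that user.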
