Introduction

SLURM (Simple Linux Utility for Resource Management) is a set of utilities for managing workloads on compute clusters. On Stampede, all jobs executed on the compute nodes are managed by SLURM. The basic knowledge required to submit jobs to Stampede through SLURM is covered in the Stampede environment module and in the Stampede user guide.

This module is for users who are already familiar with submitting jobs via SLURM, but whose needs go beyond simple batch files or interactive jobs. We will discuss some of the lesser-known but powerful features of SLURM that can serve as the basis for workflow strategies and for advanced techniques such as parameter sweeps. In addition, we will provide a comprehensive view of the architecture of the SLURM commands, rather than just explaining their common usage. The goal is to provide practical techniques and a broader understanding of SLURM without requiring you to learn everything about it.

Because SLURM provides a highly extensible infrastructure that can be configured in many ways, this module focuses only on the features of SLURM implemented on Stampede at TACC. For example, SLURM has an MPI plugin that allows SLURM itself to launch MPI tasks via srun. SLURM on Stampede is not configured to use this plugin; instead, TACC provides its own, more feature-rich MPI launcher, ibrun. As a result, topics such as launching MPI tasks directly via srun will not be discussed.

Aaron Birkland
Cornell Center for Advanced Computing

With contributions from:
Texas Advanced Computing Center

April 2014