[Figure: line drawing of four boxes arranging into a square formation]

There are three commands in Slurm that can allocate resources to a job under appropriate conditions: sbatch, srun, and salloc. All three accept the same set of command-line options with respect to resource allocation. (Note: in this context, an allocation by Slurm should not be confused with the overall allocation of service units governing your group's usage of system resources over the span of a year or more.)

Stampede2 and Frontera are configured so that the resources that Slurm allocates to a job are whole compute nodes. Whenever a job is submitted to a given partition (queue) on Stampede2 or Frontera, Slurm must be told how many whole compute nodes (-N) will satisfy the requirements of the job. Once enough nodes become available, the Slurm scheduler allocates the specified number of nodes to the job, provided that this does not delay the start of a job that was submitted earlier to the same queue.
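For example, a job requesting four whole nodes might be submitted from the command line as sketched below; the queue name, node count, and script name are placeholders, not recommendations for a particular system:

    sbatch -p normal -N 4 myjob.sh    # request 4 whole nodes in the "normal" queue
    squeue -u $USER                   # check whether the job is pending or running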

Allocation Parameters

Slurm offers a variety of command-line parameters for srun, sbatch, and salloc that you can use to tailor your job submission. On Stampede2 and Frontera, Slurm always allocates whole nodes within a single partition, and all nodes in a partition are identical. Therefore, selecting a particular partition (with -p) fully determines the hardware capabilities of the allocated nodes, and the most relevant parameters for job allocations are simply nodes (-N) and tasks (-n or --ntasks-per-node). The advanced options for requesting allocations with specific hardware features are neither needed nor supported.
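As a sketch of how these parameters interact, the following two submissions request the same layout of tasks across nodes; the queue name, node count, and per-node task count are illustrative only:

    # 4 nodes, 224 total tasks (i.e., 56 tasks per node)
    sbatch -p normal -N 4 -n 224 myjob.sh

    # the same layout, expressed as tasks per node
    sbatch -p normal -N 4 --ntasks-per-node=56 myjob.sh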

Allocation Methods

sbatch, srun, and salloc differ in the way they allocate and release nodes.

  • sbatch generally results in a new resource allocation after it is invoked. Once Slurm allocates nodes to the job, it executes the job script on the first of the allocated nodes (the primary node). When the script terminates, Slurm releases the allocation. An additional feature of sbatch is that the job script is scanned at submission time for comment lines that begin with #SBATCH; the text after #SBATCH on each such line is parsed as a command-line option. Options that are also present on the sbatch command line take precedence over the #SBATCH options in the script. In this way, a single batch script can be reused for job allocations of different sizes, e.g., by specifying -N and/or -n on the command line to override the usual #SBATCH choices in the script (see the sketch following this list).
  • srun may or may not create an allocation, depending on how it is invoked. If it is invoked on the command line of a login node, then it will create a new allocation and execute the command following srun. If it is invoked within a batch script (a use case that is not supported on Stampede2 and Frontera), it will simply run the given command in parallel, using the current allocation. Likewise, srun may be given a --jobid argument, in which case it runs the command in parallel on the specified job's allocation. (The command that you supply to srun can of course be an executable script.)
  • salloc works like srun, except that it always results in a new resource allocation when it is invoked. The typical use case of salloc is to create an allocation in order to run a series of subsequent srun commands, either through an interactive bash session or a script which originates from the login node. Slurm executes the bash session or script and releases the allocation after it terminates. This use case is not supported on Stampede2 and Frontera.
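To illustrate the sbatch case described above, here is a minimal sketch of a reusable batch script; the job name, queue, node and task counts, time limit, and executable are all placeholders, and ibrun is shown because it is the MPI launcher used on Stampede2 and Frontera:

    #!/bin/bash
    #SBATCH -J mytest           # job name
    #SBATCH -p normal           # partition (queue) to submit to
    #SBATCH -N 2                # number of whole nodes
    #SBATCH -n 112              # total number of MPI tasks
    #SBATCH -t 00:30:00         # wall-clock time limit (hh:mm:ss)

    ibrun ./my_mpi_program      # launch the tasks across the allocated nodes

Submitting this script with, say, sbatch -N 4 -n 224 myjob.sh would override the -N and -n directives in the script while leaving the other #SBATCH settings unchanged.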
 