Allocation
There are three commands in Slurm that can allocate resources to a job under appropriate conditions: sbatch, srun, and salloc. They all accept the same set of command-line options with respect to resource allocation. (Note: in this context, an allocation by Slurm should not be confused with the overall allocation of service units governing your group's usage of system resources over the span of a year or more.)
Stampede2 and Frontera are configured so that the resources that Slurm allocates to a job are whole compute nodes. Whenever a job is submitted to a given partition (queue) on Stampede2 or Frontera, Slurm must be told how many whole compute nodes (-N) will satisfy the requirements of the job. Once enough nodes become available, the Slurm scheduler allocates the specified number of nodes to the job, provided that this does not delay the start of a job that was submitted earlier to the same queue.
Allocation Parameters
Slurm offers a variety of command-line parameters for srun, sbatch, and salloc that you can use to tailor your job submission. On Stampede2 and Frontera, Slurm always allocates whole nodes within a single partition, and all nodes in a partition are identical. Therefore, simply selecting a particular partition (with -p) strictly determines the hardware capabilities of the allocated nodes. Hence, the most relevant parameters for job allocations are just nodes (-N) and tasks (-n or --ntasks-per-node). The advanced options for requesting allocations with specific hardware features are not needed or supported.
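For example, a request for 2 nodes and 8 total tasks could be written either of the following ways; the partition name normal and the script name job.sh are illustrative placeholders:

    sbatch -p normal -N 2 -n 8 job.sh                  # 8 tasks total across 2 nodes
    sbatch -p normal -N 2 --ntasks-per-node=4 job.sh   # equivalent: 4 tasks on each of 2 nodes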
Allocation Methods
sbatch, srun, and salloc differ in the way they allocate and release nodes.
sbatch generally results in a new resource allocation after it is invoked. Once Slurm allocates nodes to the job, it executes the job script on the first of the allocated nodes (the primary node). When the script terminates, Slurm releases the allocation. An additional feature of sbatch is that the job script is searched at job submission time for comment lines that begin with #SBATCH. For each line that is found, the text after #SBATCH is parsed as a command-line option. Options that are also present on the sbatch command line take precedence over the #SBATCH options in the script. In this way, a single batch script can be reused for job allocations of different sizes: by specifying -N and/or -n options on the command line, you can override your usual #SBATCH choices in the job script.
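As an illustration, here is a minimal job script in this spirit; the job name, partition, time limit, and program are all hypothetical placeholders:

    #!/bin/bash
    #SBATCH -J myjob        # job name
    #SBATCH -p normal       # partition (queue) to submit to
    #SBATCH -N 2            # number of whole nodes
    #SBATCH -n 8            # total number of tasks
    #SBATCH -t 00:30:00     # time limit (hh:mm:ss)

    ./my_program            # executed on the primary node

Submitting it as sbatch job.sh uses the #SBATCH values as given, whereas sbatch -N 4 -n 16 job.sh overrides the node and task counts from the command line without editing the script.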
srun may or may not create an allocation, depending on how it is invoked. If it is invoked on the command line of a login node, then it will create a new allocation and execute the command following srun. If it is invoked within a batch script (a use case that is not supported on Stampede2 and Frontera), it will simply run the given command in parallel, using the current allocation. Likewise, srun may be given a --jobid argument, in which case it runs the command in parallel on the specified job's allocation. (The command that you supply to srun can of course be an executable script.)
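For example, issued from a login node, a command along these lines would create a new allocation, run the program, and release the allocation when it finishes; adding a --jobid argument reuses an existing job's allocation instead. (The partition name, program name, and job ID below are hypothetical.)

    srun -p normal -N 2 -n 8 -t 00:10:00 ./my_program   # new allocation
    srun --jobid=123456 -n 8 ./my_program                # run within job 123456's allocation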
salloc works like srun, except that it always results in a new resource allocation when it is invoked. The typical use case of salloc is to create an allocation in order to run a series of subsequent srun commands, either through an interactive bash session or a script that originates from the login node. Slurm executes the bash session or script and releases the allocation after it terminates. This use case is not supported on Stampede2 and Frontera.
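On Slurm installations where this workflow is available, a typical interactive session might look like the following sketch (partition, limits, and program name are hypothetical):

    salloc -p normal -N 2 -n 8 -t 01:00:00   # request an allocation; opens an interactive shell
    srun ./my_program                         # runs on the allocated nodes
    exit                                      # ending the shell releases the allocation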