Execution: srun
srun is a means of synchronously submitting a single command to run in parallel on a new or existing allocation. It is inherently synchronous because it attempts to launch tasks on an allocated resource, waits (blocks) until these resources are available, and returns only when the tasks have completed. On Stampede2 and Frontera, the most common use of srun is to start an interactive session on one or more compute nodes, as in:
srun -n 1 -N 1 -p development -t 00:05:00 --pty /bin/bash -l
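The blocking behavior described above can also be observed with a short non-interactive command. The following is only a minimal sketch: the partition and time limit are copied from the interactive example, while the choice of hostname as the command, the task count of 2, and the guard for machines without Slurm are illustrative assumptions.

```shell
#!/bin/bash
# Illustrative command line: partition and time limit come from the interactive
# example; "hostname" and "-n 2" are assumptions chosen for a quick test.
cmd=(srun -n 2 -N 1 -p development -t 00:05:00 hostname)

echo "Launching: ${cmd[*]}"
if command -v srun >/dev/null 2>&1; then
    # srun blocks here until resources are granted and both tasks have finished.
    "${cmd[@]}"
    status=$?
    echo "srun returned with exit status ${status}"
else
    # Guard for machines without Slurm: report and launch nothing.
    status=127
    echo "srun is not available on this machine; nothing was launched."
fi
```

Because the script does not reach the final echo until srun returns, any lines that follow it can safely assume the tasks are done.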
The sequence of events with srun is as follows:
- The user provides a command for execution on an allocation.
  - The provided command may include command line arguments. In the above example, the provided command is /bin/bash and the argument is -l. The command could equally well be a user-supplied script.
  - It will be executed in parallel exactly as specified through the options to srun, on one or more compute nodes in an allocation.
- If an existing allocation has been specified, Slurm starts executing the command right away, in parallel. Otherwise, Slurm blocks execution until a new allocation is established.
  - For srun to run in an existing allocation, a previously submitted job must be identified through the --jobid option or through the SLURM_JOB_ID environment variable. The other options to srun must be compatible with the specified job (e.g., the partition is the same and the number of tasks fits within the allocation).
- A total of n identical copies of the command, where n is the value given to -n (or the value of -N times the value of --ntasks-per-node), are run simultaneously on the allocated resources as individual tasks.
  - In general, the aggregated stdout and the aggregated stderr for all the tasks are redirected to srun's own stdout and stderr, respectively.
  - If the --pty argument is present, pseudo-terminal mode is activated, which redirects input and output from only the first task back to the originating shell. This is the key to starting an interactive session: srun launches n shells (e.g., /bin/bash) in parallel, just as it would for any other provided command; however, only the first shell is given the --pty connection, while the other shells are connected to /dev/null, causing them to terminate immediately.
  - Besides the -N and -n command-line options, the distribution of tasks is controlled by options such as -w, which can be used to enumerate the names of the nodes within the allocation on which to run the tasks.
  - Because Slurm does not use the MPI plugin on Stampede2 and Frontera, the tasks launched by srun are simply n identical copies of the same command and are not launched in an MPI environment. As will be seen in a later section, this way of launching tasks can be used to implement parameter sweeps using serial or threaded code.
- When all tasks terminate, srun exits. If srun was used to create the allocation, then the allocation is released and the job ends; otherwise, the allocation remains.
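Since every task is an identical copy of the same command, the copies need some way to tell themselves apart if they are to do different work. One common approach, sketched here under stated assumptions, is to read the SLURM_PROCID environment variable that Slurm sets to the task rank in each copy. The script name task.sh, the function pick_param, and the parameter values are all hypothetical and serve only as illustration.

```shell
#!/bin/bash
# task.sh (hypothetical name): every copy launched by srun runs this same script.
# Slurm sets SLURM_PROCID to the task rank (0 .. n-1) in each copy's environment.

pick_param() {
    # Map a task rank to one entry of an illustrative parameter list,
    # wrapping around if there are more tasks than parameters.
    local rank=$1
    local params=(0.1 0.2 0.5 1.0)
    local idx=$(( rank % ${#params[@]} ))
    echo "${params[idx]}"
}

# Fall back to rank 0 when the script is run outside of srun.
rank=${SLURM_PROCID:-0}

# HOSTNAME is set by bash itself; it shows which node this copy landed on.
echo "task ${rank} on ${HOSTNAME} uses parameter $(pick_param "${rank}")"
```

Saved as an executable file, the script could be launched inside an existing allocation with a command along the lines of srun --jobid=<jobid> -n 4 ./task.sh, where <jobid> is a placeholder for the ID of a previously submitted job. The lines printed by all four tasks would then arrive, aggregated, on the stdout of srun.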