Execution: srun
`srun` is a means of synchronously submitting a single command to run in parallel on a new or existing allocation. It is inherently synchronous because it attempts to launch tasks on allocated resources, waits (blocks) until those resources are available, and returns only when the tasks have completed. On Frontera and Stampede3, the most common use of `srun` is to start an interactive session on one or more compute nodes, as in:
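(The following is a representative sketch; the partition name, node and task counts, and time limit are placeholders to adapt to your project and system.)

```bash
# Request one task on one compute node for 30 minutes in the "development"
# partition, and attach a pseudo-terminal to a login shell on that node.
# (Partition, counts, and time limit here are illustrative.)
srun -p development -N 1 -n 1 -t 0:30:00 --pty /bin/bash -l
```

The `--pty /bin/bash -l` portion is what turns the launched task into an interactive login shell, as explained in the list below.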
The sequence of events with `srun` is as follows:
- The user provides a command for execution on an allocation.
  - The provided command may include command line arguments. In the above example, the provided command is `/bin/bash` and the argument is `-l`. The command could equally well be a user-supplied script.
  - It will be executed in parallel exactly as specified through the options to `srun`, on one or more compute nodes in an allocation.
- If an existing allocation has been specified, Slurm starts executing the command right away, in parallel. Otherwise, Slurm blocks execution until a new allocation is established.
  - For `srun` to run in an existing allocation, a previously submitted job must be identified through the `--jobid` option or through the `SLURM_JOB_ID` environment variable. The other options to `srun` must be compatible with the specified job (i.e., the partition is the same, the number of tasks fits within the allocation, etc.). A brief sketch of this workflow appears after this list.
- `-n` identical copies of the command (or `-N` times `--ntasks-per-node` copies) are run simultaneously on the allocated resources as individual tasks.
  - In general, the aggregated stdout and the aggregated stderr for all the tasks are redirected to `srun`'s own stdout and stderr, respectively.
  - If the `--pty` argument is present, pseudo-terminal mode is activated, which redirects input and output from only the first task back to the originating shell. This is the key to starting an interactive session: `srun` launches `-n` shells (e.g., `/bin/bash`) in parallel, just as it would for any other provided command; however, only the first shell is given the `--pty` connection, while the other shells are connected to `/dev/null`, causing them to terminate immediately.
  - Besides the `-N` and `-n` command-line options, the distribution of tasks is controlled by options such as `-w`, which can be used to enumerate the names of the nodes within the allocation on which to run the tasks (also illustrated in a sketch after this list).
  - Because Slurm does not use the MPI plugin on Frontera and Stampede3, the tasks launched by `srun` are simply `n` identical copies of the same command and are not launched in an MPI environment. As will be seen in a later section, this way of launching tasks can be used to implement parameter sweeps using serial or threaded code.
- When all tasks terminate, `srun` exits. If `srun` was used to create the allocation, then the allocation is released and the job ends; otherwise, the allocation remains.
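To make the point about existing allocations concrete, here is a minimal sketch: an allocation is first obtained with `salloc`, and `srun` is then pointed at it either implicitly (through the `SLURM_JOB_ID` variable set in the `salloc` shell) or explicitly with `--jobid`. The partition, node and task counts, time limit, and job ID shown are illustrative.

```bash
# Obtain a 2-node, 8-task allocation for one hour; salloc starts a shell
# in which SLURM_JOB_ID is set to the new job's ID (say, 123456).
salloc -p development -N 2 -n 8 -t 1:00:00

# Inside the salloc shell, srun finds the allocation via SLURM_JOB_ID;
# the aggregated output of all 8 tasks comes back on srun's own stdout:
srun -n 8 hostname

# From any other shell, the same allocation can be targeted explicitly,
# provided the options are compatible with the existing job:
srun --jobid=123456 -n 8 hostname
```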
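Likewise, the `-w` option mentioned above restricts tasks to specific nodes within an allocation. A brief sketch, run from inside an existing allocation such as the `salloc` shell above (the node name is hypothetical and system-specific):

```bash
# Run 2 copies of the command, but only on the named node
# within the current allocation.
srun -n 2 -w c123-001 hostname
```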
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)