launcher (at TACC)
TACC provides a launcher utility that launches independent "jobs" from a list in a text file, across specified resources, until the list is exhausted. Resources may span multiple nodes in a network. Each line in the file contains one or more commands to be executed; the executables in the list can be serial or threaded applications. When a given line is reached, any variable substitutions are applied, then the full line is run as it appears.
In the above respects, launcher works very much like GNU's `parallel` utility. The main difference is that launcher is more integrated with Slurm. Just like `srun`, TACC's launcher uses the values of `-N` and `-n` to determine how and where to launch its "jobs". But unlike `srun`, the total number of jobs/tasks in the workpile can exceed `-n`, and each job can involve a different executable, if desired. (Note that the launcher uses the term "jobs" in a way that is different from the way we have been using it with respect to Slurm on other pages.)
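For instance, a job file is just a plain text file of commands, one "job" per line. The file and data names below are invented for illustration; the last line shows that a single line may chain several commands:

```shell
# jobfile.txt: each line is one independent launcher "job".
# (File names are hypothetical; any serial or threaded program works.)
gzip -k sample1.dat
gzip -k sample2.dat
cd work && gzip -k sample3.dat
```

Each line runs to completion independently of the others, so the lines should not depend on one another's results.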
Using the TACC launcher involves three steps:
- Load the launcher module via `module load launcher`.
- Prepare an arbitrarily long list of lines (jobs) to be run, saved as a text file.
- Submit a batch script containing at least one invocation of the top-level launcher command, `paramrun`. Specify the desired allocation in the arguments to `sbatch` (`-N`, `-n`, `-t`, etc.).
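The three steps might come together in a batch script like the sketch below. The queue name and the use of the `LAUNCHER_JOB_FILE` variable to name the job file are assumptions to be checked against `module help launcher` on your system:

```shell
#!/bin/bash
#SBATCH -J workpile          # job name
#SBATCH -N 2                 # number of nodes
#SBATCH -n 8                 # total concurrent launcher "jobs"
#SBATCH -t 01:00:00          # wall-clock limit
#SBATCH -p normal            # queue name (an assumption; check your system)

module load launcher

# Point launcher at the text file of jobs
# (the variable name is an assumption; verify with module help launcher).
export LAUNCHER_JOB_FILE=jobfile.txt

# paramrun works through jobfile.txt, keeping up to 8 jobs active.
paramrun
```

Submitting this script with `sbatch` would let launcher distribute the lines of `jobfile.txt` across the 2 allocated nodes.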
When the batch script starts, TACC's launcher will examine the `sbatch` parameters and launch "jobs" on each allocated node in accordance with the specification (`-N`, `-n`). If the number of jobs in the input file is greater than `-n` (or equivalently, `-N` times `--tasks-per-node`), the launcher will keep launching jobs in place of the ones that finish, so that the number of active jobs is always as close as possible to `-n`. The Slurm job will end once the launcher is done executing all its jobs.
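As a concrete (invented) example of the bookkeeping, the shell arithmetic below shows how the number of concurrent slots follows from the allocation:

```shell
# Hypothetical allocation: -N 2 nodes with --tasks-per-node 4,
# so Slurm's -n (and launcher's concurrency) is 2 * 4 = 8.
nodes=2
tasks_per_node=4
slots=$((nodes * tasks_per_node))
total_jobs=20   # lines in the job file; may freely exceed $slots
echo "$slots concurrent slots for $total_jobs jobs"
```

With these numbers, launcher starts 8 jobs immediately, starts the 9th as soon as any of the first 8 finishes, and so on through all 20 lines.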
The TACC launcher will make a number of environment variables available to each job that it runs. Certain of these variables are described below (again using "job" in the same way that the launcher does).

- `LAUNCHER_NPROCS`: contains the maximum allowable number of running jobs. This is equivalent to the `-n` Slurm parameter.
- `LAUNCHER_TSK_ID`: contains a slot number within the set of currently executing jobs. It corresponds to a Slurm task ID. Values range from 0 to `$LAUNCHER_NPROCS-1`.
- `LAUNCHER_JID`: contains a job number relative to the entire set of jobs defined in the input file. Each job (line) in the input file is assigned a number, starting at 1 and increasing.
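One way to see these variables in action is to make a line of the job file report its own identifiers. The echo line below is illustrative; launcher sets all three variables at run time:

```shell
# A job-file line that prints which job it is and which slot it occupies.
echo "job $LAUNCHER_JID in slot $LAUNCHER_TSK_ID of $LAUNCHER_NPROCS"
```

With `-n 8`, for example, `LAUNCHER_TSK_ID` would range over 0 through 7, while `LAUNCHER_JID` would run from 1 up to the number of lines in the file.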
To learn more about launcher and how to use it, visit its GitHub page (as is recommended by `module help launcher` on Stampede2 or Frontera), or its software page at the TACC User Portal. Examples of job files and Slurm batch scripts are found in the `$LAUNCHER_DIR/extras` directory; you can use them as your starting point. There is also a launcher-gpu for use on Frontera.
Note that launcher is not the only option available to you on Stampede2 and Frontera. Very similar functionality is provided by `module load gnuparallel`, which brings in GNU `parallel`, or by `module load pylauncher`, which is geared toward Python 3 programmers. For more information on these packages, use `module help` (as well as `man parallel`).