In this exercise, we will become more familiar with the TACC launcher and use it to launch "jobs" among multiple machines in a Slurm allocation. We will submit more launcher jobs than there are slots for Slurm tasks in the allocated resources, so you will observe the launcher "scheduling" these jobs among the limited resources.

  1. As mentioned earlier, the TACC launcher executes a job list that is enumerated in a text file. Create a file named hostname_sleep containing the following 8 lines. Each command line prints the job ID, the name of the host on which it is running, and a timestamp, then sleeps for 10 seconds.
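    A minimal sketch of such a file follows; the exact wording of each echo is unimportant, so long as every line reports its job number, the hostname, and the time before sleeping:

        echo "Job 1 running on host $(hostname) at $(date)"; sleep 10
        echo "Job 2 running on host $(hostname) at $(date)"; sleep 10
        echo "Job 3 running on host $(hostname) at $(date)"; sleep 10
        echo "Job 4 running on host $(hostname) at $(date)"; sleep 10
        echo "Job 5 running on host $(hostname) at $(date)"; sleep 10
        echo "Job 6 running on host $(hostname) at $(date)"; sleep 10
        echo "Job 7 running on host $(hostname) at $(date)"; sleep 10
        echo "Job 8 running on host $(hostname) at $(date)"; sleep 10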
  2. This job file is always identified by the environment variable LAUNCHER_JOB_FILE, and to use the launcher successfully, it is essential to assign this variable correctly:
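    For the file created in step 1, the assignment looks like this (use a full path if you will not be working in the directory that contains the file):

        export LAUNCHER_JOB_FILE=hostname_sleep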
  3. Prepare your environment to run the launcher by loading the necessary module. Among other things, this sets the value of the LAUNCHER_DIR environment variable, which will be needed later by the batch script:
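    On TACC systems this is a single module command; echoing LAUNCHER_DIR afterward is an easy way to confirm that the module set it:

        module load launcher
        echo $LAUNCHER_DIR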
  4. TACC provides a sample batch file at $LAUNCHER_DIR/extras/batch-scripts/launcher.slurm. Copy this file to a working directory of your choice and use an editor to comment out several lines as shown. On Stampede2 and Frontera, these lines are unneeded because the batch job will inherit the environment that we have already set up on the login node. Also, the working directory does not have to be set if the job is submitted from that directory, and the account string is superfluous for users who have only one account.
    ##SBATCH -A <------ Account String ----->
    
    #------------------------------------------------------
    
    #module load launcher
    #export LAUNCHER_WORKDIR=Your-Working-Directory-Here
    #export LAUNCHER_JOB_FILE=helloworld_multi_output
    
    $LAUNCHER_DIR/paramrun
  5. We are ready to use sbatch to submit the batch job, which will in turn run the launcher jobs listed in the job file. We'll request two nodes with four total tasks, or two tasks per node. To do this, we'll override a couple of the Slurm options from the command line. (Of course, this and several of the steps above could equally well be accomplished by editing the launcher.slurm script appropriately.)
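    A submission along these lines should work, assuming your copy of the script is named launcher.slurm and already specifies a suitable partition and time limit; the -N and -n options on the command line override the node and total task counts set in the script:

        sbatch -N 2 -n 4 launcher.slurm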
  6. Once the Slurm job finishes, look at the output file, Parametric*.out. Some of the lines of output were generated by the commands we specified.
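    If you used the job-file sketch from step 1, you can pull out just those lines and group them by node with something like the following; the grep pattern must match whatever text your echo commands actually print:

        grep "running on host" Parametric*.out | sort -k6    # field 6 holds the hostname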
  7. Examine the hostnames and execution times in the output file. Can you determine how many "jobs" were running at a given time on a given machine, and the order in which they ran? Does the launcher's assignment of jobs to tasks match your expectations?
  8. Try different submission parameters: one node with 8 tasks, or 4 nodes with 1 task each. In each case, compare the launcher's patterns of execution with your expectations.
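    For example, again assuming the partition and time limit in your launcher.slurm are already suitable:

        sbatch -N 1 -n 8 launcher.slurm    # one node, eight tasks
        sbatch -N 4 -n 4 launcher.slurm    # four nodes, one task per node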
 