Ways to Run Jobs
To succeed in using a major HPC resource like Frontera, it is essential to have a workable plan for parallelizing computations. There is no one right way to do this; in fact, it may be best to rely on a combination of techniques. As you might expect, Frontera comes with an array of tools to facilitate the various styles of parallel processing. The main methods that are available are summarized in the table below.
Program type | # of nodes | How to run in parallel on Frontera |
---|---|---|
Multithreaded program (OpenMP, TBB) | Single node | Set number of threads and run program |
High throughput computing with serial or multithreaded code | 1+ nodes | Use launcher, launcher_gpu, pylauncher, or gnuparallel (the parallel command) |
MPI program | 1+ nodes | Start program with ibrun |
Hybrid of the above | 1+ nodes | Use any or all methods in combination |
You'll first want to ensure that your program can make good use of the available cores and memory on a single compute node. Just by itself, one node can accommodate up to 56 OpenMP threads, or MPI tasks, or independent serial processes. To go beyond one node, it is necessary to use MPI, or to have some means of launching independent processes that run in parallel on multiple nodes.
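As a rough illustration of filling a single node, the sketch below sets the OpenMP thread count before starting a multithreaded program, and also shows how the same 56 cores might be divided in a hybrid run. The 56-core figure comes from the text above; the executable name ./my_openmp_program and the choice of 4 tasks per node are assumptions made purely for illustration.

```shell
# Cores available on one compute node (figure taken from the text above).
cores_per_node=56

# Pure OpenMP: one thread per core. OMP_NUM_THREADS is the standard
# OpenMP environment variable that controls the thread count.
export OMP_NUM_THREADS=$cores_per_node

# ./my_openmp_program is a hypothetical placeholder for your own executable.
# The guard lets this sketch run even where no such program exists.
if [ -x ./my_openmp_program ]; then
  ./my_openmp_program
else
  echo "would run ./my_openmp_program with $OMP_NUM_THREADS threads"
fi

# Hybrid layout (assumed example): 4 MPI tasks per node, each with an
# equal share of the cores as OpenMP threads.
tasks_per_node=4
threads_per_task=$((cores_per_node / tasks_per_node))
echo "hybrid: $tasks_per_node tasks x $threads_per_task threads per node"
```

Whether one thread per core is the best setting depends on the application, so it is worth timing a few different thread counts before scaling up.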
You will almost certainly need MPI to run an application at scale on Frontera. TACC systems feature a special MPI starter called ibrun that streamlines the process for users. Among other benefits, ibrun works with Slurm's batch environment to produce suitable hostlists for jobs. It also provides a uniform interface for different MPI stacks. Otherwise, you would need to remember the varied run commands you see here:
- Intel MPI: mpiexec.hydra
- MVAPICH2: mpirun_rsh
- OpenMPI: mpirun (not yet available on Frontera)
Later, we'll look at an example batch script showing ibrun usage.
In addition, TACC systems provide several non-MPI launchers for high-throughput computing, in which many independent workers grab tasks from a workpile until all tasks are done. A parameter sweep is the prototypical example of this type of computing. Frontera's launcher utilities are not covered in depth here; to learn about any of them, consult module help, for example:
$ module help launcher
This will print some clues about usage. After loading one of the modules, you can also find relevant documentation in the directory $TACC_<modulename>_DIR (or a subdirectory), where <modulename> is the name of the module (in all caps): either launcher, launcher_gpu, pylauncher, or gnuparallel. None of these utilities comes with man pages; however, for gnuparallel (only) you can run either man parallel or parallel --help for more information.
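To make the workpile idea concrete, here is a minimal sketch of a parameter sweep driven by a task list with one independent command per line. The parameter names (alpha, n) and the use of echo as a stand-in for a real serial program are assumptions for illustration only. The sketch hands the list to GNU parallel when that command is present and otherwise falls back to running the tasks one after another; on Frontera you would presumably load the gnuparallel module first.

```shell
# Build a task list for a small parameter sweep: one command per line.
# "echo" stands in for a real (hypothetical) serial program.
jobfile=$(mktemp)
for alpha in 0.1 0.2 0.5; do
  for n in 10 100; do
    echo "echo alpha=$alpha n=$n" >> "$jobfile"
  done
done
ntasks=$(wc -l < "$jobfile" | tr -d ' ')

# With no command argument, GNU parallel runs each input line as a command.
# Here at most 4 tasks run at once; workers pick up new lines as they finish.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  results=$(parallel -j 4 < "$jobfile")
else
  # Fallback when GNU parallel is not installed: run the tasks serially.
  results=$(sh "$jobfile")
fi

# Completion order is not guaranteed in parallel runs, so sort for display.
printf '%s\n' "$results" | sort
rm -f "$jobfile"
```

Because the tasks are independent, the order in which they finish does not matter, which is what makes this style of computing easy to spread across many cores.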