On typical HPC systems, you submit parallel jobs through Slurm, a batch scheduling system for the compute nodes. As a rule, running MPI programs directly on the login nodes is not permitted; this restriction prevents compute-intensive processes from degrading login node performance for other users.

The compute nodes are commonly grouped into different queues (called partitions in Slurm), according to the characteristics that each group shares.

If there is a development partition in Slurm, use it for testing your parallel MPI codes. The relatively brief time limit on jobs in the development partition is meant to ensure short wait times.
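To see which partitions a given system offers, along with their time limits and node counts, you can query Slurm directly with the standard sinfo command:

sinfo    # lists each partition, its time limit, and the number of nodes it contains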

Jobs are submitted to Slurm with the sbatch command, which takes the name of the job script as its first argument. Special #SBATCH comments at the head of the job script provide Slurm with the parameters for the job. The purpose of many of these parameters can be inferred simply by examining the following sample script:
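A representative script is sketched below; the job name, partition, node and task counts, time limit, and executable name (my_mpi_program) are placeholders that you would adjust for your own application and system.

#!/bin/bash
#SBATCH -J mpi_test          # job name
#SBATCH -o mpi_test.o%j      # standard output file (%j expands to the job ID)
#SBATCH -e mpi_test.e%j      # standard error file
#SBATCH -p development       # partition (queue) to submit to
#SBATCH -N 2                 # number of nodes requested
#SBATCH -n 8                 # total number of MPI tasks across all nodes
#SBATCH -t 00:30:00          # wall-clock time limit (hh:mm:ss)

ibrun ./my_mpi_program       # on non-TACC systems, use mpirun (or mpiexec) instead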

The key command above is the final one, ibrun, which is a TACC-specific front end to the mpirun command that actually initiates the MPI processes on one or more nodes. By default, ibrun will distribute the full number of tasks across the full number of nodes requested via Slurm. On HPC systems other than TACC's, you would simply use mpirun directly (or equivalently, mpiexec) for the same purpose. The entire process of job initiation through Slurm is illustrated in the figure below:

Diagram showing a terminal connecting to a login node via ssh, and the login node interacting with the Slurm manager through sbatch and related commands. The Slurm manager takes care of scheduling, managing resource use, and distributing work to the compute nodes.
Login node interaction with the Slurm job manager.

Assuming the above batch file is saved in mpi_batch.sh, you would submit it by running the following on a TACC system like Frontera or Vista:
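sbatch mpi_batch.sh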

The output and error files from the job are written to the directory from which you submitted it. Note that the MPI environment in the batch job should match the MPI environment you used to compile the code. At TACC, this happens automatically if the correct environment modules are loaded when you submit your batch job.
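For instance, before submitting you can run the following to confirm that the MPI module used to compile the program is still loaded (module names vary from system to system):

module list    # the loaded MPI module should match the one used at compile time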

For a more in-depth explanation of how to use Slurm to run your MPI jobs on TACC systems, refer to the appropriate user guide, e.g., the Frontera User Guide or the Vista User Guide. Also, the Cornell Virtual Workshop roadmap Getting Started on Frontera offers a complete introduction to using compilers, libraries, and batch jobs on Frontera.
