Managing Jobs
Steve Lantz, Peter Vaillancourt
Cornell Center for Advanced Computing
Revisions: 4/2024, 9/2021, 5/2021, 8/2020 (original)
Frontera is the largest academic supercomputer in the world, located at The University of Texas at Austin's Texas Advanced Computing Center (TACC). Frontera is tailored to the very largest scientific computing projects. This portion of the quick-start guide shows you how to use Slurm commands to track and control the progress of your batch jobs, and suggests what to try if something goes wrong.
Objectives
After you complete this topic, you should be able to:
- Explain how to monitor a job’s progress while it is running
- Discuss creating job dependencies
- Explain how to assign job attributes and why doing so may be useful
- Name key troubleshooting measures when a job does not run as expected
Prerequisites
Because Frontera is a leadership-class system, its prospective users are likely to be familiar with HPC and parallel computing already. For that reason, the pace of this presentation is relatively brisk.
That said, there are no formal prerequisites for this Virtual Workshop topic. A working knowledge of Linux is recommended; if you need more preparation in Linux, try working through the Linux roadmap first.