Tools for Tuning
Chris Myers, Steve Lantz
Cornell Center for Advanced Computing
Revisions: 4/2024, 1/2022, 2/2021 (original)
Advanced clusters like Stampede3 and Frontera offer multiple dimensions along which codes can achieve high parallel performance, such as multithreading combined with vectorization. Let's assume that you have already put some effort into compiling and running your application in a manner that leverages these capabilities, based on the information in the present roadmap (and perhaps some of the advanced roadmaps of the Cornell Virtual Workshop as well). What else can you do to help your application achieve its best performance on clusters built from Intel Xeon Scalable Processors? In this topic, we discuss some of the available diagnostic tools, focusing on those that Intel provides for this purpose.
Objectives
After you complete this topic, you should be able to:
- Identify tools that help locate trouble spots in code
- Describe the contents of an optimization report
- Explain how Intel Advisor can improve applications
- Describe the kinds of data that can be collected and displayed by the Intel VTune Profiler
Prerequisites
- Familiarity with High Performance Computing (HPC) concepts. Those who are less conversant with HPC terms and techniques should be prepared to consult the glossary frequently. It may also be helpful to review the Cornell Virtual Workshop content on Parallel Programming Concepts and High-Performance Computing, as well as the content on either MPI or OpenMP.
- Programming experience in C or Fortran. Introductions to C and Fortran are available, though the reader will need to look elsewhere for a full tutorial on these languages.
- Readers who need an introduction to either Stampede3 or Frontera will find it helpful to first review one or more of the following items: the Stampede3 User Guide, the Frontera User Guide, and the Getting Started on Frontera CVW material.