Profiling
Once the code is compiled with flags that take full advantage of the KNL 512-bit vector instruction set, a number of profiling tools and procedures can be used to examine its performance.
Key points:
- To take advantage of the KNL 512-bit vector instruction set with the Intel compilers, specify the following compiler flags
in addition to any other arguments and flags that a particular build requires:
COMPFLAGS = -O3 -xMIC-AVX512 -fma -align array64byte -finline-functions -ip -ipo
- This topic covers a variety of methods and tools for profiling and other kinds of code analysis. Each analysis reveals a different aspect of
code performance and of potential processing bottlenecks. Some analyses are generic across platforms (e.g., scaling studies),
whereas others are provided by specific Intel tools:
- Hotspot identification: which routines are consuming the bulk of the processing time.
- Roofline plots: plots of performance vs. arithmetic intensity to identify specific performance bounds and bottlenecks.
- Intra-node scaling study: scaling of run time with the number of cores within a single KNL node.
- Inter-node scaling study: scaling of run time with the number of cores across multiple KNL nodes.
- Application Performance Snapshot: an Intel tool providing a broad visual overview of different aspects of code performance.
- Memory Access Analysis: an Intel tool identifying memory-bound behavior and cache misses in key routines.
- Vectorization Advisor: an Intel tool reporting vectorization gains and efficiencies.
- Loop Analytics: an Intel tool providing a summary of performance for individual loops.
- Vectorization Report: an Intel compiler report suggesting compiler options and code directives that may improve vectorization efficiency.
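The roofline plots listed above rest on simple arithmetic: a kernel's attainable performance is the smaller of the machine's peak compute rate and its arithmetic intensity (flops per byte of memory traffic) multiplied by the memory bandwidth. The following Python sketch illustrates that bound. The peak and bandwidth values are hypothetical placeholders, not measured KNL figures. The triad traffic count ignores write-allocate traffic, and the function names are illustrative only, not part of any Intel tool.

```python
# Minimal roofline-model sketch (illustrative helpers, not an Intel tool API).


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Flops performed per byte moved between memory and the cores."""
    return flops / bytes_moved


def roofline_bound(ai: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Attainable GFLOP/s for a kernel with arithmetic intensity `ai` (flop/byte).

    The kernel is capped either by the compute peak or by bandwidth * ai,
    whichever is lower.
    """
    return min(peak_gflops, ai * bandwidth_gbs)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity at which a kernel stops being bandwidth-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters, NOT measured KNL figures: replace them
# with values measured on your own node.
PEAK_GFLOPS = 3000.0
BANDWIDTH_GBS = 400.0

# Double-precision triad a[i] = b[i] + s * c[i]: 2 flops per element and
# 3 x 8 bytes of traffic (read b and c, write a), ignoring write-allocate.
triad_ai = arithmetic_intensity(2, 24)

if __name__ == "__main__":
    print(f"triad AI    = {triad_ai:.3f} flop/byte")
    print(f"ridge point = {ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS):.1f} flop/byte")
    print(f"triad bound = {roofline_bound(triad_ai, PEAK_GFLOPS, BANDWIDTH_GBS):.1f} GFLOP/s")
```

Kernels whose arithmetic intensity lies left of the ridge point are limited by memory bandwidth rather than by the vector units. That is the distinction a roofline plot is meant to expose.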
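For the intra-node and inter-node scaling studies, run times measured at several core counts are commonly reduced to speedup and parallel efficiency. The sketch below assumes a fixed total problem size (strong scaling) and uses the smallest measured core count as the baseline. The helper name `strong_scaling` and the timings in the usage block are hypothetical, chosen only to show the arithmetic.

```python
# Strong-scaling reduction: run times at several core counts -> speedup and
# parallel efficiency, relative to the smallest core count measured.


def strong_scaling(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Return {cores: (speedup, efficiency)} for a fixed-size problem.

    With baseline core count p0 and run time T(p0):
        speedup    S(p) = T(p0) / T(p)
        efficiency E(p) = S(p) * p0 / p   (1.0 means ideal linear scaling)
    """
    p0 = min(timings)
    t0 = timings[p0]
    return {p: (t0 / t, (t0 / t) * p0 / p) for p, t in sorted(timings.items())}


if __name__ == "__main__":
    # Made-up run times in seconds, for illustration only.
    example = {1: 100.0, 2: 52.0, 4: 27.0, 8: 15.0}
    for p, (s, e) in strong_scaling(example).items():
        print(f"{p:3d} cores: speedup {s:5.2f}, efficiency {e:6.1%}")
```

The same reduction applies whether the cores sit within one KNL node or span several. Efficiency that falls steadily as cores are added often points to serial sections, communication, or memory contention, which the profiling tools listed above can help locate.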