Vector Performance
Steve Lantz with contributions from Aaron Birkland and the Texas Advanced Computing Center
Cornell Center for Advanced Computing and Texas Advanced Computing Center
Revisions: 9/2022, 5/2021, 1/2021, 5/2018, 6/2017, 10/2013 (original)
This section describes several factors that influence the real-world performance of vectorized applications. It also offers tips and techniques for getting the best performance from a vectorizable application.
Two basic questions that must be answered when considering vector performance are:
- How much of my application must be vectorized before I will see a significant benefit?
- In the vectorized portions, what hinders the instructions from executing as rapidly as possible?
Objectives
After you complete this topic, you should be able to:
- Explain how Amdahl's Law applies to vectorization
- Apply Amdahl's Law to predict a code's parallel efficiency e given its vectorizable fraction P, or vice versa
- Describe how clock frequency inhibits parallel speedup for codes that are both multithreaded and vectorized
- Explain the role of profiling and give examples of the types of profiling that can be done with Intel's Advisor and VTune tools
- Explain how prefetching helps avoid memory latency
- Describe the memory access patterns that lead to improved performance for vectorized code
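As a preview of the Amdahl's Law objective, the relationship between the vectorizable fraction P and the parallel efficiency e can be sketched in a few lines of C. This is a minimal sketch, assuming the standard form of Amdahl's Law, in which a fraction P of the run time is sped up by a factor N and efficiency is the speedup divided by N. The symbol N (taken here to be the number of vector lanes) and the function names are illustrative assumptions, not part of this topic's material.

```c
#include <assert.h>

/* Amdahl's Law (standard form, assumed here): speedup obtained when a
   fraction P of the run time is accelerated by a factor N. N is a
   hypothetical symbol for the number of vector lanes. */
double amdahl_speedup(double P, double N)
{
    assert(P >= 0.0 && P <= 1.0 && N >= 1.0);
    return 1.0 / ((1.0 - P) + P / N);
}

/* Parallel efficiency e: the speedup as a fraction of the ideal value N. */
double amdahl_efficiency(double P, double N)
{
    return amdahl_speedup(P, N) / N;
}

/* The "vice versa" direction: the vectorizable fraction P needed to reach
   efficiency e on N lanes, found by solving e*N = 1/((1-P) + P/N) for P.
   Valid only for N > 1 and 1/N <= e <= 1. */
double amdahl_fraction(double e, double N)
{
    assert(N > 1.0 && e * N >= 1.0 && e <= 1.0);
    return (N - 1.0 / e) / (N - 1.0);
}
```

Under these assumptions, calling `amdahl_fraction(0.5, 8.0)` gives P = 6/7, or about 0.857: a code would need to be roughly 86% vectorizable to reach 50% efficiency on 8 lanes.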
Prerequisites
- Knowledge of C and/or Fortran, as well as a basic knowledge of what assembly language is
- Familiarity with batch job submission on large compute clusters
© Cornell University | Center for Advanced Computing | Copyright Statement | Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)