Vector Performance

Steve Lantz with contributions from Aaron Birkland and the Texas Advanced Computing Center
Cornell Center for Advanced Computing and Texas Advanced Computing Center

Revisions: 9/2022, 5/2021, 1/2021, 5/2018, 6/2017, 10/2013 (original)

This section describes several factors that influence real-world performance of vectorized applications. It also suggests tips and techniques that can help you get the best performance from a vectorizable application.

When considering vector performance, you must answer two basic questions:

  • How much of my application must be vectorized before I will see a significant benefit?
  • In the vectorized portions, what hinders the instructions from executing as rapidly as possible?

Objectives

After you complete this topic, you should be able to:

  • Explain how Amdahl's Law applies to vectorization
  • Apply Amdahl's Law to predict a code's parallel efficiency e given its vectorizable fraction P or vice versa
  • Describe how reductions in clock frequency can inhibit parallel speedup for codes that are both multithreaded and vectorized
  • Explain the role of profiling and give examples of the types of profiling that can be done with Intel's Advisor and VTune tools
  • Explain how prefetching helps avoid memory latency
  • Describe the memory access patterns that lead to improved performance for vectorized code
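Amdahl's Law gives a quick answer to the first question above. The Python sketch below estimates vector speedup S and parallel efficiency e from the vectorizable fraction P, and also inverts the law to find the P needed for a target e. It rests on a few assumptions. N is taken to be the number of vector lanes (for example, 8 double-precision values in a 512-bit register). Efficiency is defined as e = S / N. The function names are hypothetical and chosen only for illustration.

```python
"""Amdahl's Law applied to vectorization (a sketch).

Assumptions: n is the number of vector lanes (e.g. 8 doubles in a 512-bit
register), p is the fraction of runtime that is vectorizable, and parallel
efficiency is defined as e = S / n. Function names are hypothetical.
"""


def vector_speedup(p: float, n: int) -> float:
    """Amdahl speedup S when a fraction p of the work runs n lanes wide."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Serial part is unchanged; vectorized part shrinks by a factor of n.
    return 1.0 / ((1.0 - p) + p / n)


def efficiency(p: float, n: int) -> float:
    """Parallel efficiency e = S / n (assumed definition)."""
    return vector_speedup(p, n) / n


def fraction_for_efficiency(e: float, n: int) -> float:
    """Invert Amdahl's Law: the vectorizable fraction P needed to reach e."""
    if n < 2 or not (1.0 / n) <= e <= 1.0:
        raise ValueError("need n >= 2 and 1/n <= e <= 1")
    s = e * n  # target speedup
    # From 1/S = 1 - P(1 - 1/n), solve for P.
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    lanes = 8
    for p in (0.5, 0.9, 0.99):
        print(f"P={p:.2f}: S={vector_speedup(p, lanes):.2f}, "
              f"e={efficiency(p, lanes):.2f}")
    print(f"P needed for e=0.5: {fraction_for_efficiency(0.5, lanes):.3f}")
```

With 8 lanes, P = 0.9 yields only S ≈ 4.71 (e ≈ 0.59), and reaching e = 0.5 already requires P ≈ 0.857. This illustrates why a large share of the runtime must be vectorized before the benefit becomes significant.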
Prerequisites
 
©   Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Inclusivity Statement