Vector-Aware Coding
Steve Lantz
Cornell Center for Advanced Computing
Revisions: 9/2022, 5/2021, 1/2021, 5/2018, 6/2017, 10/2013 (original)
Acknowledgments: Contributions from Aaron Birkland and the Texas Advanced Computing Center
At first glance, the automatic vectorization capability of compilers seems to mean that nothing special needs to be done to vectorize code, other than making sure that vectorization is enabled when compiling. However, it is important to remember that compilers are not perfect. While the preceding topic described a few specific techniques that can be used to control how compilers vectorize, this topic describes some of the challenges that compilers face in automatic vectorization, along with ways to help them recognize opportunities.
"Vector-aware coding" is a term for the combined process of writing code and guiding the compiler so that the compiler can vectorize that code as effectively as possible. The developer needs to:
- Have a sense of which sections of code ("hot spots") should benefit from vectorization
- Write code in a manner that makes it possible for the compiler to vectorize loops
- Be aware of, and avoid, data dependencies that prevent vectorization
- Verify that the compiler actually vectorized code where expected
- Implement fixes or compiler hints for those areas where the compiler failed to vectorize
Vector-aware coding helps ensure that the impressive vector hardware on HPC systems such as Stampede2 is used effectively.
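To make the points above concrete, here is a minimal sketch in C contrasting a loop that compilers can usually vectorize with one that carries a data dependency between iterations. The function names (add_arrays, running_sum, self_check) are hypothetical and chosen only for illustration. The sketch assumes a C99-or-later compiler so that the restrict qualifier is available.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: c[i] depends only on a[i] and b[i].
   The C99 'restrict' qualifiers are the programmer's promise that the
   three arrays do not overlap, so the compiler need not assume aliasing. */
void add_arrays(size_t n, const float *restrict a,
                const float *restrict b, float *restrict c)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Loop-carried (read-after-write) dependency: iteration i reads a[i-1],
   which iteration i-1 has just written. Written this way, the iterations
   cannot simply be executed several at a time in vector lanes. */
void running_sum(size_t n, float *a)
{
    for (size_t i = 1; i < n; i++) {
        a[i] += a[i - 1];
    }
}

/* Small self-check (illustrative only); values are exact in float. */
void self_check(void)
{
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    add_arrays(4, a, b, c);
    assert(c[0] == 11.0f && c[3] == 44.0f);

    running_sum(4, a);          /* a becomes {1, 3, 6, 10} */
    assert(a[3] == 10.0f);
}
```

To verify what the compiler actually did with each loop, one option is to request a vectorization report at compile time, for example with `-O3 -fopt-info-vec` for GCC or `-qopt-report` for the Intel classic compilers. The exact flags and the report format depend on the compiler and its version, so treat these as starting points rather than a fixed recipe.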
Objectives
After you complete this topic, you should be able to:
- Describe what is meant by "vector-aware coding"
- Identify the properties of loops that make them more likely to be auto-vectorized by the compiler
- Explain how data dependency prevents vectorization
- Define "pointer aliasing" and explain why it inhibits vectorization
- Define the term "pragma"
- Explain the purpose of the ivdep, simd, and related pragmas
Prerequisites
- Knowledge of C and/or Fortran, as well as a basic understanding of what assembly language is
- Familiarity with batch job submission on large compute clusters