Strategies
Now that we know the performance benefits of vectorization, how do we get our compiler to produce vector instructions from our code? There are several strategies that can be employed:
- Program in assembly language
  - If you are familiar with vector instruction sets, it is possible to write sections of a program in assembly language, directly issuing vector instructions as needed. Typically, this is not practical except in the most extreme situations, when trying to extract every last bit of performance from a CPU.
- Program in vector intrinsics
  - Vector intrinsics are a set of functions available to high-level languages such as C that correspond directly to vector instructions. Like assembly, they provide direct control over the vector instructions issued by code, but they are more easily mixed in with C/C++ code. Also like assembly, in many cases it is not practical to program extensively with intrinsics unless extreme performance is the goal.
- Let the compiler vectorize automatically
  - This is a very practical option, as many compilers are quite good at automatically vectorizing loops in C/C++, and especially Fortran, as an optimization step. However, a compiler can only vectorize loops that meet the right criteria; in particular, it must verify that a loop contains no data dependencies between iterations. This analysis is difficult and compilers are imperfect, so they may not vectorize everything possible, most often because a potential data dependency cannot be ruled out (more on this later).
- Link to an optimized library that is already vectorized
  - This is also a practical option, which allows an application to get the benefit of highly optimized and vectorized code "for free".
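To make the difference between the intrinsics and automatic strategies concrete, here is a minimal sketch in C. It assumes an x86 CPU with SSE support when compiled with GCC or Clang; the function names `add_plain` and `add_intrinsics` are hypothetical and chosen only for illustration. On other platforms the intrinsics path is skipped and the scalar fallback loop does all the work.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Plain loop: the compiler is free to vectorize this automatically. */
void add_plain(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Same operation written with explicit SSE intrinsics (illustrative only). */
void add_intrinsics(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Process four floats per iteration using 128-bit vector registers. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop handles leftover elements, or everything without SSE. */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Both functions compute the same result. The intrinsics version spells out the vector width and the handling of leftover elements by hand, which is the kind of detail an auto-vectorizing compiler would otherwise take care of.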
This topic focuses on how to enable automatic vectorization through a compiler's built-in optimization capabilities, because compiling applications is a common task in scientific computing.
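As a rough illustration of what a compiler looks for, the sketch below contrasts a loop with independent iterations against one with a loop-carried data dependency. The function names `scale_add` and `prefix_sum` are hypothetical, and the code assumes a C99 compiler (for `restrict`).

```c
#include <stddef.h>

/* Independent iterations: y[i] depends only on x[i] and the old y[i].
 * The restrict qualifiers promise the compiler that x and y do not overlap,
 * which removes one common obstacle to automatic vectorization. */
void scale_add(float *restrict y, const float *restrict x, float s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = y[i] + s * x[i];
}

/* Loop-carried dependency: a[i] needs the value of a[i - 1] computed in the
 * previous iteration, so the iterations cannot simply run side by side.
 * A compiler will typically decline to vectorize this loop as written. */
void prefix_sum(float *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```

One way to see what the compiler decided, assuming GCC, is to compile with `gcc -O3 -fopt-info-vec -c file.c`, which reports the loops that were vectorized; Clang offers a similar report with `-Rpass=loop-vectorize`. The exact output depends on the compiler version and target.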