Introduction
Steve Lantz with contributions from Aaron Birkland and the Texas Advanced Computing Center
Cornell Center for Advanced Computing and Texas Advanced Computing Center
Revisions: 3/2023, 9/2022, 5/2021, 1/2021, 5/2018, 6/2017, 10/2013 (original)
Vectorization is a process by which floating-point computations in scientific code are compiled into special instructions that execute elementary operations (+, -, *, etc.) or functions (exp, cos, etc.) in parallel on fixed-size vector arrays. The ultimate goal of vectorization is an increase in floating-point performance (and possibly integer and logical performance as well) through hardware parallelism.
This topic is a general introduction to the vectorization process, focusing on what vectorization is and how it increases performance.
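As a preview, the short C program below is a minimal sketch of the kind of loop a compiler can vectorize automatically: every iteration is independent, so several elements can be processed per instruction. The function name saxpy and the compiler flags in the comment are illustrative choices, not requirements of any particular compiler.

/* saxpy.c: a simple loop that a vectorizing compiler can turn into
 * SIMD instructions, updating several elements of y per instruction.
 * One common way to request this is to compile with optimization,
 * e.g., gcc -O3 -march=native saxpy.c (flags shown are illustrative). */
#include <stdio.h>

#define N 1024

void saxpy(float a, const float *x, float *y, int n)
{
    /* Each iteration is independent, so the compiler is free to pack
     * the multiply-adds into vector (SIMD) instructions. */
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

int main(void)
{
    float x[N], y[N];
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }
    saxpy(3.0f, x, y, N);
    printf("y[0] = %f\n", y[0]);  /* expect 5.0 */
    return 0;
}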
Objectives
After you complete this topic, you should be able to:
- Describe the concept of vectorization and the motivation for making use of it in your application
- Explain what vectorization involves from the hardware, compiler, and user perspectives
- Define SIMD, and relate this term to the execution of vector instructions
- Discuss the effect of vector length on speedup
- Give several reasons why ideal speedup may not be realized in application performance as a whole
- Define vector intrinsics
- Describe how a simple loop can be vectorized automatically by a compiler
- Explain how a fused multiply-add instruction improves vector performance
Prerequisites
- Knowledge of C and/or Fortran, along with a basic understanding of what assembly language is
- Familiarity with batch job submission on large compute clusters