Cornell Virtual Workshop: Vectorization

Roadmap: Vectorization

IntroductionVector HardwareVectorizing with CompilersVector-Aware CodingVector Performance

Vectorization is a process by which mathematical operations found in tight loops in scientific code are executed in parallel on special vector hardware found in CPUs and coprocessors. This roadmap describes the vectorization process as it relates to computing hardware, compilers, and coding practices. Knowing where in code vectorization ought to occur, how vectorization will increase performance, and whether the compiler is vectorizing loops within a code as it should are critical to getting the full potential from the CPUs of modern HPC systems such as Frontera and Stampede3.

Objectives

After you complete this roadmap, you should be able to:

Describe the operation of, and motivation behind, the vector hardware found within the CPUs in Intel Xeon and similar architectures
Write code that is "vectorization-friendly", so that compilers can generate efficient vector instructions from it.
Assess whether compilers are vectorizing code in the places that they should, and provide hints to compilers where they fail to automatically vectorize loops that should be vectorized
Demonstrate rearranging code to improve its performance, based on an understanding of how vectorization relates to parallel performance and to the memory characteristics of processors

Prerequisites

Knowledge of C and/or Fortran, as well as a basic knowledge of what assembly language is
Familiarity with batch job submission on large compute clusters

Requirements

System requirements include:

Access to a C or Fortran compiler
Access to Frontera, Stampede3, or any compute cluster equipped with a vectorizing compiler and the Slurm Workload Manager