
Instruction Sets

As already mentioned, vector operands and operations are specified in terms of vector instruction sets. An instruction set defines the types (and lengths) of vectors that compiled code may use, as well as the operations that a CPU must support in order to run that code. To keep track of the vector capabilities in successive generations of Intel processors, vector instruction sets are given unique names: MMX, SSE, SSE4, AVX, and so on.

Over time, there has been growth in both vector register sizes and the number of available vector operations in x86 and x86_64 processors. The figure on the Registers page shows the historical progression of Intel's instruction sets and the associated vector register sizes.

Intel processors are generally backwards compatible, meaning that a new processor model will support preceding instruction sets. However, this is only partly true of the models present on Stampede2, i.e., KNL, SKX, and ICX. While these processors are all based on the AVX-512 instruction set, the CPUs in them don't all support the same subsets of AVX-512. Nevertheless, they do support instructions for 256- and 128-bit wide vector registers from earlier models. (Interestingly, these older vector instructions can only be processed by 1 of the 2 VPUs in KNL.)

Incidentally, the former Xeon Phi coprocessor, known by the code name KNC, is a total exception to the rule: it has its own custom instruction set that is distinct from the host Xeon CPUs. By contrast, the newer Xeon Phi on Stampede2, KNL, is fully compatible with its x86 forebears.

The AVX-512 vector instruction set on the current Stampede2 processors, which was foreshadowed by the KNC coprocessors on the original Stampede, is particularly rich. For example, it contains fused multiply-add (FMA) instructions that perform operations of the form a + (b × c) → a as a single instruction, rather than as a separate multiply followed by an add; it also has scatter/gather, permute, and shuffle capabilities.
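
To make the FMA pattern concrete, here is a minimal sketch in C of a loop whose body has exactly the form a + (b × c) → a. The function name `fma_update` is hypothetical, chosen only for illustration. Whether a compiler actually emits vector FMA instructions for it depends on the compiler, the optimization level, and the target flags, so this should be read as a candidate for FMA vectorization rather than a guarantee.

```c
#include <stddef.h>

/* Element-wise update a[i] = a[i] + b[i] * c[i].
 * Each iteration is one multiply followed by one add on independent
 * elements, which is the pattern an FMA instruction computes.
 * The restrict qualifiers promise the compiler that the three arrays
 * do not overlap, which removes one common obstacle to vectorization. */
void fma_update(double *restrict a, const double *restrict b,
                const double *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] += b[i] * c[i];
    }
}
```

With 512-bit registers, one vector instruction could in principle process eight double-precision elements of this loop at a time.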

Again, it is important to note that Intel elected to split AVX-512 into subsets of extensions. This allows software developers (and Intel processor designers!) to choose which aspects of AVX-512 to incorporate into their final products.

Which extensions ICX, SKX, and KNL have in common, and which ones they don't share
Extension	ICX	SKX	KNL	Functionality
AVX-512F	X	X	X	Foundation: expands upon AVX to support 512-bit registers; adds masked operations and other new features.
AVX-512CD	X	X	X	Conflict Detection: permits the vectorization of loops with certain kinds of write conflicts.
AVX-512BW	X	X		Byte and Word: adds support for vectors of 8-bit (byte) or 16-bit (word) integers; allows masked operations.
AVX-512DQ	X	X		Doubleword and Quadword: adds instructions for vectors of 32- or 64-bit integers; allows masked operations.
AVX-512VL	X	X		Vector Length: lets AVX-512 instructions operate on 128-bit (SSE-width) and 256-bit (AVX-width) registers, with up to 32 of each available; allows masked operations.¹
AVX-512PF			X	Prefetch: adds prefetch operations for the gather and scatter functionality introduced in AVX2 and AVX-512.
AVX-512ER			X	Exponential and Reciprocal: includes new operations for 2^x exponentials, reciprocals, and reciprocal square roots.
AVX-512VNNI	X			Vector Neural Network Instructions: accelerates convolutional neural network-based algorithms.
AVX-512...etc.	X			(various other ICX additions)
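
As an illustration of the kind of write conflict that the AVX-512CD entry refers to, consider the histogram loop sketched below in C. The function name `histogram` and its arguments are hypothetical. Two different iterations may hold the same index, so a naive vector version could try to update the same element of `hist` twice within one vector operation. Conflict detection instructions give the compiler a way to find such duplicates inside a vector of indices; whether a particular compiler vectorizes this loop is not guaranteed.

```c
#include <stddef.h>

/* Count how many times each bin index occurs in idx[0..n-1].
 * Assumes every idx[i] is a valid index into hist.
 * If idx[i] == idx[j] for two iterations that fall in the same vector,
 * both would read and write the same hist element: a write conflict. */
void histogram(const int *idx, size_t n, int *hist)
{
    for (size_t i = 0; i < n; i++) {
        hist[idx[i]]++;
    }
}
```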

The message here is that while the AVX-512 extensions supported by ICX, SKX, and KNL overlap, they are not identical. As we will see, this has implications for how to compile codes that will run on Stampede2.
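
One way to see which of these subsets a given machine reports is to query the CPU at run time. The sketch below assumes GCC or Clang on an x86 target, where the builtins `__builtin_cpu_init` and `__builtin_cpu_supports` are available; on any other compiler or architecture it simply reports 0. The two function names are hypothetical, and the grouping of subsets follows the table above.

```c
/* True only when compiling with GCC or Clang for x86 (an assumption of
 * this sketch; the classic Intel compiler is deliberately excluded). */
#if (defined(__GNUC__) || defined(__clang__)) && \
    !defined(__INTEL_COMPILER) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_CPU_SUPPORTS_BUILTIN 1
#else
#define HAVE_CPU_SUPPORTS_BUILTIN 0
#endif

/* Returns 1 if the CPU reports the two subsets that KNL, SKX, and ICX
 * all share in the table (F and CD), otherwise 0. */
int has_common_avx512_subsets(void)
{
#if HAVE_CPU_SUPPORTS_BUILTIN
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f")
        && __builtin_cpu_supports("avx512cd");
#else
    return 0;
#endif
}

/* Returns 1 if the CPU reports all five subsets that the table lists
 * for both SKX and ICX (F, CD, BW, DQ, VL), otherwise 0. */
int has_skx_avx512_subsets(void)
{
#if HAVE_CPU_SUPPORTS_BUILTIN
    __builtin_cpu_init();
    return has_common_avx512_subsets()
        && __builtin_cpu_supports("avx512bw")
        && __builtin_cpu_supports("avx512dq")
        && __builtin_cpu_supports("avx512vl");
#else
    return 0;
#endif
}
```

According to the table, a KNL node would be expected to pass the first check but not the second, while SKX and ICX nodes would be expected to pass both.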

¹ Reinders: "AVX-512 May Be a Hidden Gem" in Intel Xeon Scalable Processors, HPCwire, June 29, 2017.
