Instruction Sets
As already mentioned, vector operands and operations are specified in terms of vector instruction sets. The instruction set specifies the types (and lengths) of vectors that may be used in compiled code, as well as the kinds of operations that the CPU must support. In order to keep track of the vector capabilities in successive generations of Intel processors, vector instruction sets are given unique names: MMX, SSE, SSE4, AVX, and so on.
Over time, there has been growth in both vector register sizes and the number of available vector operations in x86 and x86_64 processors. The figure on the Registers page shows the historical progression of Intel's instruction sets and the associated vector register sizes.
Intel processors are generally backwards compatible, meaning that a new processor model will support preceding instruction sets. However, this was only partly true of the former Xeon Phi "KNL" processors, which didn't support the same subsets of the AVX-512 instruction set as the Xeon server processors (see the table below). (KNL did support instructions for 256- and 128-bit wide vector registers, though these instructions were only processed by one of the two VPUs.) And prior to KNL, the Xeon Phi coprocessors having the code name KNC were a complete exception to the rule, as they had their own custom instruction set.
The AVX-512 vector instruction set on current Intel processors, which was foreshadowed by the KNC coprocessors, is particularly rich. For example, it contains fused multiply-add (FMA) instructions that perform operations of the form a + (b × c) → a as a single instruction; it also has scatter/gather, permute, and shuffle capabilities.
One should be aware that Intel elected to split AVX-512 into subsets of extensions. This allows software developers to choose which aspects of AVX-512 to incorporate into their final products. As one example, the Ice Lake (ICX) and subsequent processors can execute special neural network instructions that aren't available (or even valid!) on Skylake (SKX) nodes.
| Extension | ICX | SKX | KNL | Functionality |
|---|---|---|---|---|
| AVX-512F | X | X | X | Foundation: expands upon AVX to support 512-bit registers; adds masked operations and other new features. |
| AVX-512CD | X | X | X | Conflict Detection: permits the vectorization of loops with certain kinds of write conflicts. |
| AVX-512BW | X | X | | Byte and Word: adds support for vectors comprised of bytes, or of 8- or 16-bit integers; allows masked operations. |
| AVX-512DQ | X | X | | Doubleword and Quadword: adds instructions for vectors of 32- or 64-bit integers; allows masked operations. |
| AVX-512VL | X | X | | Vector Length: enables AVX-512 to work with up to 32 SSE or AVX registers; allows masked operations.[1] |
| AVX-512PF | | | X | Prefetch: adds prefetch operations for the gather and scatter functionality introduced in AVX2 and AVX-512. |
| AVX-512ER | | | X | Exponential and Reciprocal: includes new operations for 2^x exponentials, reciprocals, and reciprocal square roots. |
| AVX-512VNNI | X | | | Vector Neural Network Instructions: accelerates convolutional neural network-based algorithms. |
| AVX-512...etc. | X | | | (various other ICX additions) |
The message here is that while the AVX-512 extensions supported by ICX, SKX, and KNL overlap, they are not identical. This can have implications for how to compile codes that will run on clusters composed of various processor types.
1. Reinders: "AVX-512 May Be a Hidden Gem" in Intel Xeon Scalable Processors, HPCwire, June 29, 2017.
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)