
Compiler Options for Xeon SP

The Intel compilers are recommended for codes that will run on Intel Xeon SP processors, since they are best able to exploit the features of that hardware. On Stampede3 and Frontera, the Intel environment module should be loaded by default, which means that the paths and environment variables for these compilers are already set up in your shell. On Frontera, this lets you invoke the compilers from the command line as icc and ifort for C/C++ and Fortran, respectively; on Stampede3, icx is the Intel compiler for C/C++. You might also be interested in our more general discussion about Code Optimization via Compilers.

An important compiler option is one which specifies the target architecture for compilation, by indicating which instruction set(s) to use. For the Intel compilers, this information is provided via the -x option. The choice you make might depend on where you want to be able to run your code (without needing to recompile for a different architecture), and on what types of computations you are carrying out. Perhaps you want to run on the Skylake (SKX) nodes on Stampede3 and the Cascade Lake (CLX) nodes on Frontera, in which case you can compile a single binary that will run on both. Alternatively, you might prefer to build a specialized binary that is better suited for a single node type. The following list describes some possible options for you to run your code on either the SKX or the CLX nodes on the TACC systems:

-xCORE-AVX512 to run on both SKX and CLX
-xCORE-AVX2 to use the older AVX2 instruction set (the code might run at a higher clock speed as a result, as described in more detail on the next page)
-xCASCADELAKE to include additional instructions available on CLX nodes, such as the AVX-512 VNNI instructions that support neural network computations (this option was introduced in Intel compilers v. 19)
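
When choosing among these options, it can help to confirm which vector instruction sets a given node actually reports. The following is a minimal sketch, assuming a Linux node where /proc/cpuinfo is readable. The helper name has_cpu_flag is hypothetical; avx2, avx512f, and avx512_vnni are the flag names the Linux kernel uses for AVX2, the AVX-512 foundation instructions, and AVX-512 VNNI.

```shell
# Hypothetical helper: succeed if the flag named in $1 appears as a whole
# word in the flag list $2. If $2 is omitted, the first "flags" line of
# /proc/cpuinfo is used (Linux only; an assumption of this sketch).
has_cpu_flag() {
    local flag="$1"
    local flags="${2:-$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null)}"
    case " $flags " in
        *" $flag "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report the instruction sets relevant to the -x options listed above.
for f in avx2 avx512f avx512_vnni; do
    if has_cpu_flag "$f"; then
        echo "$f: supported"
    else
        echo "$f: not reported"
    fi
done
```

On a node that reports avx512f but not avx512_vnni, a binary built with -xCORE-AVX512 would be the natural choice; where avx512_vnni is reported as well, -xCASCADELAKE also becomes an option.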

You can of course use one of the compiler options above in conjunction with other options, e.g.,

$ icc -xCORE-AVX512 -O3 -qopenmp -qopt-zmm-usage=high omp_hello.c -o omp_hello
$ ### OR ###
$ ifort -xCORE-AVX512 -O3 -qopenmp -qopt-zmm-usage=high omp_hello.f90 -o omp_hello

The -O3 option tells the compiler to try as hard as possible to find loop transformations and other types of optimizations that will help in vectorizing the code and otherwise make it run fast. (The default optimization level for Intel compilers is -O2.) You will notice that the -qopenmp option has also been selected. This is because any code that runs well on SKX or CLX must be parallelized in some way in order to use the many cores on each node. Multithreading with OpenMP is one common technique; MPI is another possibility, and an MPI/OpenMP hybrid may work best of all. The -qopt-zmm-usage=high option affects some details of how code vectorization is carried out, as described in more detail on the next page (under "Vector Optimization and 512-Bit ZMM Registers").

The GNU compiler collection is also able to produce executables for Intel Xeon Scalable Processors. (On machines that use a module system to customize user environments, such as Stampede3 and Frontera, you can swap out the Intel compilers and swap in the GNU ones using the shell command module swap intel gcc.) In many cases, there are GNU options analogous to the Intel options discussed above, with the overall target architecture specified via the -march option. Many of these GNU-style options are also accepted by the Intel compilers. The analog of the call to icc above is:

$ gcc -march=skylake-avx512 -O3 -fopenmp -mprefer-vector-width=512 omp_hello.c -o omp_hello

The last option, -mprefer-vector-width=512, is analogous to the -qopt-zmm-usage=high option provided by the Intel compilers: without it, a 256-bit vector width is used by default. It should be noted, however, that this option is only available beginning in GNU Compilers version 9. The -march=skylake-avx512 option will generate code that runs on both Skylake and Cascade Lake, similar to the Intel -xCORE-AVX512. There is also a separate architecture flag, -march=cascadelake, that can be specified for CLX processors and that includes the VNNI instructions. As with the Intel compilers, support in the GNU suite for more recent Intel processors generally requires more recent versions of GCC, so you may need to consult the compiler documentation to determine whether a specific -march option is available in your currently installed version of GCC.
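
Besides reading the documentation, one quick way to see whether an installed compiler recognizes a particular -march value is to ask it to preprocess an empty input with that option and check the exit status. The following is a minimal sketch under that assumption; the helper name compiler_accepts is hypothetical, and the check only shows that the option is recognized, not that the resulting code will run on the current node.

```shell
# Hypothetical helper: succeed if compiler $1 accepts option $2.
# Returns 2 if the compiler is not found in PATH.
compiler_accepts() {
    local cc="$1" opt="$2"
    command -v "$cc" >/dev/null 2>&1 || return 2
    # Preprocess an empty C input; an unrecognized option makes this fail.
    "$cc" "$opt" -x c -E /dev/null >/dev/null 2>&1
}

# Try the two architecture flags discussed above with the gcc in PATH.
for arch in skylake-avx512 cascadelake; do
    if compiler_accepts gcc "-march=$arch"; then
        echo "gcc accepts -march=$arch"
    else
        echo "gcc does not accept -march=$arch (or gcc is not in PATH)"
    fi
done
```

If the cascadelake test fails while the skylake-avx512 test succeeds, the installed GCC most likely predates support for the newer architecture name, and loading a more recent gcc module would be the next thing to try.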

If the machine you are compiling on has exactly the same architecture as the node(s) your code will be running on, then the best architecture flag to use in general is -xHost (for Intel) or -march=native (for GCC).

The Intel and GNU compilers are available on the Stampede3 and Frontera compute nodes, as well as on the login nodes. If your compilation is extra-large (say, if it involves numerous source files), it may be preferable to do it on a compute node, so that a login node does not get bogged down for other users. The commands to use are exactly the same as the ones shown above.

Remember too that the make utility can run multiple recipes simultaneously as it builds the various targets in your makefile(s). So if your code is divided into many smaller source files, and if your makefiles do not contain too many dependencies between targets, then make -j48 (say, for a 48-core SKX node) should significantly accelerate the build process on SKX/CLX.
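
Rather than hard-coding the job count, you can derive it from the number of cores visible to your shell. The following is a minimal sketch, assuming the nproc utility from GNU coreutils is available; the variable name njobs is just illustrative.

```shell
# Use the number of cores available to this shell as the job count,
# falling back to 1 if nproc is not installed.
njobs=$(nproc 2>/dev/null || echo 1)
echo "would run: make -j${njobs}"

# Uncomment inside a directory that contains a makefile:
# make -j"${njobs}"
```

Because nproc reports only the cores available to the current process, the same line should give a sensible value whether you build on a whole compute node or within a more restricted allocation.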

TACC provides some additional information on compiling software on both Stampede3 and Frontera that you might find useful.
