MKL and FFTW
What is MKL? The Intel oneAPI Math Kernel Library (MKL) provides both Fortran and C interfaces. It includes routines for BLAS levels 1-3, LAPACK, FFT, the Vector Math Library (VML), and others. MKL is optimized by Intel for all current Intel architectures. As we'll see in the exercise, it can use shared-memory parallelism via OpenMP if desired; just set OMP_NUM_THREADS or MKL_NUM_THREADS to a value greater than 1.
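As a minimal sketch of the thread settings just described, the snippet below exports both variables and reports the count that MKL would be asked to use. The values 8 and 4 are hypothetical, and the helper variable mkl_threads exists only for illustration. When both variables are set, MKL honors MKL_NUM_THREADS in preference to OMP_NUM_THREADS.

```shell
# Hypothetical thread counts; choose values that fit the cores available to your job.
export OMP_NUM_THREADS=8   # applies to all OpenMP regions, including those inside MKL
export MKL_NUM_THREADS=4   # MKL-specific; MKL prefers this over OMP_NUM_THREADS

# Report the requested MKL thread count: MKL_NUM_THREADS if set, else OMP_NUM_THREADS.
mkl_threads="${MKL_NUM_THREADS:-${OMP_NUM_THREADS:-unset}}"
echo "MKL threads requested: $mkl_threads"
```

Setting only OMP_NUM_THREADS is usually enough; MKL_NUM_THREADS is useful when your own OpenMP regions and MKL should run with different thread counts.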
This library comes bundled with the Intel compilers. If you switch to a non-Intel compiler, you must re-load MKL explicitly:
$ module swap intel gcc
$ module load mkl
$ module help mkl
Assuming the correct environment module is loaded, you can compile and link with just an extra flag or two. The link line requires special attention when using the GNU compilers with MKL; see the Frontera User Guide or the Intel oneAPI Math Kernel Library Link Line Advisor website. The Intel compilers, however, already "know" where the MKL libraries and headers are located, so it is generally easy to build against MKL, as shown in the examples below. (For final linking, the flags -lpthread -lm may also be needed.)
$ icc myprog.c -mkl # this means the same as -mkl=parallel
$ ifort myprog.f90 -mkl
Use -mkl=sequential if you do not wish to link the OpenMP-multithreaded version of MKL. More detailed compiler and linker flags may also be required for static linking or other non-typical MKL configurations.
If the exact locations of any MKL files are needed, they can be found under $MKLROOT. TACC also provides the special environment variables $TACC_MKL_INC and $TACC_MKL_LIB, which point to MKL's header and library directories.
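To illustrate how these variables might be used with a non-Intel compiler, here is a dry-run sketch that assembles an explicit GNU compile and link command and prints it without executing it. Several things are assumptions: myprog.c is a placeholder, the sequential LP64 libraries are chosen for simplicity, and the fallback paths under $MKLROOT only stand in for systems where the TACC variables are undefined. Check the result against the Link Line Advisor before relying on it.

```shell
# Dry run: build the command string only; nothing is compiled here.
# Fallbacks (assumed layout) are used when the TACC variables are not defined.
inc_dir="${TACC_MKL_INC:-${MKLROOT:-/path/to/mkl}/include}"
lib_dir="${TACC_MKL_LIB:-${MKLROOT:-/path/to/mkl}/lib/intel64}"

# Sequential, LP64, dynamically linked MKL with the GNU C compiler (assumed choice).
mkl_libs="-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl"

cmd="gcc myprog.c -I$inc_dir -L$lib_dir -Wl,--no-as-needed $mkl_libs"
echo "$cmd"
```

For the OpenMP-threaded variant with GNU compilers, the threading layer changes (for example, -lmkl_gnu_thread together with -fopenmp in place of -lmkl_sequential); the Link Line Advisor gives the exact combination for a given configuration.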
For MPI codes, you will likely want to access the Intel compilers through mpicc or mpif90. The MKL flags are mostly unchanged, but have a look at the Frontera User Guide for the special case of ScaLAPACK. Note that -mkl=sequential may be the right choice for a code that relies on MPI, since each MPI process typically already occupies its own core.
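The reasoning behind that choice can be sketched with a little arithmetic: divide the cores on a node by the MPI ranks placed on it, and if nothing is left over, threaded MKL would only oversubscribe the cores. The core and rank counts below are assumed values for illustration, and myprog.c is a placeholder; the command is printed rather than run.

```shell
# Hypothetical job layout; adjust to your own node type and launch configuration.
cores_per_node=56    # assumed core count per node
ranks_per_node=56    # assumed: one MPI rank per core

# Threads each rank can use without oversubscribing the node.
threads_per_rank=$(( cores_per_node / ranks_per_node ))

if [ "$threads_per_rank" -le 1 ]; then
    mkl_flag="-mkl=sequential"   # no spare cores for MKL threads
else
    mkl_flag="-mkl=parallel"     # spare cores: set MKL_NUM_THREADS to threads_per_rank
fi
echo "mpicc myprog.c $mkl_flag"
```

With fewer ranks per node (say 4), the same calculation leaves 14 threads per rank, and the threaded MKL becomes the natural choice.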
Another significant math library is the Fastest Fourier Transform in the West (FFTW), a comprehensive set of C routines for computing discrete Fourier transforms. It uses a so-called cache-oblivious algorithm to obtain excellent performance regardless of the platform on which it runs. The routines can transform one- and multi-dimensional real and complex data of arbitrary input size. All the right stuff is built in for you: the Cooley-Tukey algorithm, the Prime Factor algorithm, Rader's algorithm for prime sizes, the split-radix algorithm, and so on. On Frontera, one can either use the pre-built FFTW 2 and FFTW 3 libraries, each available through its own module, or use MKL, which implements FFTW-compatible interfaces.
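The two routes to FFTW mentioned above might look roughly like the following dry run, which only prints candidate compile lines for a hypothetical FFTW 3 program, myfft.c. The module name fftw3 and the variables $TACC_FFTW3_INC and $TACC_FFTW3_LIB are assumptions based on TACC's usual naming pattern, and the header location $MKLROOT/include/fftw is assumed from the standard MKL layout; confirm both with module help on the system.

```shell
# Dry run: print the commands; nothing is compiled here.

# Option 1: standalone FFTW 3 (after something like "module load fftw3").
fftw_inc="${TACC_FFTW3_INC:-/path/to/fftw3/include}"   # assumed variable name
fftw_lib="${TACC_FFTW3_LIB:-/path/to/fftw3/lib}"       # assumed variable name
fftw_cmd="icc myfft.c -I$fftw_inc -L$fftw_lib -lfftw3"

# Option 2: MKL's FFTW3-compatible interface; only the header path is added.
mkl_cmd="icc myfft.c -mkl -I${MKLROOT:-/path/to/mkl}/include/fftw"

echo "$fftw_cmd"
echo "$mkl_cmd"
```

Because the MKL route keeps the FFTW function names, the same source file can usually be built either way, which makes it easy to compare the two libraries on your own problem sizes.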