Compiling for GPUs
Compiling for GPUs on Vista
Vista is a GPU-focused cluster. To use the GPUs, your application code must include an interface that supports NVIDIA GPUs, such as CUDA, and it must be built so that GPU support is enabled.
For TACC-supported software:
- If NVIDIA GPU integration is possible, then a prebuilt application and/or libraries are usually available through a module.
In cases where the application must be compiled from source code:
- If the code already incorporates CUDA or another compatible GPU interface, then GPU support can be enabled with additional command-line flags or configuration options when you build the application.
- If the source code does not include a compatible GPU interface, then the source code must be modified with CUDA, OpenMP, or some other suitable instructions prior to compiling. Your options are covered in the roadmap on GPU Migration and Portability.
Assuming your source code does support NVIDIA GPUs, GPU integration must still be enabled at compile time, or a GPU-enabled library must be made available to the application. This page describes different types of GPU integration for Fortran, C/C++, and Python programs, and it indicates how to use the tools available through the NVIDIA HPC Software Development Kit (SDK) in each case. On Vista, you get access to the full NVIDIA HPC SDK through any one of the nvidia modules.
CUDA Programming Model for Fortran, C++, and C
The CUDA programming model was developed by NVIDIA as a parallel computing platform and application programming interface for explicitly programming and controlling NVIDIA GPUs. (CUDA was originally an acronym for Compute Unified Device Architecture.) The CUDA C++ and CUDA Fortran language extensions provide the interfaces needed to write CUDA programs, which include kernels that execute on NVIDIA GPUs.
On Vista, the nvidia module provides nvfortran, the NVIDIA HPC Fortran compiler, which can compile both standard Fortran code and CUDA Fortran programs. To enable CUDA Fortran during compilation, either the source file can be given the .cuf or .CUF extension, or the -cuda flag can be specified (or both). Additionally, if a file is compiled with -cuda, it must also be linked with -cuda. For more details on CUDA Fortran, see the NVIDIA CUDA Fortran Programming Guide.
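For example, a CUDA Fortran program could be built on Vista roughly like this (a sketch; the file name saxpy.cuf is only a placeholder):

    module load nvidia
    nvfortran -cuda -o saxpy saxpy.cuf    # the .cuf extension also implies CUDA Fortran

Here the -cuda flag is redundant with the .cuf extension, but it is shown because it is required whenever the extension alone does not signal CUDA Fortran.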
The nvidia module also provides nvcc, NVIDIA's primary compiler for CUDA C++ programs (including C-style C++ code). To enable CUDA compilation with nvcc, the source file must be named with the .cu or .CU extension. For more details on CUDA C++, see the CUDA Programming Guide or the Introduction to CUDA roadmap (for a C-style introduction to CUDA).
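As a concrete (though hypothetical) sketch, suppose a program is split between a CUDA C++ file hello.cu, which contains the kernel, and a host-only C file hello.c. A basic build with nvcc might look like:

    module load nvidia
    nvcc -o hello hello.cu hello.c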
Note that nvcc by itself compiles only the CUDA parts of the C/C++ code. The parts of the .cu file that run on the host, as well as any host-only source files (like hello.c above), are passed to a host compiler. Running nvcc -v hello.c on Vista reveals that the underlying host compilers are actually gcc and g++. It is possible to pass options to the underlying host compiler as well as to nvcc; however, each option intended for the host compiler must be prefaced with -Xcompiler, like this (an illustrative command that also adds device architecture options for compute capability 9.0):
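    nvcc -gencode arch=compute_90,code=sm_90 -Xcompiler -Wall -o hello hello.cu hello.c

In this sketch (the file names are the hypothetical ones from above), -Xcompiler -Wall forwards the -Wall option to the host compiler, while the -gencode options direct nvcc to generate device code for compute capability 9.0.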
The above architecture options are appropriate for the Grace Hopper Superchips on Vista. Adding the option -Xcompiler -v to the above command line further reveals that the gcc version is 11.4.1, which matches the result from /usr/bin/gcc -v, i.e., the compiler that accompanies the operating system.
To avoid relying on GCC 11.4.1 for compiling host code (along with the complications that may arise from that), you can try compiling CUDA C/C++ codes with another compiler: the NVIDIA HPC C++ compiler, nvc++. It too is provided by the nvidia module on Vista. (Note, the nvc compiler does not seem to work for CUDA C programs on Vista; use nvc++ instead.) But be aware that the nvc++ compiler is targeted more toward other GPU programming interfaces (see below) and toward CPUs, so it may not fully support all the features of CUDA. To enable CUDA during compilation, either the source file can be named with the .cu or .CU extension, or the -cuda flag can be specified (or both). Additionally, if a file is compiled with -cuda, it must also be linked with -cuda.
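Continuing the hypothetical example above, a build with nvc++ might look like this (here -gpu=cc90 targets compute capability 9.0):

    nvc++ -cuda -gpu=cc90 -o hello hello.cu hello.c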
OpenACC Application Program Interface for Fortran, C++, and C
Directive-based programming using OpenACC is also supported on Vista through the NVIDIA HPC compiler suite. The nvfortran, nvc++, and nvc compilers all accept the -acc option to enable OpenACC directives, and by default, this flag offloads OpenACC regions to the NVIDIA GPU. For details, see the NVIDIA HPC Compilers User's Guide: Using OpenACC.
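As a minimal sketch (not code from the Vista documentation), the following C++ SAXPY loop uses an OpenACC directive to offload work to the GPU. It could be built with something like nvc++ -acc -Minfo=accel -o saxpy_acc saxpy_acc.cpp, where -Minfo=accel asks the compiler to report what it offloaded.

    // saxpy_acc.cpp: SAXPY with an OpenACC directive (illustrative example)
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        const float a = 2.0f;
        std::vector<float> x(n, 1.0f), y(n, 3.0f);
        float *xp = x.data(), *yp = y.data();

        // Offload the loop to the GPU; the data clauses manage host<->device transfers.
        #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
        for (int i = 0; i < n; ++i)
            yp[i] = a * xp[i] + yp[i];

        std::printf("y[0] = %f\n", yp[0]);   // expect 5.0
        return 0;
    }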
OpenMP Application Program Interface for Fortran, C++, and C
The nvfortran, nvc++, and nvc compilers support a subset of the OpenMP standard. The -mp=gpu flag enables OpenMP offloading to NVIDIA GPUs, while -mp alone parallelizes OpenMP regions across the CPU cores on the node; hence the =gpu part is necessary to target and compile code for GPU offloading. For details, see the NVIDIA HPC Compilers User’s Guide: Using OpenMP.
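As an illustrative sketch of GPU offloading with OpenMP, the following C++ dot product uses a target construct and could be built with something like nvc++ -mp=gpu -Minfo=mp -o dotprod_omp dotprod_omp.cpp (the file name is hypothetical):

    // dotprod_omp.cpp: dot product with OpenMP target offload (illustrative example)
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<double> x(n, 0.5), y(n, 2.0);
        double *xp = x.data(), *yp = y.data();
        double sum = 0.0;

        // Offload the reduction loop to the GPU (enabled by compiling with -mp=gpu).
        #pragma omp target teams distribute parallel for reduction(+:sum) \
                map(to: xp[0:n], yp[0:n]) map(tofrom: sum)
        for (int i = 0; i < n; ++i)
            sum += xp[i] * yp[i];

        std::printf("dot = %f\n", sum);   // expect 1048576.0
        return 0;
    }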
Stdpar for Fortran and C++
The NVIDIA HPC Compilers nvfortran and nvc++ also support the Stdpar parallel programming model, specifically the C++ standard library parallel algorithms and Fortran do concurrent loop construct. The -stdpar=gpu flag enables the program to use the GPU through these built-in language features. For details, see the NVIDIA HPC Compilers User’s Guide: Using Stdpar.
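For instance, the following C++ code (an illustrative sketch) expresses SAXPY through a standard parallel algorithm; built with nvc++ -stdpar=gpu -o saxpy_stdpar saxpy_stdpar.cpp, the parallel algorithm can execute on the GPU.

    // saxpy_stdpar.cpp: SAXPY via C++ standard parallel algorithms (illustrative example)
    #include <algorithm>
    #include <cstdio>
    #include <execution>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        const float a = 2.0f;
        std::vector<float> x(n, 1.0f), y(n, 3.0f);

        // With -stdpar=gpu, this parallel algorithm can be offloaded to the GPU.
        std::transform(std::execution::par_unseq, x.begin(), x.end(), y.begin(), y.begin(),
                       [a](float xi, float yi) { return a * xi + yi; });

        std::printf("y[0] = %f\n", y[0]);   // expect 5.0
        return 0;
    }

When built this way, nvc++ arranges for heap data such as the vectors' storage to be accessible from the GPU (for example, via CUDA managed memory).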
The -gpu and -cudalib Flags for Fortran, C++, and C
The NVIDIA HPC compilers provide the -gpu and -cudalib flags to target specific compute capabilities and CUDA toolkit versions, manage the memory model of the program, or add and link CUDA-optimized libraries. For details, enter man nvfortran, man nvc++, or man nvc at the command line to see all the ways that these options can be used.
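For instance (a sketch showing only a few of the available suboptions, with a hypothetical file name), a CUDA Fortran build might target compute capability 9.0, request CUDA managed memory, and link the cuBLAS and cuFFT libraries:

    nvfortran -cuda -gpu=cc90,managed -cudalib=cublas,cufft -o app app.cuf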
What about Python?
Python programs can use GPU-accelerated libraries such as PyTorch (for ML/AI), CuPy, Numba-CUDA, and many other packages to make use of the GPUs on Vista. Each package makes use of GPUs in a different way. For example, CuPy implements a subset of the NumPy interface with a GPU array backend, which means that arrays instantiated with CuPy are allocated on the GPU, and NumPy-style operations on them execute as precompiled CUDA code. Numba-CUDA, in turn, supports just-in-time (JIT) compilation of Python code into CUDA kernels and device functions that run on the GPU.
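As a brief sketch of the CuPy model described above (assuming CuPy is installed in your Python environment on a GPU node):

    import cupy as cp

    # Arrays created with CuPy live in GPU memory.
    x = cp.arange(1_000_000, dtype=cp.float32)
    y = cp.ones_like(x)

    # NumPy-style operations run as precompiled CUDA kernels on the GPU.
    z = 2.0 * x + y
    print(float(z.sum()))   # bring the scalar result back to the host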