Compiling for GPU Offload
To compile code with OpenMP offloading to NVIDIA GPUs, you need to specify the following compiler-specific flags, as recommended by TACC:
| Compiler | Commands | Offload flags |
|---|---|---|
| NVIDIA HPC Compilers | nvc, nvc++, nvfortran | -mp=gpu |
| GCC | gcc, g++, gfortran | -fopenmp -foffload=nvptx-none |
When compiling with nvc, it is recommended to specify the target architecture explicitly: -gpu=cc100 for the Blackwell GPUs on Horizon, or -gpu=cc90 for the Hopper GPUs on Vista. If this option is omitted and no target GPU is present at compile time (e.g., on a login node), nvc compiles the code for a broad range of compute capabilities, which produces a larger executable. Naming the architecture keeps the build smaller.
For GCC, the above flags only work if NVIDIA GPU support was enabled when GCC was built. You can check this with the command gcc -v 2>&1 | grep "offload". If NVIDIA support was enabled, the output includes --enable-offload-targets=nvptx-none. At TACC, this is true for all GCC versions on Horizon and Vista, but not on older clusters.
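To make the flags more concrete, here is a minimal sketch of a C function containing one target region. The function name saxpy_offload and the file name saxpy.c are made up for this illustration, and the compile lines in the comment simply combine the flags listed above with -c; treat them as a starting point rather than a verified recipe for any particular system.

```c
/*
 * saxpy.c (hypothetical file name): one OpenMP target region.
 *
 * Possible compile lines, assuming the flags described above:
 *   nvc -mp=gpu -gpu=cc90 -c saxpy.c                  (Vista; use cc100 on Horizon)
 *   gcc -fopenmp -foffload=nvptx-none -c saxpy.c
 *
 * Without any OpenMP flag the pragma is ignored and the loop runs
 * serially on the host, so the result is the same either way.
 */

/* Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    /* x is copied to the device; y is copied to the device and back. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The map clauses are spelled out because x and y are plain pointers, so the compiler cannot infer how many elements to transfer. Calling saxpy_offload(4, 2.0f, x, y) from your own main is enough to exercise the target region.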
OMP_TARGET_OFFLOAD
The OMP_TARGET_OFFLOAD environment variable controls what happens when a target region is encountered at runtime. It accepts three values: default, disabled, and mandatory. By default, OpenMP offloads the target region to the device if one is available and falls back to the host otherwise; setting OMP_TARGET_OFFLOAD=default requests this behavior explicitly. With disabled, the target region always runs on the host. With mandatory, the program aborts if offloading is not possible.
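One way to observe the effect of this variable is to ask, from inside a target region, whether the code is running on the host. The sketch below does this with the standard omp_is_initial_device() routine; the helper name target_ran_on_host is hypothetical, and the _OPENMP guard is only there so the file still builds when no OpenMP flag is given.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/*
 * Returns 1 if the target region executed on the host,
 * 0 if it executed on an offload device.
 */
int target_ran_on_host(void)
{
    int on_host = 1;  /* stays 1 when built without OpenMP support */
#ifdef _OPENMP
    /* tofrom so the value written inside the region is copied back. */
    #pragma omp target map(tofrom: on_host)
    {
        on_host = omp_is_initial_device();
    }
#endif
    return on_host;
}
```

Based on the behavior described above, a program calling this helper should get 1 when run with OMP_TARGET_OFFLOAD=disabled, and should abort rather than return when run with OMP_TARGET_OFFLOAD=mandatory on a node where offloading is not possible.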
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)