In this exercise, the goal is to compile and run the CUDA and OpenMP codes presented earlier, in which the loop that contains most of the work is offloaded to the GPU. The instructions below should be valid for any system that has at least one NVIDIA GPU, plus appropriate installations of the NVIDIA HPC SDK and GCC. The steps have been verified to work on TACC's Vista. On a system like TACC's Frontera, which has only the NVIDIA CUDA Toolkit and lacks a GCC build with offloading support, only Part I will work.

Part I: Compile and Run the CUDA Code
  1. Download the C++ code (or copy the program text from the linked page, or from the previous page):
  2. Compile the CUDA code with nvcc. This compiler relies on g++ (or another compatible host compiler) to compile the host-side portion of the code. On Vista, both nvcc and g++ are available in the default environment. On Frontera, it is necessary to load a module to make nvcc and its associated libraries available:
    Invoke the compiler to generate the executable:
  3. Start an interactive session in a GPU-equipped queue and run the executable:
  4. This interactive session can be retained for Part II (Vista only).
Part II: Compile and Run the OpenMP Offloading Code
  1. Download the C++ code (or copy the program text from the linked page, or from the previous page):
  2. Compile the OpenMP code with nvc++. This distinct NVIDIA compiler is also present in the default environment on Vista. Then, compile the code a second time, omitting the -fopenmp flag so that the OpenMP directives are ignored. This produces a single-threaded code for the CPU:
  3. Start an interactive session in a GPU-equipped queue and run both of the executables. Confirm that they produce identical results:
  4. The OpenMP offloading code can also be compiled with g++. In this case, the default g++ on Vista is inadequate because it was not built with support for generating code for NVIDIA GPUs. An error will therefore occur unless a suitable module is loaded before compiling:
    Note that an extra option is required to specify that the offload target is an NVIDIA GPU. If the -foffload flag is missing, then by default, the target is the CPU. And if both -foffload and -fopenmp are absent, then the result is a single-threaded code for the CPU.
  5. Run the executable in your interactive session:
    Extra credit: recompile the OpenMP code without the -foffload option and confirm that it works on a machine that has no GPUs at all, e.g., on one of Vista's "gg" nodes.
 
©  |   Cornell University    |   Center for Advanced Computing    |   Copyright Statement    |   Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)