We explore basic ways of moving computational work, and the data associated with it, from CPUs to GPUs for codes that are well suited to GPU architectures. After describing the host-device execution model, we compare the capabilities of the CUDA and OpenMP programming models through simple C++ code examples. We then survey additional software tools that enable portability across different types of GPUs, in various languages, to see what is available for writing code that will run on heterogeneous platforms.
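To make the host-device division of labor concrete before diving in, here is a minimal vector-addition sketch in CUDA C++. The array size, launch configuration, and variable names are illustrative assumptions, not code from this roadmap's exercises. The CPU allocates device memory, stages the data transfers, and launches the kernel; the GPU executes the kernel across many threads.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Kernel (runs on the GPU): each thread handles one array element.
__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host (CPU) allocations and initialization.
    float *ha = new float[n], *hb = new float[n], *hc = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device (GPU) allocations.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);

    // Copy inputs host -> device.
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // The CPU launches the kernel; the GPU runs it in parallel.
    const int threads = 256;
    add<<<(n + threads - 1) / threads, threads>>>(da, db, dc, n);

    // Copy the result device -> host and spot-check it.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f (expect 3.0)\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
    return 0;
}
```

With the NVIDIA CUDA Toolkit or HPC SDK installed, a program like this compiles with nvcc (for example, nvcc vadd.cu -o vadd, where the file name is hypothetical).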

Objectives

After you complete this roadmap, you should be able to:

  • Describe the kinds of computing tasks that are well suited to GPUs versus CPUs
  • Identify the roles of the CPU and GPU in kernel execution, memory allocation, and data transfers
  • Write and execute simple C++ programs that offload computations to the GPU using CUDA and OpenMP (see the sketch after this list)
  • Compare different code portability solutions (e.g., CUDA, OpenMP, HIP, SYCL, Kokkos, Alpaka) in terms of performance and ease of implementation
  • Name common GPU programming strategies for Python developers
  • Evaluate the trade-offs between performance portability and development complexity in research applications
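For comparison with the CUDA sketch above, here is an equally minimal sketch of the same vector addition using OpenMP target offload directives (OpenMP 4.5 and later). The map clauses take over the host-to-device and device-to-host transfers that CUDA expresses with explicit cudaMemcpy calls. As before, this is an illustrative sketch, not code from the roadmap's exercises.

```cpp
#include <cstdio>

int main() {
    const int n = 1 << 20;
    float *a = new float[n], *b = new float[n], *c = new float[n];
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Offload the loop to the GPU; map clauses handle data movement:
    // 'to' copies inputs host->device, 'from' copies the result back.
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];

    printf("c[0] = %f (expect 3.0)\n", c[0]);
    delete[] a; delete[] b; delete[] c;
    return 0;
}
```

With a GCC build enabled for NVIDIA GPUs, a compile line along the lines of g++ -fopenmp -foffload=nvptx-none offloads this loop; the NVIDIA HPC SDK's nvc++ -mp=gpu compiler is another option.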
Prerequisites

Requirements

  • There are no specific requirements for this roadmap. However, to run the exercises, access to Frontera or Vista may be helpful, or access to any computer that hosts an NVIDIA GPU and has either the NVIDIA HPC SDK or the NVIDIA CUDA Toolkit installed, along with a GCC build enabled for NVIDIA GPUs.