CUDA Concepts
Zilu Wang and Steve Lantz
Cornell Center for Advanced Computing
8/2025 (original)
The hardware design of GPUs is optimized for highly parallel applications, so their programming model is very different from the traditional serial model used on CPUs. If you don't have prior parallel programming experience, some CUDA concepts and techniques may seem hard to grasp. This topic aims to clarify important CUDA terminology and how it relates to GPU hardware.
Reference Bibliography:
Hwu, Wen-mei W., and Kirk, David B., "Programming Massively Parallel Processors: A Hands-on Approach", 3rd edition (Morgan Kaufmann, December 21, 2016).
Hwu, Wen-mei W., Kirk, David B., and El Hajj, Izzat, "Programming Massively Parallel Processors: A Hands-on Approach", 4th edition (Morgan Kaufmann, August 18, 2022).
NVIDIA Corporation, "CUDA C++ Programming Guide", v12.6.
NVIDIA Corporation, "CUDA C++ Best Practices Guide", v12.6.
Objectives
After you complete this topic, you should be able to:
- Outline the structure of a CUDA program
- List the main components of CUDA programming (threads, thread blocks, and grids) and explain how they relate
- Write a program that includes a kernel function to launch a grid of threads on a GPU device
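As a preview of the components named above, here is a minimal sketch of a complete CUDA C program: a kernel function that runs on the GPU, launched as a grid of thread blocks. The vector size and block dimension are illustrative choices, not values prescribed by this topic.

```c
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: each thread handles one array element. Threads are grouped
// into blocks, and the blocks together form the grid created at launch.
__global__ void add(int n, const float *a, const float *b, float *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against extra threads
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;                 // illustrative problem size
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified (managed) memory is accessible from both host and device.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threadsPerBlock = 256;          // threads per block (illustrative)
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // blocks in grid
    add<<<blocks, threadsPerBlock>>>(n, a, b, c);  // launch the grid on the GPU
    cudaDeviceSynchronize();            // wait for the kernel to finish

    printf("c[0] = %f\n", c[0]);        // each element should be 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Later parts of this topic explain each piece: the `__global__` qualifier, the `<<<blocks, threads>>>` launch configuration, and the built-in index variables.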
Prerequisites
This topic covers basic CUDA programming and its connection to GPU architecture using the C programming language. A working knowledge of C/C++ and some understanding of parallel computing are necessary for this topic. Thus, you may want to complete An Introduction to C Programming and Parallel Programming Concepts and High-Performance Computing before beginning this topic. While GPU terms are explained in the context of CUDA programming, this topic does not cover the specifics of GPU architecture; you may want to complete Understanding GPU Architecture to learn more about that. No prior experience with CUDA programming or GPUs is assumed.
Should you need an in-depth reference, NVIDIA provides complete documentation for CUDA. Visit their website for the latest versions of the NVIDIA CUDA Runtime API and the CUDA C++ Programming Guide.
The Frontera User Guide and Vista User Guide have just a few short sections on GPUs with information on node types, job submission, and machine learning software. If you're on Frontera or Vista, be sure to load the CUDA module before compiling any programs: load CUDA 12.2 on Frontera or CUDA 12.5 on Vista.
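On these systems, modules are loaded with the `module load` command. Assuming the standard module naming used on TACC machines (the exact names may differ; check `module avail cuda` on your system), the commands would look like:

```shell
# On Frontera: load CUDA 12.2 before compiling
module load cuda/12.2

# On Vista: load CUDA 12.5 before compiling
module load cuda/12.5

# Verify that the CUDA compiler is now on your PATH
nvcc --version
```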
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)