Zilu Wang and Steve Lantz
Cornell Center for Advanced Computing

8/2025 (original)

A CUDA device has many different memory components, each with a different size, bandwidth, and scope. Since CUDA programs can read from and write to specific memory components, we need to know how GPU memory is organized across them. Ideally, a program will be structured so that its threads most often find the data they need in fast cache memory and registers, rather than having to retrieve data from global memory, which is comparatively slow. On the other hand, global memory is significantly larger than the caches or registers. To realize the full potential of the device, we need to understand how to properly utilize the different levels in the memory hierarchy.
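As a preview of where variables can reside, the sketch below labels the memory space of each declaration in a small CUDA program. It is a minimal, illustrative example, not from the original text; the kernel and variable names are hypothetical, and it assumes the CUDA toolkit (nvcc) is available.

```cuda
// Illustrative sketch (assumed names): where each variable lives on the device.
#include <cstdio>
#include <cuda_runtime.h>

__constant__ float scale;              // constant memory: read-only, cached, device-wide

__global__ void scaled_copy(const float *x, float *y, int n)  // x, y point to global memory
{
    __shared__ float tile[256];        // shared memory: visible to all threads in one block
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // i is a register, private to each thread
    if (i < n) {
        tile[threadIdx.x] = x[i];      // stage a global-memory value in fast shared memory
        __syncthreads();
        y[i] = scale * tile[threadIdx.x];
    }
}

int main(void)
{
    const int n = 1024;
    float h_scale = 2.0f, *x, *y;
    cudaMalloc(&x, n * sizeof(float)); // allocations in device global memory
    cudaMalloc(&y, n * sizeof(float));
    cudaMemcpyToSymbol(scale, &h_scale, sizeof(float)); // copy into constant memory
    scaled_copy<<<(n + 255) / 256, 256>>>(x, y, n);
    cudaDeviceSynchronize();
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Each memory space in this sketch (global, shared, constant, registers) is covered in detail later in the topic.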

Objectives

After you complete this topic, you should be able to:

  • Outline CUDA's memory hierarchy and where variables reside within it
  • Identify the different methods of memory allocation
  • Write a program that uses different memory allocation methods

Prerequisites

This topic covers basic CUDA programming and its connection to GPU architecture using the C programming language. A working knowledge of C/C++ and some understanding of parallel computing are necessary for this topic. Thus, you may want to complete An Introduction to C Programming and Parallel Programming Concepts and High-Performance Computing before beginning this topic. While GPU terms are explained in the context of CUDA programming, this topic does not cover the specifics of GPU architecture; you may want to complete Understanding GPU Architecture to learn more about that. No prior experience with CUDA programming or GPUs is assumed.

Should you need an in-depth reference, NVIDIA provides complete documentation for CUDA. Visit their website to see the latest versions of their NVIDIA CUDA Runtime API and CUDA C Programming Guide.

The Frontera User Guide and Vista User Guide have just a few short sections on GPUs, with information on node types, job submission, and machine learning software. If you're on Frontera or Vista, be sure to load the CUDA module before compiling any programs; it provides CUDA 12.2 on Frontera and CUDA 12.5 on Vista.
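The commands below sketch how this typically looks with the Lmod module system used on TACC machines; the exact module names and versions are assumptions, so check with `module avail` on your system.

```shell
# Illustrative sketch; module names/versions may differ on your system.
module avail cuda        # list the CUDA modules installed on this machine
module load cuda/12.2    # Frontera: load CUDA 12.2
# module load cuda/12.5  # Vista: load CUDA 12.5 instead
nvcc --version           # confirm the CUDA compiler is now on your PATH
```

Loading the module puts `nvcc` and the CUDA libraries on your paths for the current shell session.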

 
© Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)