CUDA is a parallel computing architecture and API created by NVIDIA that enables NVIDIA GPUs to serve as a platform for GPGPU, or general-purpose computing on GPUs. CUDA supports a number of high-level programming languages such as C, C++, and Fortran; additionally, it features several large, optimized performance libraries that ease the task of GPU programming. Wrappers for CUDA's API and libraries exist in many other languages as well.

CUDA provides a way to utilize a heterogeneous computing environment in which users offload their massively parallel tasks to NVIDIA GPUs while performing their coarse-grained parallel (or even serial) computations on CPUs. These characteristics make CUDA a very suitable platform for HPC applications. Indeed, over 35% of the systems in the Top500 list of supercomputers from November 2024 are accelerated with NVIDIA GPUs.

This roadmap gives you an introduction to basic CUDA programming and performance optimization. The goal is just to expose you to CUDA; therefore, no in-depth parallel programming experience is required to benefit from the material. Most of what you need will be explained in context, with only a few references to external sources.

To get an in-depth look at GPU hardware, you may wish to precede or follow this roadmap with the Understanding GPU Architecture roadmap.

Objectives

After you complete this roadmap, you should be able to:

  • Use basic CUDA programming constructs
  • Compile a CUDA program
  • Write a CUDA program that uses numerous threads and does a simple parallel computation
  • Perform simple benchmarking
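As a small preview of the constructs this roadmap covers, here is a sketch of a minimal CUDA C program (the kernel name and launch configuration are illustrative choices, not part of any later exercise). Each thread computes a global index and writes one array element on the GPU:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread writes its own global index into the output array.
__global__ void fillIndex(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;      // guard against extra threads in the last block
}

int main(void) {
    const int n = 256;
    int h_out[n];
    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));          // allocate device memory
    fillIndex<<<(n + 63) / 64, 64>>>(d_out, n);   // launch 4 blocks of 64 threads
    cudaMemcpy(h_out, d_out, n * sizeof(int),
               cudaMemcpyDeviceToHost);           // copy the result back to the host
    printf("h_out[%d] = %d\n", n - 1, h_out[n - 1]);
    cudaFree(d_out);
    return 0;
}
```

A program like this would be compiled with NVIDIA's `nvcc` compiler (e.g., `nvcc fill.cu -o fill`), a workflow covered later in the roadmap.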
Prerequisites

This topic covers basic CUDA programming using the C programming language. A working knowledge of C and some understanding of GPU architecture and parallel computing are necessary for this topic. Thus, you may want to complete An Introduction to C Programming, Understanding GPU Architecture, and Parallel Programming Concepts and High-Performance Computing before beginning this topic. No prior experience with CUDA programming or GPUs is needed.

Should you need a reference, NVIDIA provides complete documentation for CUDA. Visit their website to see the latest versions of their NVIDIA CUDA Runtime API and CUDA C Programming Guide.

CUDA on Frontera or Vista at TACC

The Frontera User Guide and Vista User Guide have just a few short sections on GPUs with information on node types, job submission, and machine learning software. TACC recommends visiting NVIDIA's website to get the latest documentation on CUDA.

On Frontera or Vista, the CUDA Toolkit is located in $TACC_CUDA_DIR. Be sure to load the CUDA module so that $TACC_CUDA_DIR is defined and the tools are found in your $PATH. The relevant versions are CUDA 12.2 on Frontera and CUDA 12.5 on Vista.
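For example, you might load the module as sketched below (the exact module names are assumptions based on the versions cited above; run `module avail cuda` on your system to see what is actually installed):

```shell
# See which CUDA modules are available on this system
module avail cuda

# Load the appropriate version (assumed module names)
module load cuda/12.2   # on Frontera
module load cuda/12.5   # on Vista

# Verify the environment is set up
echo $TACC_CUDA_DIR
nvcc --version
```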

Requirements
  • There are no specific requirements for this roadmap; however, access to Frontera, Vista, or any computer that hosts an NVIDIA GPU and has the CUDA Toolkit installed may be helpful.
© Cornell University | Center for Advanced Computing | Copyright Statement | Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)