A CUDA program comprises two interrelated types of code: host code and device code. As you might expect, the device code is executed by the GPU device, and the host code is executed by the CPU. For small programs, you can place both the host and device code in the same file, though you can also compile and link multiple separate files using NVIDIA's compilers.

Both host code and device code use C-style syntax when you program in C/C++. CUDA also provides language extensions, keywords, API functions, and GPU-accelerated libraries that let users implement GPU applications in C/C++ as well as in other languages such as Fortran.

CUDA relies on certain language extensions called execution space specifiers to determine whether a given piece of code, or an area of memory, belongs to the host or the device. In C/C++, these specifiers are reserved identifiers that are recognized by NVIDIA's nvcc compiler.

Host Code

Host code is executed by the CPU. In a CUDA program, host code usually carries no execution space specifier, although a function can be explicitly declared as host code with the __host__ specifier (two underscores on each side). Host code can call kernel functions. Such a call is referred to as a kernel call, and a kernel function is often simply called a kernel.
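As a minimal sketch of host code that makes a kernel call, the program below marks one function with __host__ and launches a kernel from it. The names helloKernel and launchHello are hypothetical and chosen only for illustration, as is the launch configuration of one block with four threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread prints its own index within the block.
__global__ void helloKernel()
{
    printf("Hello from thread %d\n", threadIdx.x);
}

// Host function with the optional __host__ specifier.
// Leaving the specifier off would declare the same host-only function.
__host__ void launchHello()
{
    // Kernel call: one block of four threads (illustrative values).
    helloKernel<<<1, 4>>>();

    // Wait for the kernel to finish so that its output is flushed.
    cudaDeviceSynchronize();
}

int main()
{
    launchHello();
    return 0;
}
```

Assuming the file is saved with a .cu extension (the name hello.cu is an assumption), it could be built with a command such as nvcc hello.cu -o hello.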

Host code in a CUDA application can do the following:

  • Initialize a device
  • Allocate GPU memory
  • Transfer data to and from the device
  • Make a kernel call
  • Synchronize CPU and GPU execution
  • Deallocate GPU memory
  • Reset a device
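
The steps in the list above can be sketched as one short host program. This is an illustration under stated assumptions rather than a definitive pattern: the helper check, the kernel doubleElements, the array size, and the use of device 0 are all hypothetical choices, while the cuda* calls are CUDA Runtime API functions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: stop with a message if a runtime API call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel: doubles every element of the array.
__global__ void doubleElements(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    // Prepare input data in host memory.
    float *h_data = (float *)malloc(bytes);
    if (h_data == NULL) return EXIT_FAILURE;
    for (int i = 0; i < n; ++i) h_data[i] = (float)i;

    // Initialize a device (device 0 is an assumption).
    check(cudaSetDevice(0), "cudaSetDevice");

    // Allocate GPU memory.
    float *d_data = NULL;
    check(cudaMalloc((void **)&d_data, bytes), "cudaMalloc");

    // Transfer data to the device.
    check(cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice), "copy to device");

    // Make a kernel call with enough blocks to cover all n elements.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    doubleElements<<<blocks, threadsPerBlock>>>(d_data, n);
    check(cudaGetLastError(), "kernel launch");

    // Synchronize CPU and GPU execution.
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    // Transfer results back from the device.
    check(cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost), "copy to host");
    printf("h_data[10] = %f\n", h_data[10]);

    // Deallocate GPU memory, then reset the device.
    check(cudaFree(d_data), "cudaFree");
    check(cudaDeviceReset(), "cudaDeviceReset");

    free(h_data);
    return 0;
}
```

Checking the return value of each call is a design choice in this sketch, not a requirement; it makes a failure at any step visible immediately.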

Device Code

Device code is executed on the GPU by many threads concurrently. It must be explicitly declared as device code with the appropriate execution space specifier. There are two types of device code: kernel functions and device functions.

Kernel Function

A kernel function is called from the host code to launch a grid of threads. It must be declared with the __global__ specifier and must have the return type void. Although a kernel function is primarily intended to be called from host code, on devices of Compute Capability 5.0 and higher it can also be called from other kernel functions or from device functions, a feature known as CUDA Dynamic Parallelism.
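
A minimal sketch of a kernel function and the host-side launch that creates its grid might look like the following. The kernel name fillIndices, the array size, and the block size of 256 threads are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel function: declared __global__ and returning void.
// Each thread computes its global index and stores it in the output array.
__global__ void fillIndices(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;  // guard: the grid may hold more threads than elements
}

int main()
{
    const int n = 1000;
    int *d_out = NULL;
    cudaMalloc((void **)&d_out, n * sizeof(int));

    // Launch a grid of blocks; round up so that every element gets a thread.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    fillIndices<<<blocks, threadsPerBlock>>>(d_out, n);

    // This copy waits for the kernel to finish before reading the results.
    int h_out[n];
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("last element: %d\n", h_out[n - 1]);

    cudaFree(d_out);
    return 0;
}
```

Because a kernel returns void, results reach the host through device memory that the kernel writes to, as with the out array in this sketch.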

Device Function

A device function runs on the device and can be called only from kernel functions or other device functions. Unlike a kernel function, it is not restricted to the void return type. It is declared with the __device__ specifier.
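
As an illustrative sketch, the device function below returns a value and is called from a kernel. The names square and squareAll, and the eight-element array, are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device function: runs on the GPU and may return a value.
__device__ float square(float x)
{
    return x * x;
}

// Kernel that calls the device function once per element.
__global__ void squareAll(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = square(data[i]);
}

int main()
{
    const int n = 8;
    float h_data[n];
    for (int i = 0; i < n; ++i) h_data[i] = (float)i;

    float *d_data = NULL;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // One block of n threads is enough for this small array.
    squareAll<<<1, n>>>(d_data, n);

    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%d squared is %.1f\n", i, h_data[i]);

    cudaFree(d_data);
    return 0;
}
```

Calling square directly from main would be rejected by the compiler, since a function declared only with __device__ is not available to host code.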

CUDA supports additional specifiers and combinations of specifiers. The CUDA C++ Programming Guide lists specifiers, combinations, and their properties.
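
One such combination is __host__ __device__, which asks the compiler to build a function for both the host and the device. The sketch below assumes a hypothetical function clampToUnit and a hypothetical kernel clampKernel, and calls the same function from both sides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compiled for both host and device, so either side may call it.
__host__ __device__ float clampToUnit(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

// Kernel that applies the shared function to a single value on the GPU.
__global__ void clampKernel(float *value)
{
    *value = clampToUnit(*value);
}

int main()
{
    // Call on the host.
    float onHost = clampToUnit(2.5f);

    // Call on the device, using the same input.
    float input = 2.5f;
    float onDevice = 0.0f;
    float *d_value = NULL;
    cudaMalloc((void **)&d_value, sizeof(float));
    cudaMemcpy(d_value, &input, sizeof(float), cudaMemcpyHostToDevice);
    clampKernel<<<1, 1>>>(d_value);
    cudaMemcpy(&onDevice, d_value, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_value);

    printf("host: %.1f, device: %.1f\n", onHost, onDevice);
    return 0;
}
```

Sharing one definition this way avoids maintaining separate host and device copies of small utility functions.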

 
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)