CUDA is a powerful programming platform for general-purpose GPU computing (GPGPU). It allows you to take full advantage of the highly parallel, multithreaded hardware design of a graphics processing unit. Here we summarize the major points from our survey of GPU performance considerations that you should keep in mind when programming with CUDA:

  • Avoid thread synchronization (though it may be unavoidable in places)
  • Avoid thread divergence
  • Coalesced memory access can greatly improve performance
  • Global memory is slow but big; on-chip memory and registers are much faster but limited in size
  • Array padding and tiling can help you make the best use of the limited amount of on-chip shared memory
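
Several of these points come together in the classic tiled matrix transpose. The sketch below is illustrative, not code from this roadmap: the kernel name is hypothetical, and it assumes it is launched with 32×32 thread blocks covering the matrix. It stages each tile in shared memory so that both the global read and the global write are coalesced, pads the tile by one column to avoid shared-memory bank conflicts, and uses one (unavoidable) `__syncthreads()` per tile.

```cuda
#define TILE_DIM 32  // assumes blockDim = (TILE_DIM, TILE_DIM)

// Illustrative kernel: transpose a height x width matrix `in` into `out`
__global__ void transposeTiled(float *out, const float *in,
                               int width, int height)
{
    // +1 column of padding so threads in a warp hit different banks
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;

    // Coalesced read: consecutive threads touch consecutive addresses
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];

    __syncthreads();  // the whole tile must be loaded before any write

    // Swap block indices so the write is also coalesced
    x = blockIdx.y * TILE_DIM + threadIdx.x;
    y = blockIdx.x * TILE_DIM + threadIdx.y;
    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y];
}
```

Without the shared-memory tile, either the read or the write would necessarily be strided in global memory; staging through on-chip memory lets both sides of the transpose proceed at full coalesced bandwidth.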

Our intention for this roadmap was to introduce basic CUDA programming techniques and concepts; there is much more to learn. The NVIDIA CUDA C++ Programming Guide and other online materials, such as the Understanding GPU Architecture roadmap, provide additional resources for parallel computing and GPU programming.

©  |   Cornell University    |   Center for Advanced Computing    |   Copyright Statement    |   Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)