CUDA is a powerful programming platform for general-purpose GPU computing (GPGPU). It allows you to take full advantage of the highly parallel, multithreaded hardware design of a graphics processing unit. Here we summarize the major points from our survey of GPU performance considerations that you should keep in mind when programming with CUDA:

  • Avoid thread synchronization (though it may be unavoidable in places)
  • Avoid thread divergence
  • Coalesced memory access can greatly improve performance
  • Global memory is slow but big; on-chip memory and registers are much faster but limited in size
  • Array padding and tiling can help you make the best use of the limited amount of on-chip shared memory
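
Several of these points come together in the classic tiled matrix transpose. The sketch below is illustrative, not code from this roadmap: the kernel name is hypothetical, and it assumes it is launched with 32×32 thread blocks covering the matrix. It stages each tile in shared memory so that both the global read and the global write are coalesced, pads the tile by one column to avoid shared-memory bank conflicts, and uses one (unavoidable) `__syncthreads()` per tile.

```cuda
#define TILE_DIM 32  // assumes blockDim = (TILE_DIM, TILE_DIM)

// Illustrative kernel: transpose a height x width matrix `in` into `out`
__global__ void transposeTiled(float *out, const float *in,
                               int width, int height)
{
    // +1 column of padding so threads in a warp hit different banks
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;

    // Coalesced read: consecutive threads touch consecutive addresses
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];

    __syncthreads();  // the whole tile must be loaded before any write

    // Swap block indices so the write is also coalesced
    x = blockIdx.y * TILE_DIM + threadIdx.x;
    y = blockIdx.x * TILE_DIM + threadIdx.y;
    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y];
}
```

Without the shared-memory tile, either the read or the write would necessarily be strided in global memory; staging through on-chip memory lets both sides of the transpose proceed at full coalesced bandwidth.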

Our intention for this roadmap was to introduce basic CUDA programming techniques and concepts; there is much more to learn. The NVIDIA CUDA C++ Programming Guide and other online materials, such as the Understanding GPU Architecture roadmap, provide additional resources for parallel computing and GPU programming.

©  |   Cornell University    |   Center for Advanced Computing    |   Copyright Statement    |   Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)