Conclusion
CUDA is a powerful programming model for general-purpose computing on GPUs (GPGPU). It lets you exploit the highly parallel, multithreaded hardware design of a graphics processing unit. Here we summarize the major GPU performance considerations from our survey that you should keep in mind when programming with CUDA:
- Minimize thread synchronization (though it may be unavoidable in places)
- Avoid thread divergence within a warp
- Coalesce global memory accesses, which can greatly improve performance
- Global memory is slow but big; on-chip memory and registers are much faster but limited in size
- Array padding and tiling can help you make the best use of the limited amount of on-chip shared memory
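As a rough illustration of how these points can combine in a single kernel, here is a minimal sketch of a tiled matrix transpose. It is not taken from the roadmap: the names `transposeTiled`, `transposeOnGpu`, and `TILE` are hypothetical, and the tile size of 32 is an assumption chosen to match the warp size. The sketch stages data in a padded shared-memory tile so that both the global read and the global write are coalesced, uses a single barrier, and limits divergence to the blocks on the matrix edges.

```cuda
#include <cuda_runtime.h>

constexpr int TILE = 32;  // assumed tile edge, equal to the warp size

// Transposes a row-major matrix `in` (height x width) into `out` (width x height).
__global__ void transposeTiled(const float* in, float* out, int width, int height)
{
    // One extra column of padding keeps column-wise reads of the tile
    // from all landing in the same shared-memory bank.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;  // column in `in`
    int y = blockIdx.y * TILE + threadIdx.y;  // row in `in`

    // Coalesced read: consecutive threadIdx.x touch consecutive addresses.
    // The bounds check diverges only in blocks on the matrix edge.
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];

    // The single barrier in this kernel: the whole tile must be loaded
    // before any thread reads an element written by another thread.
    __syncthreads();

    // Swap the block indices so the write is also coalesced.
    x = blockIdx.y * TILE + threadIdx.x;  // column in `out`
    y = blockIdx.x * TILE + threadIdx.y;  // row in `out`

    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host helper: copies the input to the device, launches the
// kernel, and copies the transposed result back to the host.
cudaError_t transposeOnGpu(const float* hostIn, float* hostOut, int width, int height)
{
    size_t bytes = static_cast<size_t>(width) * height * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;

    cudaError_t err = cudaMalloc(&dIn, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&dOut, bytes);
    if (err != cudaSuccess) { cudaFree(dIn); return err; }

    err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(TILE, TILE);
        dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
        transposeTiled<<<grid, block>>>(dIn, dOut, width, height);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```

Calling `transposeOnGpu(in, out, width, height)` with host arrays of `width * height` floats should leave `out[c * height + r]` equal to `in[r * width + c]`. Whether the padding and tiling pay off on a given GPU is best confirmed by profiling against a naive kernel.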
Our intention for this roadmap was to introduce basic CUDA programming techniques and concepts, but there is much more to learn. The NVIDIA CUDA C++ Programming Guide and other online materials, such as the Understanding GPU Architecture roadmap, provide additional resources for parallel computing and GPU programming.
© Cornell University | Center for Advanced Computing | Copyright Statement | Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)