Exercise: Streams
This short exercise demonstrates the usage of streams and asynchronous memory transfers. You can compile and run your code at any point during its development to check your progress. If you are using TACC resources, please refer to the earlier page with instructions for Slurm.
To begin, we will adapt the code from Exercise: Thread Mapping. Here is the original code:
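The original listing is not reproduced here. As a stand-in, the following is a minimal sketch of the kind of code the Thread Mapping exercise produces, assuming a simple element-wise kernel (the names `scale`, `h_a`, `d_a`, and the matrix size `N` are illustrative, not from the original):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cassert>

#define N 1024

// Each thread computes one element of the output matrix.
__global__ void scale(const float *a, float *c, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n)
        c[row * n + col] = 2.0f * a[row * n + col];
}

int main() {
    size_t bytes = (size_t)N * N * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < N * N; i++) h_a[i] = 1.0f;

    float *d_a, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);

    // 2D thread mapping: one thread per matrix element
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    scale<<<grid, block>>>(d_a, d_c, N);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    for (int i = 0; i < N * N; i++) assert(h_c[i] == 2.0f);
    printf("PASSED\n");

    cudaFree(d_a); cudaFree(d_c);
    free(h_a); free(h_c);
    return 0;
}
```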
Showcasing streams will require more than one kernel invocation. Here we keep the one kernel we defined previously, but we invoke it on two separate matrix variables. After this modification to main(), there are two kernel invocations that call the same kernel function, but with two different inputs.
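Continuing the sketch above, the relevant change to main() might look like this, where `d_b` and `d_d` are a hypothetical second input/output pair allocated and initialized the same way as `d_a` and `d_c`:

```cuda
// Two invocations of the same kernel on two different matrices
scale<<<grid, block>>>(d_a, d_c, N);
scale<<<grid, block>>>(d_b, d_d, N);
```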
To create a stream, we must first declare a cudaStream_t variable and then initialize the stream using cudaStreamCreate. Then, during kernel invocation, the stream variable is passed as the fourth execution-configuration parameter, specifying the stream on which the kernel is enqueued. At the end of the program, the streams are destroyed with cudaStreamDestroy.
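A sketch of these three steps, using the same illustrative names as above (note that the third execution-configuration parameter, the dynamic shared-memory size, must be given explicitly, here 0, so that the stream can occupy the fourth slot):

```cuda
// Declare and create one stream per independent workload
cudaStream_t stream1, stream2;
cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);

// Fourth parameter selects the stream the kernel is enqueued on
scale<<<grid, block, 0, stream1>>>(d_a, d_c, N);
scale<<<grid, block, 0, stream2>>>(d_b, d_d, N);

// At the end of the program, release the streams
cudaStreamDestroy(stream1);
cudaStreamDestroy(stream2);
```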
Asynchronous memory transfers are slightly more complex. First, since asynchronous memory transfer requires both pinned memory and streams, we allocate the host matrices in pinned memory. We replace cudaMemcpy with cudaMemcpyAsync and specify a stream for each transfer so that each matrix is transferred asynchronously. We also need to include cudaDeviceSynchronize(), because without it the host would attempt to validate the results before the data has been transferred back. Lastly, we free the pinned host memory with cudaFreeHost instead of free.
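Putting these pieces together, a sketch of the modified transfer code might look as follows (same illustrative names; pinned host buffers are allocated with cudaMallocHost):

```cuda
// Allocate the host matrices in pinned (page-locked) memory
float *h_a, *h_b, *h_c, *h_d;
cudaMallocHost(&h_a, bytes);
cudaMallocHost(&h_b, bytes);
cudaMallocHost(&h_c, bytes);
cudaMallocHost(&h_d, bytes);

// Asynchronous host-to-device copies, one stream per matrix
cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, stream1);
cudaMemcpyAsync(d_b, h_b, bytes, cudaMemcpyHostToDevice, stream2);

scale<<<grid, block, 0, stream1>>>(d_a, d_c, N);
scale<<<grid, block, 0, stream2>>>(d_b, d_d, N);

// Asynchronous device-to-host copies in the same streams
cudaMemcpyAsync(h_c, d_c, bytes, cudaMemcpyDeviceToHost, stream1);
cudaMemcpyAsync(h_d, d_d, bytes, cudaMemcpyDeviceToHost, stream2);

// Block the host until both streams have finished, then validate
cudaDeviceSynchronize();

// Pinned host memory is released with cudaFreeHost, not free
cudaFreeHost(h_a); cudaFreeHost(h_b);
cudaFreeHost(h_c); cudaFreeHost(h_d);
```

Within each stream, the copy-in, kernel, and copy-out are ordered; across the two streams they may overlap, which is the point of the exercise.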
Interestingly, cudaMemcpyAsync will not cause an error if the memory is not pinned; instead, CUDA will transfer the memory as if cudaMemcpy had been called. You can observe this behavior if you revert to using malloc and remove cudaDeviceSynchronize. There will be no assertion error, because the host becomes synchronized with the device through the calls to cudaMemcpyAsync.
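A sketch of the reverted device-to-host copy that illustrates this fallback, assuming the same names as above:

```cuda
// h_c allocated with plain malloc: pageable, not pinned
float *h_c = (float *)malloc(bytes);

// With pageable memory, this copy behaves like cudaMemcpy:
// it does not return until the data has reached h_c, so the
// host can validate results without cudaDeviceSynchronize().
cudaMemcpyAsync(h_c, d_c, bytes, cudaMemcpyDeviceToHost, stream1);
```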
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)