In this exercise, we will test a simple CUDA program that queries the attached CUDA devices and gathers information about them using the CUDA Runtime API. No CUDA programming is involved; rather, the goals of this exercise are simply to demonstrate how to prepare and submit a GPU job, and to see how the Runtime API can be used to discover hardware properties.

Here is how to compile the source code, prepare the batch file, and submit the job. The instructions are specific to Frontera, but they can easily be modified for other systems.

1. Copy and paste (or download) the following code into a new file named devicequery.cu.
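The exact listing is provided with the exercise; a minimal sketch of such a device query program, built only on the Runtime API calls cudaGetDeviceCount() and cudaGetDeviceProperties(), might look like this (the specific properties printed are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Ask the Runtime API how many CUDA-capable devices are attached
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Detected %d CUDA-capable device(s)\n", deviceCount);

    // Query and report a few properties of each device
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("\nDevice %d: \"%s\"\n", dev, prop.name);
        printf("  Compute capability:        %d.%d\n", prop.major, prop.minor);
        printf("  Multiprocessor (SM) count: %d\n", prop.multiProcessorCount);
        printf("  Global memory:             %.1f GiB\n",
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        printf("  Clock rate:                %.2f GHz\n", prop.clockRate * 1e-6);
        printf("  Max threads per block:     %d\n", prop.maxThreadsPerBlock);
    }
    return 0;
}
```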

2. Load the CUDA software using the module utility.
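On Frontera this typically amounts to the following; the module name and available versions may differ on other systems:

```shell
# Load the CUDA toolkit ("module avail cuda" lists the installed versions)
module load cuda

# Confirm that nvcc is now on your PATH
nvcc --version
```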

3. Compile the code using the nvcc compiler, adding flags so that the compiled code targets a compute capability known to be supported by the device.
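The GPUs in Frontera's rtx queues are Quadro RTX 5000 cards with compute capability 7.5, so a suitable compile line might be (adjust the architecture flags for other GPUs):

```shell
# -arch/-code select the target compute capability (7.5 for Quadro RTX 5000)
nvcc -arch=compute_75 -code=sm_75 -o devicequery devicequery.cu
```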

4. Prepare (or download) the batch file and save it as batch.sh (or you can pick any filename). Remember to specify one of the GPU queues, such as Frontera's rtx-dev queue.
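A batch file along these lines should work; the job name, output filename, and time limit are illustrative choices, and the allocation name is a placeholder you must replace with your own:

```shell
#!/bin/bash
#SBATCH -J gpu_query        # job name
#SBATCH -o gpu_query.o%j    # output file; %j expands to the job ID
#SBATCH -N 1                # one node
#SBATCH -n 1                # one task
#SBATCH -p rtx-dev          # GPU development queue on Frontera
#SBATCH -t 00:05:00         # five minutes is ample for a device query
#SBATCH -A yourproject      # replace with your allocation/project name

./devicequery
```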

5. Submit your job using the sbatch command.
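For example:

```shell
# Submit the job; Slurm responds with the assigned job ID
sbatch batch.sh

# Optionally, monitor the job's progress in the queue
squeue -u $USER
```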

6. Retrieve the results. If your job ran successfully, your results should be stored in the file gpu_query.o[job ID]. Assuming you specified Frontera's rtx-dev queue, your output should look like the following:

As you can see, the program acquires device information via the CUDA Runtime API and prints everything to STDOUT. The various device properties reported by the program have been discussed elsewhere in the Understanding GPU Architecture roadmap (except for the dimensions of blocks and grids).
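For reference, the block and grid limits come from the same cudaDeviceProp structure as the other properties; an excerpt like the following (for device 0) shows how they might be printed:

```cuda
// Maximum block and grid dimensions are fields of cudaDeviceProp
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);
printf("Max block dimensions: %d x %d x %d\n",
       prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
printf("Max grid dimensions:  %d x %d x %d\n",
       prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
```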

Many more functions are provided in the CUDA Runtime API. Detailed documentation is available from NVIDIA.
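To give two examples beyond the calls used by the query program, cudaMemGetInfo() reports the free and total memory on the current device, and cudaRuntimeGetVersion() reports the installed runtime version:

```cuda
// Two additional Runtime API calls
size_t freeBytes, totalBytes;
cudaMemGetInfo(&freeBytes, &totalBytes);   // free/total memory on the current device

int runtimeVersion;
cudaRuntimeGetVersion(&runtimeVersion);    // encoded as 1000*major + 10*minor
printf("Free memory: %zu of %zu bytes; runtime version %d\n",
       freeBytes, totalBytes, runtimeVersion);
```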

 
© Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Inclusivity Statement