Code that is intended to execute on a GPU must rely on a software stack extending all the way from low-level hardware drivers to a high-level software platform that provides APIs accessible to developers. But high-level software platforms are most often tailored to the GPU devices made by a particular manufacturer. Source code that is written to run on one GPU platform will generally not run on a different one, unless significant changes are made to the code. Given today's broad array of manufacturers of GPUs and accelerators, it seems desirable to avoid "vendor lock-in" of your source code.

Portability refers to the extent to which an application code will run on heterogeneous platforms with no changes to the source. In this topic, we will examine software systems or standards that are designed to facilitate portability by treating different platforms as "backends" to a single, unified programming model. The more backends that are supported, the more GPUs (and even CPUs) become accessible to your application.

We begin our discussion by looking at a couple of vendor-supported programming models. Though they represent the least portable option, they can have advantages, too. CUDA, especially, may be considered the progenitor of many of the more general interfaces that came later.

CUDA

CUDA is NVIDIA's multithreaded SIMD model for general-purpose GPU computing. CUDA refers to the underlying software platform as well. You can learn more about it in the Introduction to CUDA roadmap. As a proprietary GPU programming model, CUDA gives you access to the full range of NVIDIA-specific features—for example, tensor cores. But due to these architecture-specific capabilities, CUDA code is not portable to accelerators that are made by other manufacturers. Nevertheless, the abstractions in its execution model and memory model resemble more portable technologies like OpenCL and SYCL. As a result, even though CUDA code is not directly portable, it can readily be migrated to other types of interfaces. This usually comes at some cost to performance on NVIDIA devices, however.
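To make the multithreaded SIMD model concrete, here is a minimal sketch of a CUDA kernel and launch, assuming the classic SAXPY operation (all names are illustrative, and `d_x`, `d_y` stand for arrays already allocated on the device):

```cuda
// Each GPU thread computes one element of y = a*x + y (SAXPY).
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard: grid may overshoot n
        y[i] = a * x[i] + y[i];
}

// Host side: launch enough 256-thread blocks to cover all n elements.
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```

The `<<<blocks, threads>>>` launch syntax and the built-in thread-index variables are exactly the kinds of language extensions, beyond standard C++, that tie CUDA source code to NVIDIA's toolchain.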

Because CUDA partly relies on extensions to standard computer languages, your source code must be compiled with NVIDIA's proprietary compilers, namely nvcc for C/C++, and nvfortran for Fortran. The nvfortran compiler originated with The Portland Group (PGI), a longtime independent vendor of HPC compilers prior to its acquisition by NVIDIA in 2013. The NVIDIA compilers produce executables that are very well optimized for NVIDIA hardware. But the dependency of CUDA code on these special compilers represents one more way in which CUDA is not the most portable option.
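As an illustration, invoking these proprietary compilers might look like the following; the file names are hypothetical, and the architecture flags shown target an NVIDIA A100 (compute capability 8.0):

```shell
# Compile a CUDA C++ source file with nvcc for a specific GPU architecture
nvcc -O3 -arch=sm_80 -o saxpy saxpy.cu

# Compile a CUDA Fortran source file with nvfortran (NVIDIA HPC SDK)
nvfortran -O3 -gpu=cc80 -o saxpy_f saxpy.cuf
```

Note that the architecture flag bakes in another layer of specificity: the resulting executable is tuned not just to NVIDIA GPUs in general, but to a particular hardware generation.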

HIP

HIP is AMD's native programming model for their GPUs, but at the same time it is designed to allow portability across both NVIDIA and AMD GPUs. The syntax is accordingly designed to be very similar to CUDA's, so that most API calls amount to simple translations of names. For GPU codes that depend only on the core functionality of CUDA, such as memory allocation and kernel launches, this makes it possible to port a CUDA version to a HIP version in a relatively straightforward way, enabling the code to run on AMD hardware as well as NVIDIA's. This can represent a step up in portability for some codes.
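The correspondence between the two APIs can be sketched as follows; the kernel name and buffer names here are placeholders, not part of either API:

```cuda
// CUDA version:                            // HIP version:
cudaMalloc(&d_x, n * sizeof(float));        // hipMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float),     // hipMemcpy(d_x, h_x, n * sizeof(float),
           cudaMemcpyHostToDevice);         //           hipMemcpyHostToDevice);
mykernel<<<blocks, threads>>>(n, d_x);      // mykernel<<<blocks, threads>>>(n, d_x);
cudaDeviceSynchronize();                    // hipDeviceSynchronize();
```

In most cases only the `cuda` prefix becomes `hip`, while the kernel launch syntax and kernel code itself are unchanged; AMD also distributes "hipify" tools that automate much of this renaming.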

It should be noted that AMD’s true analog to CUDA is called ROCm. HIP is just a thin-layer API that has little or no performance impact over coding directly in either NVIDIA CUDA or AMD ROCm. The full ROCm software stack and associated tools, which are built primarily on open-source software, provide the real support for programming AMD GPUs.

 
©  |   Cornell University    |   Center for Advanced Computing    |   Copyright Statement    |   Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)