Standard Compiler Directives
High-level programming models such as OpenMP and OpenACC use directives—special annotations (pragmas in C/C++, structured comments in Fortran) that a compiler can recognize—to provide the compiler with guidance on how the application is to be built. Compiler directives typically identify the available parallelism and spell out the rules for data sharing and mapping. The compiler then uses this information when constructing the executable. This approach spares the programmer much of the low-level programming and optimization burden that would otherwise be required; instead, these chores are handled automatically by the compiler.
The biggest advantage of the directive programming model is that it allows quicker migration of existing, sequential CPU applications written in standard programming languages such as C, C++, and Fortran. With the addition of directives, codes are able to evolve into parallel versions that can offload work to heterogeneous accelerators, without requiring major changes to the existing code structures. The conversion process may very well yield a parallel version for multi-core CPUs along the way.
OpenMP
In a typical scenario, a reference CPU implementation will be converted into an initial OpenMP version to enable parallelized, multi-core processing on CPUs, as described in the OpenMP roadmap. The original code may even have been previously parallelized with MPI; in that case, an MPI "outer layer" can be retained to enable parallel execution on a CPU cluster, as described in the Hybrid Programming with OpenMP and MPI roadmap. In either situation, the parallel version for CPUs can be further extended with OpenMP target-offloading directives and data-mapping directives to create a GPU offloading version. The resulting code should be portable across a variety of CPU/GPU platforms, including heterogeneous CPU/GPU clusters (if MPI is in the mix).
There can be drawbacks to maintaining a single, unified OpenMP source code, however, because the optimal strategies for parallelism and data mapping may differ between the CPU and GPU versions. On GPUs, the OpenMP target and data-mapping directives are essential; on CPUs, they are unnecessary, because the original host data and the corresponding data on the "device" share the same storage. With a single source code, a CPU compiler must decide how to implement OpenMP directives intended primarily for GPUs, and its choices may not be ideal for CPUs. For instance, some compilers ignore OpenMP team-level parallelism on CPUs, which can serialize loops and reduce parallel efficiency.
OpenACC
It is straightforward to convert an OpenMP implementation into an OpenACC implementation or vice versa. Both are directive-based programming models for accelerators, and both provide very similar execution and memory models. However, different OpenMP/OpenACC compilers may choose different parallelism mapping strategies, which can lead to differences in parallel efficiency. Furthermore, the available OpenACC compilers vary in terms of their maturity and supported features.
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)