Teams and Parallel
An OpenMP offload region created with only the target directive executes entirely on a single thread. It offers no parallelism beyond what the device would provide without OpenMP, and it may even be slower than host execution because of data transfer overhead. To actually parallelize the computation on the device, we need the teams and parallel directives.
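To make the single-thread behavior concrete, here is a minimal sketch in C. The helper name `threads_in_bare_target` is hypothetical, and the `#else` branch only supplies serial stand-ins so the file still builds when OpenMP is disabled. It assumes that, if no device is available, the target region simply falls back to the host.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption: used only when OpenMP is disabled). */
static int omp_get_num_teams(void)   { return 1; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: how many threads execute a bare target region? */
int threads_in_bare_target(void)
{
    int teams = 0, threads = 0;

    /* Only "target": the region is offloaded, but no teams or threads are created. */
    #pragma omp target map(from: teams, threads)
    {
        teams   = omp_get_num_teams();    /* league size seen inside the region */
        threads = omp_get_num_threads();  /* team size seen inside the region   */
    }
    return teams * threads;
}
```

Calling `threads_in_bare_target()` should return 1, because neither a league of teams nor a parallel region has been created inside the target region.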
The teams directive creates a league of teams, which correspond to CUDA blocks and execute on the GPU's streaming multiprocessors (SMs). As we will see, the parallel directive then creates threads within each team; these correspond to CUDA threads. With these directives, all threads execute the same code region, much as they do in CUDA.
How can different work be assigned to each thread? CUDA threads would make use of their block ID and thread ID. It is possible to do this in OpenMP as well, by calling omp_get_team_num() and omp_get_thread_num(). However, this is not the usual practice in OpenMP, because unlike CUDA, OpenMP also offers worksharing directives that apply to loops. As we will see, worksharing lets the same loop body execute as intended on CPUs as well as GPUs.
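For illustration only, the CUDA-style approach of assigning work by ID might look like the sketch below. The function `scale_by_ids` and its blocking scheme are assumptions made for this example, not a recommended pattern: each team takes one contiguous chunk of the array, and the threads of that team stride through the chunk.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption: used only when OpenMP is disabled). */
static int omp_get_num_teams(void)   { return 1; }
static int omp_get_team_num(void)    { return 0; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void)  { return 0; }
#endif

/* Hypothetical helper: multiply x[0..n-1] by a, dividing the work by hand. */
void scale_by_ids(float *x, int n, float a)
{
    #pragma omp target teams map(tofrom: x[0:n])
    #pragma omp parallel
    {
        int nteams   = omp_get_num_teams();    /* like gridDim.x   */
        int team     = omp_get_team_num();     /* like blockIdx.x  */
        int nthreads = omp_get_num_threads();  /* like blockDim.x  */
        int tid      = omp_get_thread_num();   /* like threadIdx.x */

        /* Each team owns one contiguous chunk of the index range. */
        int chunk = (n + nteams - 1) / nteams;
        int lo = team * chunk;
        int hi = (lo + chunk < n) ? lo + chunk : n;

        /* Threads within the team stride through that chunk. */
        for (int i = lo + tid; i < hi; i += nthreads)
            x[i] *= a;
    }
}
```

Every element is touched exactly once, whatever number of teams and threads the runtime chooses. The index bookkeeping is what worksharing directives take care of automatically.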
The number of OpenMP teams can be set with the num_teams() clause on the teams directive, and the number of threads per team can be set with num_threads() on the parallel directive. If these clauses are not specified, the values are implementation defined.
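As a rough sketch of how these clauses are used, the hypothetical helper `query_league` below requests a league size and a team size, then reports what the runtime actually provided. The values are requests: an implementation may create fewer teams or threads than asked for, so the caller should not assume an exact match.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins (assumption: used only when OpenMP is disabled). */
static int omp_get_num_teams(void)   { return 1; }
static int omp_get_team_num(void)    { return 0; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void)  { return 0; }
#endif

/* Hypothetical helper: request a league shape and report what was created. */
void query_league(int want_teams, int want_threads,
                  int *got_teams, int *got_threads)
{
    int teams = 0, threads = 0;

    #pragma omp target teams num_teams(want_teams) map(tofrom: teams, threads)
    #pragma omp parallel num_threads(want_threads)
    {
        /* Let a single thread of a single team record the sizes. */
        if (omp_get_team_num() == 0 && omp_get_thread_num() == 0) {
            teams   = omp_get_num_teams();
            threads = omp_get_num_threads();
        }
    }

    *got_teams   = teams;
    *got_threads = threads;
}
```

For example, `query_league(4, 8, &t, &th)` should leave `t` no larger than 4 and `th` no larger than 8. The exact values depend on the compiler, the runtime, and the device.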
Figure: The target, teams, and parallel directives.
Again, the teams and parallel directives create the teams and threads needed for parallel execution on the device, but they do not divide the work, even if the code that follows these directives is a loop. Without additional directives to distribute the loop iterations, every thread in every team executes the same code. We will cover how to distribute and parallelize loops on the next page.
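The redundancy can be observed directly. The sketch below uses a hypothetical helper, `count_body_executions`, that counts how many threads entered the region and how many times the loop body ran in total. The clause values 2 and 4 are arbitrary choices for the example.

```c
/* Hypothetical helper: run an n-iteration loop inside target teams parallel
   with no worksharing, and count how often the body actually executes. */
int count_body_executions(int n, int *workers_out)
{
    int executions = 0;  /* total loop-body executions over all threads */
    int workers    = 0;  /* total threads over all teams                */

    #pragma omp target teams num_teams(2) map(tofrom: executions, workers)
    #pragma omp parallel num_threads(4)
    {
        #pragma omp atomic update
        workers++;

        /* No distribute or for directive: every thread runs all n iterations. */
        for (int i = 0; i < n; i++) {
            #pragma omp atomic update
            executions++;
        }
    }

    *workers_out = workers;
    return executions;
}
```

The return value equals `n` multiplied by the number of threads created, rather than `n`: the loop was replicated, not divided.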
Related Environment Variables
The number of teams and threads can also be controlled through environment variables. If a number is set by a clause, it overrides the value set by the corresponding environment variable.
| Environment Variable | Description |
|---|---|
| OMP_NUM_TEAMS | Sets the maximum number of teams created by a teams construct |
| OMP_TEAMS_THREAD_LIMIT | Sets the maximum number of threads per team |
| OMP_NUM_THREADS | Sets the number of threads |
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)