Distribute and For/Do
While the teams and parallel directives create the teams and threads needed for parallel execution, the distribute directive splits the iterations of its associated loop across teams, and the for/do directive (for in C/C++, do in Fortran) splits iterations among the threads within each team (loop worksharing). Together, these directives form the full combined directive that offloads the work to the device with true parallelism: target teams distribute parallel for (or, in Fortran, target teams distribute parallel do).
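As a minimal sketch of the combined directive in C, consider the loop below. The function name saxpy and the map clauses are illustrative assumptions, not something prescribed by the text above; the map clauses are included because the arrays are passed as pointers and must be made available on the device.

```c
/* Hypothetical example: y = a*x + y, offloaded with the combined directive.
   Without OpenMP support the pragma is ignored and the loop runs serially. */
void saxpy(int n, float a, const float *x, float *y)
{
    /* teams + distribute: split iterations across teams;
       parallel + for: split each team's share among its threads.
       The map clauses (an assumption of this sketch) copy x to the device
       and copy y to the device and back. */
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

If no device is available, an OpenMP implementation will typically fall back to running the loop on the host, so the result should be the same either way.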
The compiler's implementation of OpenMP should do a reasonable job of dividing up the workload of the loop, if it is given nothing more than the combined directive shown above. In case it does not, or in case certain variables need special treatment, the distribute and for/do directives may be modified with clauses that adjust their default behavior. Here is a sampling of the clauses that are available.
| Clause | distribute | for/do | Effect |
|---|---|---|---|
| private(list) | x | x | Declares the listed variables and arrays to be private to each team or thread. |
| firstprivate(list) | x | x | Declares the listed variables and arrays to be private to each team or thread, and initializes each one to the value(s) of the corresponding originals. |
| collapse(n) | x | x | Merges n nested loops into a single, combined loop and divides the resulting iterations among teams or threads. |
| dist_schedule(static,n) | x | | Assigns chunks of loop iterations of static size n to each team, in round-robin fashion. |
| schedule(kind,n) | | x | Assigns chunks of loop iterations to threads, based on the value of kind (which may be static, dynamic, or guided) and chunk size n. |
| reduction(op:out_var) | | x | Denotes that an operator op (such as +, *, min, max), applied cumulatively to items inside the parallel loop, will give a cumulative result across all threads (and teams, when in a combined construct), provided the result is assigned to the shared variable out_var. |
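The clauses in the table can be attached directly to the combined directive, as in the following sketch. The function names (matrix_sum, scale_shift), the chunk sizes 256 and 1, and the map clauses are all hypothetical choices for illustration; suitable values depend on the device and compiler.

```c
/* Hypothetical example: sum all entries of a rows x cols matrix stored in
   row-major order. collapse(2) merges the two loops into one iteration
   space; reduction(+:sum) combines the partial sums across threads and
   teams into the shared variable sum. */
double matrix_sum(int rows, int cols, const double *a)
{
    double sum = 0.0;
    #pragma omp target teams distribute parallel for collapse(2) \
            reduction(+:sum) map(to: a[0:rows*cols]) map(tofrom: sum)
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            sum += a[i * cols + j];
        }
    }
    return sum;
}

/* Hypothetical example: out[i] = scale * in[i] + offset.
   firstprivate gives each thread its own initialized copy of scale and
   offset; private gives each thread its own uninitialized tmp.
   dist_schedule hands chunks of 256 iterations to teams in round-robin
   order; schedule(static, 1) interleaves single iterations among the
   threads of each team. Both chunk sizes are assumptions of this sketch. */
void scale_shift(int n, double scale, double offset,
                 const double *in, double *out)
{
    double tmp;
    #pragma omp target teams distribute parallel for \
            firstprivate(scale, offset) private(tmp) \
            dist_schedule(static, 256) schedule(static, 1) \
            map(to: in[0:n]) map(from: out[0:n])
    for (int i = 0; i < n; i++) {
        tmp = scale * in[i];
        out[i] = tmp + offset;
    }
}
```

Because each clause only adjusts how iterations and variables are handled, removing any of them (other than reduction, which is needed for correctness of the sum) should change performance rather than results.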
Further guidance on the above, as well as various additional clauses that pertain to all the directives we have discussed, can be found in the OpenMP 5.1 documentation. Not all features may be supported by your particular compiler, so it is best to consult the documentation for your compiler suite, such as the NVIDIA compilers or the GNU compilers.
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)