With OpenMP, it is relatively easy to make the thread affinity follow some overall pattern. You do this by setting the values of certain environment variables. The following variables have an effect on any OpenMP code:

  • OMP_PROC_BIND specifies how OpenMP threads are distributed across some set of "places"
  • OMP_PLACES says how those places relate to the available hardware

A similar environment variable acts only on OpenMP codes that are compiled with Intel compilers:

  • KMP_AFFINITY controls both the distribution and granularity of thread placement
General OpenMP Environment Variables

Considering first the more general OpenMP variables, let's say we want to bind threads to cores, and we want to do it in a way that spreads out the threads evenly among the available cores. If we set OMP_PROC_BIND=spread and OMP_PLACES=cores, as shown below, then we achieve the desired binding.


export OMP_PROC_BIND=spread
export OMP_PLACES=cores
export OMP_NUM_THREADS=48
./my_openmp_code

Acceptable values that may be assigned to OMP_PLACES include the abstract names sockets, cores, and threads, where "threads" in this context means "hardware threads", i.e., execution slots that are enabled through hyperthreading. It is also allowable to assign a set of numbers to OMP_PLACES, instead of an abstract name, where the numbers correspond to individual CPUs (threads or cores), as described previously in the page on using numactl.

The following table summarizes the options that are available through the use of the OMP_PROC_BIND variable, assuming OMP_PLACES is set equal to cores.

Values of the OMP_PROC_BIND variable and the corresponding thread affinity patterns
OMP_PROC_BIND Pattern for OMP_PLACES=cores
spread Binds threads to cores as sparsely as possible, according to core number. If threads exceed cores, then consecutive thread numbers are grouped together.
close Packs consecutive threads as close as possible to each other. Multiple consecutive threads are packed on each core if necessary.
master Binds threads to the same place as the master thread (probably this is most suitable when the places are sockets).
false Completely disables any affinity settings.

If an OpenMP parallel section in the code contains a proc_bind clause, then the affinity specified by that clause takes precedence over the above settings. Additional complexities may come into play for nested parallel sections; see the OpenMP documentation for details. Additional discussion of how to configure affinities in OpenMP can be found in the relevant section of Victor Eijkhout's Parallel Programming for Science and Engineering.

Intel Compiler Environment Variables

While the environment variables described above are effective on any OpenMP code, the Intel compilers provide a very similar mechanism for controlling thread affinity through the KMP_AFFINITY environment variable. The possible type values of KMP_AFFINITY and their effects are presented in the next table; it is assumed that granularity=core is also specified.

Values of the KMP_AFFINITY variable and the corresponding thread affinity patterns
KMP_AFFINITY Pattern for KMP_AFFINITY=granularity=core,...
scatter Does a round-robin assignment of threads to cores.
balanced Acts like scatter, but keeps consecutive thread numbers together, if threads exceed cores. (Only supported in single-socket configurations.)
compact Packs consecutive threads as close as possible to each other.
explicit Binds threads to the core numbers listed in the proclist= modifier.
none No thread affinity is set. OpenMP variables may still determine affinity.
disabled Completely disables any affinity settings.

The code block below displays commented-out pairs of OpenMP variables, followed by the equivalent setting of KMP_AFFINITY.


# export OMP_PLACES=cores
# export OMP_PROC_BIND=close
export KMP_AFFINITY=granularity=core,compact

# export OMP_PLACES=cores
# export OMP_PROC_BIND=spread
export KMP_AFFINITY=granularity=core,balanced

The KMP_AFFINITY=balanced type seems to work as expected on TACC's dual-socket systems, even though according to Intel documentation, "this affinity type is supported on the CPU only for single socket systems". In any event, KMP_AFFINITY=scatter would have nearly the same effect and is likely to be sufficient.

To give another example:

  • Imagine a system with 4 cores and 4 hardware threads/core
  • See below for the placement of 8 threads when KMP_AFFINITY=granularity=fine,<type>
  • As shown, the scatter type will fully utilize all cores, but the compact type may not
  • Recommendation: use compact only in special cases, or perhaps with granularity=core
Thread placements that result when KMP_AFFINITY is set to compact vs. scatter
Thread placements that result when KMP_AFFINITY is set to compact vs. scatter; system is assumed to consist of 4 cores with 4 hardware threads/core
 
©  |   Cornell University    |   Center for Advanced Computing    |   Copyright Statement    |   Inclusivity Statement