OpenMP Affinity Variables
With OpenMP, it is relatively easy to make the thread affinity follow some overall pattern. You do this by setting the values of certain environment variables. The following variables have an effect on any OpenMP code:
- OMP_PROC_BINDspecifies how OpenMP threads are distributed across some set of "places"
- OMP_PLACESsays how those places relate to the available hardware
A similar environment variable acts only on OpenMP codes that are compiled with Intel compilers:
- KMP_AFFINITYcontrols both the distribution and granularity of thread placement
General OpenMP Environment Variables
    Considering first the more general OpenMP variables, let's say we want to bind threads to cores, and we want to do it in a way that spreads out the threads evenly among the available cores. If we set OMP_PROC_BIND=spread and OMP_PLACES=cores, as shown below, then we achieve the desired binding.
export OMP_PROC_BIND=spread
export OMP_PLACES=cores
export OMP_NUM_THREADS=48
./my_openmp_code
    Acceptable values that may be assigned to OMP_PLACES include the abstract names sockets, cores, and threads, where "threads" in this context means "hardware threads", i.e., execution slots that are enabled through hyperthreading. It is also allowable to assign a set of numbers to OMP_PLACES, instead of an abstract name, where the numbers correspond to individual CPUs (threads or cores), as described previously in the page on using numactl.
    The following table summarizes the options that are available through the use of the OMP_PROC_BIND variable, assuming OMP_PLACES is set equal to cores.
| OMP_PROC_BIND | Pattern for OMP_PLACES=cores | 
|---|---|
| spread | Binds threads to cores as sparsely as possible, according to core number. If threads exceed cores, then consecutive thread numbers are grouped together. | 
| close | Packs consecutive threads as close as possible to each other. Multiple consecutive threads are packed on each core if necessary. | 
| master | Binds threads to the same place as the master thread (probably this is most suitable when the places are sockets). | 
| false | Completely disables any affinity settings. | 
    If an OpenMP parallel section in the code contains a proc_bind clause, then the affinity specified by that clause takes precedence over the above settings. Additional complexities may come into play for nested parallel sections; see the OpenMP documentation for details. Additional discussion of how to configure affinities in OpenMP can be found in the relevant section of Victor Eijkhout's Parallel Programming for Science and Engineering.
Intel Compiler Environment Variables
    While the environment variables described above are effective on any OpenMP code, the Intel compilers provide a very similar mechanism for controlling thread affinity through the KMP_AFFINITY environment variable. The possible type values of KMP_AFFINITY and their effects are presented in the next table; it is assumed that granularity=core is also specified.
| KMP_AFFINITY | Pattern for KMP_AFFINITY=granularity=core,... | 
|---|---|
| scatter | Does a round-robin assignment of threads to cores. | 
| balanced | Acts like scatter, but keeps consecutive thread numbers together, if threads exceed cores. (Only supported in single-socket configurations.) | 
| compact | Packs consecutive threads as close as possible to each other. | 
| explicit | Binds threads to the core numbers listed in the proclist=modifier. | 
| none | No thread affinity is set. OpenMP variables may still determine affinity. | 
| disabled | Completely disables any affinity settings. | 
    The code block below displays commented-out pairs of OpenMP variables, followed by the equivalent setting of KMP_AFFINITY.
# export OMP_PLACES=cores
# export OMP_PROC_BIND=close
export KMP_AFFINITY=granularity=core,compact
# export OMP_PLACES=cores
# export OMP_PROC_BIND=spread
export KMP_AFFINITY=granularity=core,balanced
    The KMP_AFFINITY=balanced type seems to work as expected on TACC's dual-socket systems, even though according to Intel documentation, "this affinity type is supported on the CPU only for single socket systems". In any event, KMP_AFFINITY=scatter would have nearly the same effect and is likely to be sufficient.
To give another example:
- Imagine a system with 4 cores and 4 hardware threads/core
- See below for the placement of 8 threads when KMP_AFFINITY=granularity=fine,<type>
- As shown, the scattertype will fully utilize all cores, but thecompacttype may not
- Recommendation: use compactonly in special cases, or perhaps withgranularity=core
 
            CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)