• Within a code, Scheduling Affinity and Memory Policy can be examined and changed via APIs for Linux system routines
  • These APIs let you set affinities and policies that differ per thread
  • Calling APIs from source code is not the only way to influence OpenMP threads:
    • One alternative is to set OpenMP environment variables that affect thread placement, OMP_PROC_BIND and OMP_PLACES
    • For codes compiled with Intel compilers, OpenMP threads are similarly influenced by the KMP_AFFINITY environment variable
    • Effects of these environment variables on OpenMP codes are described later
  • To modify scheduling affinity via an API: sched_getaffinity, sched_setaffinity
    • #define _GNU_SOURCE
    • #include <sched.h>
    • Automatically linked from libc
    • Can confirm placement with sched_getcpu
  • To modify memory policy via an API: get_mempolicy, set_mempolicy (a brief sketch appears further down this page)
    • #include <numaif.h>
    • Link with -lnuma
  • To make scheduling assignments, set bits in a mask:
Figure: Bit masks for sched_getaffinity
Complete code example for Scheduling Affinity:

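A minimal sketch of such a code: each OpenMP thread builds a CPU mask containing only the core whose id matches its thread number inum, binds itself with sched_setaffinity(), and confirms the placement with sched_getcpu(). Apart from inum, which the exercise below refers to, the variable names and output format are illustrative.

    #define _GNU_SOURCE        /* must come before any #include */
    #include <sched.h>
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            int inum = omp_get_thread_num();   /* this thread's id */
            cpu_set_t mask;                    /* bit mask of allowed cores */

            CPU_ZERO(&mask);                   /* clear all bits */
            CPU_SET(inum, &mask);              /* allow only the matching core */

            /* pid 0 means "the calling thread"; returns -1 on error */
            if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
                perror("sched_setaffinity");

            printf("Thread %d is running on core %d\n", inum, sched_getcpu());
        }
        return 0;
    }
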
Quick exercise:

  1. Copy and compile the above code (icc -qopenmp schedaff.c -o schedaff).
  2. Set the number of threads with the OMP_NUM_THREADS environment variable and run the code.
  3. Confirm that each thread is assigned to the core with the matching id.
  4. Add an OpenMP function call nt = omp_get_num_threads() to the parallel section, then use private variable nt to assign threads to cores in reverse order (nt-1-inum).
Tip: Intel offers its own API for setting thread affinity

Intel's API for setting thread affinity is described in the Low Level Affinity API subsection in the Intel C++ Compiler Classic Developer Guide and Reference. The Intel API appears similar to Linux system calls, except all routines have kmp_ at the beginning of their names.
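
For example, a rough kmp_ equivalent of the per-thread pinning shown earlier might look like the sketch below (compiled with icc -qopenmp, which declares these routines in omp.h). Treat it as an illustration of the naming pattern, not as Intel's recommended usage.

    #include <stdio.h>
    #include <omp.h>    /* Intel compilers declare the kmp_* affinity routines here */

    int main(void) {
        #pragma omp parallel
        {
            int inum = omp_get_thread_num();
            kmp_affinity_mask_t mask;

            kmp_create_affinity_mask(&mask);          /* start with an empty mask */
            kmp_set_affinity_mask_proc(inum, &mask);  /* add one logical processor */
            if (kmp_set_affinity(&mask) != 0)         /* bind the calling thread */
                printf("kmp_set_affinity failed for thread %d\n", inum);
            kmp_destroy_affinity_mask(&mask);
        }
        return 0;
    }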

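The memory policy routines mentioned above can be exercised in much the same style. The sketch below uses set_mempolicy() to restrict subsequent allocations by the calling thread to NUMA node 0, then reads the policy mode back with get_mempolicy(); the choice of node 0 and the printed output are illustrative only. Compile and link with -lnuma.

    #include <numaif.h>    /* set_mempolicy, get_mempolicy, MPOL_* */
    #include <stdio.h>

    int main(void) {
        /* bit mask of allowed NUMA nodes: bit 0 set means node 0 only */
        unsigned long nodemask = 1UL;
        unsigned long maxnode  = 8 * sizeof(nodemask);   /* number of bits in the mask */

        /* MPOL_BIND: allocate only from the nodes in the mask; applies to the calling thread */
        if (set_mempolicy(MPOL_BIND, &nodemask, maxnode) != 0)
            perror("set_mempolicy");

        /* read the policy back to confirm it */
        int mode;
        if (get_mempolicy(&mode, NULL, 0, NULL, 0) != 0)
            perror("get_mempolicy");
        printf("current policy mode = %d (MPOL_BIND = %d)\n", mode, MPOL_BIND);

        return 0;
    }
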
Finally, control of memory policy in the presence of different kinds of memory (for example, DRAM and MCDRAM) may warrant the use of the memkind library, originally developed by Intel.

  • The memkind library is likely most useful in cases where compute nodes have special high-bandwidth memory, but any NUMA architecture may benefit.
  • Manual mode of operation: call hbw_malloc() instead of malloc() when you want to allocate high-bandwidth memory. Calls to other functions in the hbwmalloc collection may also be made. Be sure to #include <hbwmalloc.h>. (A brief sketch follows this list.)
  • Automatic mode of operation: use the autohbw library and related environment variables to specify the threshold for allocating high-bandwidth memory.
  • More advanced cases can be covered as well. Information on the fully general API, for multiple types of memory, may be found in the memkind description.
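
As a sketch of the manual mode, the code below requests high-bandwidth memory with hbw_malloc() and falls back to ordinary malloc() when none is available (hbw_check_available() returns 0 when high-bandwidth memory can be allocated). The array size and the fallback strategy are illustrative choices. Link with -lmemkind.

    #include <hbwmalloc.h>   /* hbw_check_available, hbw_malloc, hbw_free */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        size_t n = 1000000;                            /* illustrative array size */
        int have_hbw = (hbw_check_available() == 0);   /* 0 means HBM can be allocated */
        double *a;

        /* manual mode: ask for high-bandwidth memory explicitly, else fall back */
        if (have_hbw)
            a = (double *) hbw_malloc(n * sizeof(double));
        else
            a = (double *) malloc(n * sizeof(double));

        if (a == NULL) { perror("allocation"); return 1; }

        for (size_t i = 0; i < n; i++)
            a[i] = (double) i;                         /* touch the memory */

        printf("allocated %zu doubles in %s memory\n",
               n, have_hbw ? "high-bandwidth" : "default");

        if (have_hbw) hbw_free(a); else free(a);
        return 0;
    }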