• Within a code, Scheduling Affinity and Memory Policy can be examined and changed via APIs for Linux system routines
  • These APIs let you set affinities and policies that differ per thread
  • Calling APIs from source code is not the only way to influence OpenMP threads:
    • One alternative is to set OpenMP environment variables that affect thread placement, OMP_PROC_BIND and OMP_PLACES
    • For codes compiled with Intel compilers, OpenMP threads are similarly influenced by the KMP_AFFINITY environment variable
    • Effects of these environment variables on OpenMP codes are described later
  • To modify scheduling affinity via an API: sched_getaffinity, sched_setaffinity
    • #define _GNU_SOURCE
    • #include <sched.h>
    • Automatically linked from libc
    • Can confirm placement with sched_getcpu
  • To modify memory policy via an API: get_mempolicy, set_mempolicy (a brief sketch appears further down this page)
    • #include <numaif.h>
    • Link with -lnuma
  • To make scheduling assignments, set bits in a mask:
Figure: Bit masks for sched_getaffinity
Complete code example for Scheduling Affinity:

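A minimal sketch of such a code: each OpenMP thread builds a CPU mask containing only the core whose id matches its thread number inum, binds itself with sched_setaffinity(), and confirms the placement with sched_getcpu(). Apart from inum, which the exercise below refers to, the variable names and output format are illustrative.

    #define _GNU_SOURCE        /* must come before any #include */
    #include <sched.h>
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            int inum = omp_get_thread_num();   /* this thread's id */
            cpu_set_t mask;                    /* bit mask of allowed cores */

            CPU_ZERO(&mask);                   /* clear all bits */
            CPU_SET(inum, &mask);              /* allow only the matching core */

            /* pid 0 means "the calling thread"; returns -1 on error */
            if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
                perror("sched_setaffinity");

            printf("Thread %d is running on core %d\n", inum, sched_getcpu());
        }
        return 0;
    }
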
Quick exercise:

  1. Copy and compile the above code (icc -qopenmp schedaff.c -o schedaff).
  2. Set the number of threads with the OMP_NUM_THREADS environment variable and run the code.
  3. Confirm that each thread is assigned to the core with the matching id.
  4. Add an OpenMP function call nt = omp_get_num_threads() to the parallel section, then use private variable nt to assign threads to cores in reverse order (nt-1-inum).
Tip: Intel offers its own API for setting thread affinity

Intel's API for setting thread affinity is described in the Low Level Affinity API subsection in the Intel C++ Compiler Classic Developer Guide and Reference. The Intel API appears similar to Linux system calls, except all routines have kmp_ at the beginning of their names.
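
For example, a rough kmp_ equivalent of the per-thread pinning shown earlier might look like the sketch below (compiled with icc -qopenmp, which declares these routines in omp.h). Treat it as an illustration of the naming pattern, not as Intel's recommended usage.

    #include <stdio.h>
    #include <omp.h>    /* Intel compilers declare the kmp_* affinity routines here */

    int main(void) {
        #pragma omp parallel
        {
            int inum = omp_get_thread_num();
            kmp_affinity_mask_t mask;

            kmp_create_affinity_mask(&mask);          /* start with an empty mask */
            kmp_set_affinity_mask_proc(inum, &mask);  /* add one logical processor */
            if (kmp_set_affinity(&mask) != 0)         /* bind the calling thread */
                printf("kmp_set_affinity failed for thread %d\n", inum);
            kmp_destroy_affinity_mask(&mask);
        }
        return 0;
    }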

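The memory policy routines mentioned above can be exercised in much the same style. The sketch below uses set_mempolicy() to restrict subsequent allocations by the calling thread to NUMA node 0, then reads the policy mode back with get_mempolicy(); the choice of node 0 and the printed output are illustrative only. Compile and link with -lnuma.

    #include <numaif.h>    /* set_mempolicy, get_mempolicy, MPOL_* */
    #include <stdio.h>

    int main(void) {
        /* bit mask of allowed NUMA nodes: bit 0 set means node 0 only */
        unsigned long nodemask = 1UL;
        unsigned long maxnode  = 8 * sizeof(nodemask);   /* number of bits in the mask */

        /* MPOL_BIND: allocate only from the nodes in the mask; applies to the calling thread */
        if (set_mempolicy(MPOL_BIND, &nodemask, maxnode) != 0)
            perror("set_mempolicy");

        /* read the policy back to confirm it */
        int mode;
        if (get_mempolicy(&mode, NULL, 0, NULL, 0) != 0)
            perror("get_mempolicy");
        printf("current policy mode = %d (MPOL_BIND = %d)\n", mode, MPOL_BIND);

        return 0;
    }
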
Finally, control of memory policy in the presence of different kinds of memory (for example, DRAM and MCDRAM) may warrant the use of the memkind library, originally developed by Intel.

  • The memkind library is likely most useful in cases where compute nodes have special high-bandwidth memory, but any NUMA architecture may benefit.
  • Manual mode of operation: call hbw_malloc() instead of malloc() when you want to allocate high-bandwidth memory. Calls to other functions in the hbwmalloc collection may also be made. Be sure to #include <hbwmalloc.h>. (A brief sketch follows this list.)
  • Automatic mode of operation: use the autohbw library and related environment variables to specify the threshold for allocating high-bandwidth memory.
  • More advanced cases can be covered as well. Information on the fully general API, for multiple types of memory, may be found in the memkind description.
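
As a sketch of the manual mode, the code below requests high-bandwidth memory with hbw_malloc() and falls back to ordinary malloc() when none is available (hbw_check_available() returns 0 when high-bandwidth memory can be allocated). The array size and the fallback strategy are illustrative choices. Link with -lmemkind.

    #include <hbwmalloc.h>   /* hbw_check_available, hbw_malloc, hbw_free */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        size_t n = 1000000;                            /* illustrative array size */
        int have_hbw = (hbw_check_available() == 0);   /* 0 means HBM can be allocated */
        double *a;

        /* manual mode: ask for high-bandwidth memory explicitly, else fall back */
        if (have_hbw)
            a = (double *) hbw_malloc(n * sizeof(double));
        else
            a = (double *) malloc(n * sizeof(double));

        if (a == NULL) { perror("allocation"); return 1; }

        for (size_t i = 0; i < n; i++)
            a[i] = (double) i;                         /* touch the memory */

        printf("allocated %zu doubles in %s memory\n",
               n, have_hbw ? "high-bandwidth" : "default");

        if (have_hbw) hbw_free(a); else free(a);
        return 0;
    }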