How numactl command arguments impact processes

Consider the effects of the following command on a Stampede3 Skylake node with 48 cores:

numactl <option> <socket(s)/core(s)> ./a.out

Affinity to socket, allocation to socket, affinity to core:

For a process (socket control): assignment to socket(s), via -N
For a process's memory (socket control): pattern of memory allocation (local, interleaved, preferred, or mandatory), via -l, -i, --preferred, -m
For a process (core control): assignment to core(s), via -C
Quick guide to numactl

Category             Flag            Arguments           Action
Socket Affinity      -N              {0,1}               Execute process on cores of this (these) socket(s) only.
Memory Policy        -l              (no argument)       Allocate on the current socket; fall back to any other socket if full.
Memory Policy        -i              {0,1}               Allocate round robin (interleave) on these sockets. No fallback.
Memory Policy        --preferred=    {0,1}               Allocate on this socket; fall back to any other socket if full.
Memory Policy        -m              {0,1}               Allocate only on this (these) socket(s). No fallback.
Core Affinity        -C              {0,1,2,3,...,47}    Execute process on this (these) core(s) only.
NUMA Information     --hardware      (no argument)       Display detailed info about NUMA nodes and internode distances.
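To make these options concrete, here are a few illustrative command lines for a two-socket node such as the one described above. The flags are the ones listed in the table; the executable name ./a.out and the particular socket and core numbers are placeholders, and the actual core-to-socket mapping should be confirmed with numactl --hardware or lscpu before relying on it.

# Run on the cores of socket 0 and allocate memory only on socket 0 (no fallback)
numactl -N 0 -m 0 ./a.out

# Run on the cores of socket 1, preferring its memory but allowing fallback
numactl -N 1 --preferred=1 ./a.out

# Interleave memory allocations round robin across both sockets
numactl -N 0,1 -i 0,1 ./a.out

# Pin the process to cores 0-11 (assumed here to lie on one socket) and allocate memory locally
numactl -C 0-11 -l ./a.out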
Notes on terminology

So far, the discussion of NUMA has involved familiar hardware objects like sockets (or equivalently, processors) and cores. However, these terms can be generalized into broader concepts, namely, NUMA nodes and hardware threads. In typical x86-based processors like Intel Xeons, one often assumes that NUMA nodes are the same as sockets, and hardware threads are the same as cores. But for x86 processors built from multi-chip modules (e.g., Intel's "EMIB" technology), or for Xeons with hyperthreading enabled, these assumptions can break down.

In the more general conception of hardware that numactl works with, NUMA nodes are the predesignated pools of memory that are deemed to be "closer" to one group of hardware threads than to another group. Hardware threads may also be referred to as "logical CPUs" or "virtual CPUs" (vCPUs). Further details on numactl use can be found by typing man numactl on the system you are using. A summary of the NUMA nodes available on a given machine, as well as other CPU-related info, can be obtained by running lscpu.
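For instance, the commands mentioned above can be run on a compute node to get a quick picture of how its hardware threads are grouped into NUMA nodes (the exact output varies from system to system):

numactl --hardware   # list the NUMA nodes, their memory sizes, and internode distances
lscpu                # show sockets, cores, hardware threads, and the NUMA node-to-CPU mapping
man numactl          # full documentation of numactl options and argument syntax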

Specific considerations for memory usage and NUMA node configurations on TACC systems can be found in the respective user guides at the TACC documentation hub. More background on NUMA, and specifically on NUMA in Linux, can be found in this ACM Queue article.

Historical note: Stampede2 formerly offered multiple extra batch queues to provide optional NUMA-node configurations for its KNL nodes. These Xeon Phi processors came with special onboard memory called MCDRAM. To identify the NUMA node corresponding to the high-bandwidth MCDRAM, one could run memkind-hbw-nodes, a helpful command from the memkind library.

 