The memory layout of Stampede3 at TACC follows a typical pattern for HPC clusters, which can be summarized and illustrated as follows:

  • The cluster's memory is divided among its nodes, so at the cluster level it is distributed memory
    • Memory is local to each node and is not directly addressable from other nodes
    • To share data between nodes, message passing over a network is required (see the sketch following the figure below)
    • This is symbolized by the arrow that connects two nodes in the figure below
  • Within a single node, memory spans all of the node's cores, so it is shared memory
    • A node's full local memory is addressable from any core in the node
    • This is indicated by the black border that surrounds each 48-core Skylake ("SKX") node in the figure
A pair of Stampede3 Skylake nodes, where in each node, the RAM is shared by all the cores in the node, though it is split between 2 sockets. Sockets are attached to each other internally, and the nodes are connected to each other externally through the Omni-Path network, symbolized by the arrow between the two nodes.
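
To make the distributed/shared distinction concrete, here is a minimal hybrid sketch in C (not specific to Stampede3, with illustrative rank and thread counts assumed): MPI passes a value between ranks, which in general reside on different nodes, while the OpenMP threads within each rank read that value directly from their node's shared memory.

/* Minimal hybrid sketch: MPI for distributed memory across nodes,
 * OpenMP for shared memory within a node.
 * Build (e.g.): mpicc -fopenmp hybrid.c -o hybrid
 * Run   (e.g.): mpirun -np 2 ./hybrid      (ideally one rank per node)
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Distributed memory: rank 0's buffer is NOT addressable by rank 1,
     * so the value must be sent explicitly over the network. */
    double value = 0.0;
    if (rank == 0) {
        value = 3.14;
        if (nranks > 1)
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Shared memory: every OpenMP thread in this rank reads the same
     * variable directly -- no messages are needed within the node. */
    #pragma omp parallel
    {
        printf("rank %d, thread %d of %d sees value = %f\n",
               rank, omp_get_thread_num(), omp_get_num_threads(), value);
    }

    MPI_Finalize();
    return 0;
}

In a typical hybrid launch, one MPI rank is placed per node (or per socket) and OpenMP threads fill that rank's cores; the exact placement is controlled by the batch system and launcher options.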

Even within the nodes, further architectural details result in non-uniform memory access (NUMA). Again taking Stampede3 as an example:

  • Typically two sockets per node
    • Each node has two sockets, each holding one processor (e.g., an Intel Xeon "Skylake")
    • Sockets are indicated by the orange outlines in the above figure
  • Multiple cores per socket
    • Each Skylake socket (processor) has 24 cores
    • These are depicted as small blue boxes in the figure
  • Memory is attached to sockets
    • Cores on the same socket have the fastest access to that socket's attached memory (see the probe sketch after this list)
    • In the figure, memory modules and channels are illustrated by thin black lines
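
The practical consequence of NUMA is that where a thread runs determines which memory it reaches fastest. The small probe below is a sketch that assumes a Linux system with libnuma installed (typical on HPC clusters): each OpenMP thread reports the CPU it is running on and that CPU's NUMA node, so on a 2-socket Skylake node you would generally see two NUMA nodes reported.

/* Sketch: report which NUMA node (socket-attached memory region) each
 * OpenMP thread is running nearest to. Requires libnuma.
 * Build (e.g.): gcc -fopenmp numa_probe.c -o numa_probe -lnuma
 */
#define _GNU_SOURCE
#include <numa.h>      /* numa_available, numa_node_of_cpu, ... */
#include <omp.h>
#include <sched.h>     /* sched_getcpu */
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    printf("NUMA nodes configured: %d\n", numa_num_configured_nodes());

    /* Each thread reports the CPU it is on and that CPU's NUMA node.
     * Threads running on different sockets will report different nodes. */
    #pragma omp parallel
    {
        int cpu  = sched_getcpu();
        int node = numa_node_of_cpu(cpu);
        printf("thread %2d -> cpu %3d -> NUMA node %d\n",
               omp_get_thread_num(), cpu, node);
    }
    return 0;
}

Pinning threads, for example with the standard OpenMP environment variables OMP_PROC_BIND and OMP_PLACES, keeps each thread close to the memory it touches first.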

For simplicity, most of the diagrams on the ensuing pages show a 2-socket Xeon Skylake node as a model, making the figures easier to follow (which wouldn't be the case if we drew all the cores of some other processors!). But for completeness, here is a rundown of all the types of nodes currently found in Stampede3 and Frontera.

Stampede3 is a very large HPC cluster composed of 1,864 compute nodes, of multiple types (which have some notable differences):

  • SKX - 1,060 dual-processor Intel Xeon Platinum 8160 "Skylake" nodes, each having 48 total cores and 192GB DDR4 RAM
  • ICX - 224 dual-processor Intel Xeon Platinum 8380 "Ice Lake" nodes, each with 80 total cores and 256GB DDR4 RAM
  • SPR - 560 dual-processor Intel Xeon Max 9480 "Sapphire Rapids HBM" nodes, each having 112 total cores and 128GB HBM
  • PVC - 20 quad-GPU "Ponte Vecchio" nodes, each having 4x Intel Data Center GPU Max 1550s, hosted by 2x Intel Xeon Platinum 8480+ "Sapphire Rapids" processors with 112 total cores and 512GB DDR5 RAM

Frontera is a much larger HPC cluster composed of 8,474 compute nodes, again of a few different types:

  • CLX - 8,368 dual-processor Intel Xeon Platinum 8280 "Cascade Lake" nodes, each having 56 total cores and 192GB DDR4 RAM
  • NVDIMM - 16 large-memory, quad-processor Intel Xeon Platinum 8280M "Cascade Lake" nodes, each with 112 total cores and 2.1 TB Intel Optane NVDIMM memory
  • RTX - 90 GPU nodes, each featuring 4 NVIDIA Quadro RTX 5000 GPUs, along with 16 total Intel "Broadwell" CPU cores and 128GB DDR4 RAM
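
Since these node types differ mainly in core count and memory capacity, it can be useful for a program to query what it actually received at run time. The sketch below relies only on common Linux sysconf() queries, so it is not tied to any particular system; the values printed will simply vary by node type.

/* Sketch: query the core count and physical memory of whatever node
 * the program lands on, using common Linux sysconf() queries.
 * Build (e.g.): gcc node_info.c -o node_info
 */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long cores     = sysconf(_SC_NPROCESSORS_ONLN);  /* cores visible to the OS */
    long pages     = sysconf(_SC_PHYS_PAGES);        /* physical memory pages   */
    long page_size = sysconf(_SC_PAGESIZE);          /* bytes per page          */

    double mem_gib = (double)pages * (double)page_size
                     / (1024.0 * 1024.0 * 1024.0);

    /* On an SKX node this should report on the order of 48 cores (or 96
     * hardware threads if hyperthreading is enabled) and roughly 192 GB
     * of RAM; the other node types will differ accordingly. */
    printf("cores online : %ld\n", cores);
    printf("physical RAM : %.1f GiB\n", mem_gib);
    return 0;
}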
 