RAM Arrangement in HPC Clusters
The basic way the memory is laid out in Stampede3 at TACC follows a typical pattern for HPC clusters, which may be summarized and illustrated as follows:
-
The cluster memory is separated between nodes, so it is distributed memory
- Memory is local to each node and is not directly addressable from other nodes
- To share data, message passing over a network is required
- This is symbolized by the arrow that connects two nodes in the figure below
-
Within a single node, memory spans all cores on a node, so it is
shared memory
- A node's full local memory is addressable from any core in the node
- This is indicated by the black border that surrounds each 48-core Skylake ("SKX") node in the figure
Even within the nodes, further architectural details result in non-uniform memory access (NUMA). Again taking Stampede3 as an example:
-
Typically two sockets per node
- Each node has two sockets, to hold two Intel Xeon "Skylake" processors (e.g.)
- Sockets are indicated by the orange outlines in the above figure
-
Multiple cores per socket
- Each Skylake socket (processor) has 24 cores
- These are depicted as small blue boxes in the figure
-
Memory is attached to sockets
- Cores sharing the same socket have fastest access to its attached memory
- In the figure, memory modules and channels are illustrated by thin black lines
For simplicity, most of the diagrams on the ensuing pages show a 2-socket Xeon Skylake node as a model, making the figures easier to follow (which wouldn't be the case if we drew all the cores of some other processors!). But for completeness, here is a rundown of all the types of nodes currently found in Stampede3 and Frontera.
Stampede3 is a very large HPC cluster composed of 1,864 compute nodes, of multiple types (which have some notable differences):
- SKX - 1,060 dual-processor Intel Xeon Platinum 8160 "Skylake" nodes, each having 48 total cores and 192GB DDR4 RAM
- ICX - 224 dual-processor Intel Xeon Platinum 8380 "Ice Lake" nodes, each with 80 total cores and 256GB DDR4 RAM
- SPR - 560 dual-processor Intel Xeon Max 9480 "Sapphire Rapids HBM" nodes, each having 112 total cores and 128GB HBM
- PVC - 20 quad-GPU "Ponte Vecchio" nodes, each having 4x Intel Data Center GPU Max 1550s, hosted by 2x Intel Xeon Platinum 8480+ "Sapphire Rapids" processors with 112 total cores and 512GB DDR5 RAM
Frontera is a much larger HPC cluster composed of 8,474 compute nodes, again of a few different types:
- CLX - 8,368 dual-processor Intel Xeon Platinum 8280 "Cascade Lake" nodes, each having 56 total cores and 192GB DDR4 RAM
- NVDIMM - 16 large-memory, quad-processor Intel Xeon Platinum 8280M "Cascade Lake" nodes, each with 112 total cores and 2.1 TB Intel Optane NVDIMM memory
- RTX - 90 GPU nodes, each featuring 4 NVIDIA Quadro RTX 5000 GPUs, along with 16 total Intel "Broadwell" CPU cores and 128GB DDR4 RAM