System Components
What does it take to build a Leadership-Class Computing Facility? What exactly goes into a system designed to meet the biggest challenges of computational researchers? Since more and more cycles are being delivered by GPUs, especially for AI applications, it seems that the ideal building block would be an energy-efficient platform that allows CPUs and GPUs to share their memory seamlessly. Vista is intended to prove that concept.
Vista at TACC is made up of 256 NVIDIA "Grace Grace" (GG) compute nodes, 600 NVIDIA "Grace Hopper" (GH) compute nodes, and several GG login nodes. The names of the node types derive from the NVIDIA "superchips" that power them. "Grace Grace" consists of a pair of "Grace" CPUs, while "Grace Hopper" consists of one "Grace" CPU paired with one "Hopper" GPU. Networking between the nodes is handled by two tiers of NVIDIA NDR InfiniBand switches. Even though NVIDIA is mostly known for its GPUs, the Grace CPUs are an NVIDIA product as well; they are based on the ARM architecture, in contrast to the x86-based processors in TACC's Frontera system.
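To make the node composition concrete, here is a minimal Python sketch that tallies the compute resources implied by the counts above. The function and variable names are illustrative only, and the login nodes are left out because the text does not give their number.

```python
# Node counts as stated in the text above.
GG_NODES = 256  # "Grace Grace" compute nodes: two Grace CPUs each
GH_NODES = 600  # "Grace Hopper" compute nodes: one Grace CPU plus one Hopper GPU each


def vista_compute_totals(gg_nodes=GG_NODES, gh_nodes=GH_NODES):
    """Return totals for compute nodes only (login nodes are not counted)."""
    return {
        "compute_nodes": gg_nodes + gh_nodes,
        "grace_cpus": 2 * gg_nodes + gh_nodes,
        "hopper_gpus": gh_nodes,
    }


totals = vista_compute_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

By this count, the compute partition holds 856 nodes, 1,112 Grace CPUs, and 600 Hopper GPUs.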
The NVIDIA H200 GPU on each GH compute node has 96 GB of HBM3 memory and delivers about 34 teraflop/s of double-precision (FP64) performance or 1,979 teraflop/s of half-precision (FP16) tensor performance, making it well suited for AI workloads, including LLM inference. The H200’s HBM3 memory delivers up to 4 TB/s of bandwidth, and the NVIDIA NVLink interconnect provides up to 900 GB/s of bandwidth between the GPU and CPU. On the CPU side, Grace has 120 GB of LPDDR5X memory with up to 0.5 TB/s of bandwidth. Given these memory and interconnect properties, the GH200 Grace Hopper Superchip is a very effective platform for NVIDIA's Unified Memory technology, in which memory transfers are managed automatically so that CPU and GPU threads can transparently access both CPU and GPU memory.
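The bandwidth figures above suggest why data placement still matters even when access is transparent. The following back-of-the-envelope Python sketch uses only the peak numbers quoted in the text. It assumes decimal units (1 TB/s = 1000 GB/s), treats the slowest link on a path as the limit, and ignores latency and page-migration overhead, so the results are idealized lower bounds rather than measurements. All names are illustrative.

```python
# Peak figures quoted in the text, per GH node (decimal units assumed).
HBM3_CAPACITY_GB = 96
LPDDR5X_CAPACITY_GB = 120
HBM3_BW_GBS = 4000     # up to 4 TB/s, GPU to its own HBM3
NVLINK_BW_GBS = 900    # up to 900 GB/s, GPU to CPU
LPDDR5X_BW_GBS = 500   # up to 0.5 TB/s, CPU to its own LPDDR5X


def min_stream_seconds(size_gb, *path_bandwidths_gbs):
    """Idealized time to stream size_gb once; the slowest link on the path limits it."""
    return size_gb / min(path_bandwidths_gbs)


# Total memory reachable by a thread on one GH node.
combined_capacity_gb = HBM3_CAPACITY_GB + LPDDR5X_CAPACITY_GB

# GPU reading 96 GB that already resides in its own HBM3.
t_local = min_stream_seconds(HBM3_CAPACITY_GB, HBM3_BW_GBS)

# GPU reading 96 GB that resides in CPU memory: the data crosses LPDDR5X and NVLink.
t_remote = min_stream_seconds(HBM3_CAPACITY_GB, LPDDR5X_BW_GBS, NVLINK_BW_GBS)

slowdown = t_remote / t_local

print(f"combined capacity: {combined_capacity_gb} GB")
print(f"stream from HBM3: {t_local:.3f} s")
print(f"stream from CPU memory: {t_remote:.3f} s")
print(f"ratio: {slowdown:.1f}x")
```

Under these assumptions, one GH node exposes 216 GB in total, and a full pass over 96 GB takes at least 0.024 s from HBM3 versus 0.192 s from CPU memory, a factor of 8. In this idealized model the CPU-side LPDDR5X bandwidth, not NVLink, is the limiting link.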
For a more complete discussion of the capabilities of NVIDIA GPUs, see the Understanding GPU Architecture roadmap.
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)