Tesla V100 servers in the former Longhorn subsystem of Frontera: tall, black racks of compute nodes, each node marked by a row of 4 green lights, possibly indicating the status of its 4 Tesla V100s.

At its inception, the leadership-class Frontera system at the Texas Advanced Computing Center included two GPU subsystems. The one shown in the first figure, called "Longhorn", was well suited to double-precision work. Prior to its decommissioning in 2022, it comprised over 400 NVIDIA Tesla V100 GPU accelerators hosted in 100+ IBM POWER9-based AC922 servers, with 4 GPUs per server. A full account of the properties of the Tesla V100 is given in a prior topic of the Understanding GPU Architecture roadmap. The remaining subsystem, which can be accessed via special queues on Frontera, consists of 360 NVIDIA Quadro RTX 5000 graphics cards hosted in Dell servers based on Intel Broadwell processors, again with 4 GPUs per server. Together, Frontera's original pair of GPU subsystems contributed 11 petaflop/s of single-precision computing power, serving to accelerate research in artificial intelligence, machine learning, and molecular dynamics.
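As a rough sanity check on that combined figure: taking NVIDIA's published peak single-precision rates of about 15.7 teraflop/s for an SXM2 V100 and 11.2 teraflop/s for a Quadro RTX 5000 (numbers not stated on this page), 400+ V100s plus 360 RTX 5000s work out to roughly 10-11 petaflop/s, consistent with the total quoted above. And since both subsystems present 4 GPUs per server, the CUDA runtime offers an easy way to confirm what a given node hosts. The short sketch below is our own illustration, not part of Frontera's documentation; it lists each visible device's name, streaming multiprocessor count, memory, and compute capability.

    /* devquery.cu -- compile with: nvcc devquery.cu -o devquery */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("Visible GPUs: %d\n", count);  /* expect 4 per server on either subsystem */

        for (int d = 0; d < count; d++) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("Device %d: %s, %d SMs, %.1f GB, compute capability %d.%d\n",
                   d, prop.name, prop.multiProcessorCount,
                   prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
                   prop.major, prop.minor);
        }
        return 0;
    }

The compute capability alone distinguishes the two subsystems: the Volta-based V100 reports 7.0, while the Turing-based Quadro RTX 5000 reports 7.5.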

Quadro RTX 5000 servers at TACC with liquid cooling by GRC: a view down into a bank of liquid-immersed servers, each hosting 4 NVIDIA Quadro RTX 5000 GPUs.

Interestingly, due to the very high concentration of heat-generating computing power, Frontera's design includes special features to cool its components so they can run at top speed. Nearly all of its racks and servers are water cooled, since standard air cooling with fans would be insufficient. The NVIDIA Quadros in particular are cooled in a very unusual way: as shown in the second figure, they are completely submerged in baths of liquid coolant, a solution developed by GRC. (The V100s in the former Longhorn subsystem happened to be air cooled. However, if Longhorn had possessed 6 V100s per node instead of 4, the water-cooled variant of the IBM AC922 servers would have been required.)
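Cooling matters for performance because GPUs protect themselves from overheating by reducing their clock rates. One way to verify that a heavily loaded GPU is holding its full clock speed, sketched below under the assumption that the CUDA toolkit's NVML management library is available on the node, is to poll each device's temperature and SM clock:

    /* gputemps.c -- compile with: cc gputemps.c -I${CUDA_HOME}/include -lnvidia-ml */
    #include <stdio.h>
    #include <nvml.h>

    int main(void)
    {
        unsigned int count;
        if (nvmlInit() != NVML_SUCCESS)
            return 1;
        nvmlDeviceGetCount(&count);

        for (unsigned int i = 0; i < count; i++) {
            nvmlDevice_t dev;
            unsigned int temp, sm_clock;
            nvmlDeviceGetHandleByIndex(i, &dev);
            nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);  /* degrees C */
            nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &sm_clock);       /* MHz */
            printf("GPU %u: %u C, SM clock %u MHz\n", i, temp, sm_clock);
        }

        nvmlShutdown();
        return 0;
    }

A clock rate that sags as the temperature climbs is the classic signature of thermal throttling; the water and immersion cooling described above exist precisely to prevent it.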

In the pages to come, we'll take a deep dive into the RTX 5000 to see what makes it attractive for GPGPU.
