### NVIDIA Quadro RTX 5000

TACC's Frontera also offers a subsystem equipped with NVIDIA Quadro RTX 5000 graphics cards. These devices are based on NVIDIA's Turing microarchitecture, which is the next generation after Volta. Even so, for HPC purposes, the Quadros are individually less capable than the Tesla V100s in Frontera's former Longhorn subsystem.

A detailed look at the TU104 chip at the heart of a Quadro RTX 5000 shows that its Streaming Multiprocessors (SMs) lack dedicated CUDA cores for processing FP64 data. And even though a single Turing SM has exactly the same number of FP32 and INT32 CUDA cores as a single Volta SM (64 for each datatype), the Quadro RTX 5000 has far fewer SMs overall: just 48 in total, compared to the 80 SMs in the Tesla V100.

Accordingly, the peak FP32 performance of the Quadro is significantly below the 15.7 teraflop/s of the Tesla V100, despite the Quadro's faster Boost Clock:

\[3072 \text{ CUDA cores} \times 2\,\frac{\text{flop}}{\text{core} \cdot \text{cycle}} \times 1.815\,\frac{\text{Gcycle}}{\text{s}} \approx 11.2\,\frac{\text{Tflop}}{\text{s}}\]

Here, the 3072 CUDA cores are the 64 FP32 cores in each of the 48 SMs, and the extra factor of 2 accounts for the possibility of a fused multiply-add (FMA) on every cycle.
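The arithmetic above can be reproduced with a short Python sketch. The function name `peak_tflops` and its parameters are illustrative choices, not part of any NVIDIA tool; the inputs are the figures quoted in the text (48 SMs, 64 FP32 cores per SM, a Boost Clock of 1.815 GHz, and 15.7 Tflop/s for the Tesla V100).

```python
def peak_tflops(sms, cores_per_sm, flop_per_core_cycle, clock_ghz):
    """Theoretical peak rate in Tflop/s (hypothetical helper for illustration)."""
    # cores * (flop per core per cycle) * (Gcycle/s) gives Gflop/s
    gflops = sms * cores_per_sm * flop_per_core_cycle * clock_ghz
    return gflops / 1000.0  # convert Gflop/s to Tflop/s

# Quadro RTX 5000: 48 SMs x 64 FP32 cores, 2 flop per FMA, 1.815 GHz Boost Clock
quadro_fp32 = peak_tflops(sms=48, cores_per_sm=64,
                          flop_per_core_cycle=2, clock_ghz=1.815)

# Peak FP32 rate of the Tesla V100, taken directly from the text
V100_FP32_TFLOPS = 15.7

print(f"Quadro RTX 5000 peak FP32: {quadro_fp32:.1f} Tflop/s")
print(f"Fraction of Tesla V100 peak: {quadro_fp32 / V100_FP32_TFLOPS:.2f}")
```

This is a theoretical upper bound only: it assumes that every FP32 core issues an FMA on every cycle at the full Boost Clock, which real applications rarely sustain.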

The TU104 can also perform FMAs on FP64 data, but since it lacks dedicated CUDA cores for the purpose, it can only do so at the drastically reduced rate of 2 results/cycle/SM. Refer to NVIDIA's CUDA Programming Guide for a full rundown of the throughput of different arithmetic instructions, based on the compute capability of the device. (The TU104 has a compute capability of 7.5; footnote 5 of Table 3 is where to find the FMA rate for GPUs of that rating.)
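To put that rate in perspective, here is a rough estimate of the resulting peak FP64 performance. It is a sketch under two assumptions that the text does not state explicitly: each FP64 FMA result counts as 2 flops, as in the FP32 calculation, and the FP64 path runs at the same 1.815 GHz Boost Clock. The variable names are illustrative.

```python
# Figures taken from the text
SMS = 48                       # SMs in the Quadro RTX 5000
FP32_CORES = 3072              # total FP32 CUDA cores
FP64_FMA_PER_CYCLE_PER_SM = 2  # FP64 results/cycle/SM at compute capability 7.5
BOOST_CLOCK_GHZ = 1.815

# Assumption: one FMA counts as 2 flops, as in the FP32 estimate
FLOP_PER_FMA = 2

fp64_gflops = SMS * FP64_FMA_PER_CYCLE_PER_SM * FLOP_PER_FMA * BOOST_CLOCK_GHZ
fp32_gflops = FP32_CORES * FLOP_PER_FMA * BOOST_CLOCK_GHZ

print(f"Estimated peak FP64: {fp64_gflops:.0f} Gflop/s")
print(f"FP32-to-FP64 peak ratio: {fp32_gflops / fp64_gflops:.0f}")
```

Under these assumptions, the FP64 peak is a factor of 32 below the FP32 peak, since each SM produces 64 FP32 results per cycle but only 2 FP64 results.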

The above calculation of FP32 peak performance does not mean that the Quadro RTX 5000 is inferior to the Tesla V100; it just means that the Quadro was primarily designed as a high-end graphics card. One could consider other kinds of metrics, too, such as flop/s/watt and cost/(flop/s), which might make the RTX 5000 an attractive component of HPC systems.