Superficially, the block diagram of the Turing TU104 chip (below) has a hierarchy very similar to that of the GV100. The 48 SMs are paired up into 24 Texture Processing Clusters (TPCs), which are in turn grouped into 6 GPU Processing Clusters (GPCs).

Turing TU104 block diagram.

But the TU104 features a number of elements that are not found at all in a GV100, because each cluster or level in the hierarchy is home to a special type of processing unit that accomplishes a certain step in the graphics pipeline. The table below shows the special units that are associated with different blocks in the TU104 block diagram:

The association between the blocks (or cluster levels) in the block diagram and the special units found in an RTX 5000.

| Block Name | Associated Special Unit | Count in TU104 |
|------------|-------------------------|----------------|
| SM         | Ray Tracing Core        | 48             |
| TPC        | PolyMorph Engine        | 24             |
| GPC        | Raster Engine           | 6              |
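
To make the nesting concrete, the counts in the table can be turned into a per-GPC tally. The following is a minimal sketch, not vendor code: the names `SPECIAL_UNITS` and `units_per_gpc` are made up for illustration, and it assumes the units are spread evenly across the 6 GPCs, as the block diagram suggests.

```python
# Chip-wide counts of each special unit, copied from the table above.
# Keys are block names; values are (special unit, count in TU104).
SPECIAL_UNITS = {
    "SM": ("Ray Tracing Core", 48),
    "TPC": ("PolyMorph Engine", 24),
    "GPC": ("Raster Engine", 6),
}


def units_per_gpc(units: dict) -> dict:
    """Divide each chip-wide count by the number of GPCs.

    Assumption (not stated in the table): units are distributed evenly,
    so every count is an exact multiple of the GPC count.
    """
    num_gpcs = units["GPC"][1]
    result = {}
    for unit_name, count in units.values():
        if count % num_gpcs != 0:
            raise ValueError(f"{unit_name} count is not a multiple of {num_gpcs}")
        result[unit_name] = count // num_gpcs
    return result


per_gpc = units_per_gpc(SPECIAL_UNITS)
for unit_name, count in per_gpc.items():
    print(f"{unit_name}: {count} per GPC")
```

Under that even-split assumption, each GPC holds one Raster Engine, four PolyMorph Engines (one per TPC), and eight Ray Tracing Cores (one per SM).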

In addition, each memory controller has a set of 8 Render Output units (ROPs, displayed near the L2 cache in the above diagram) associated with it. Since there are 8 memory controllers, the TU104 has 64 ROPs in total.
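
The ROP total is a simple product, sketched below. The function name `total_rops` is hypothetical, and the second call uses a made-up controller count purely to show how the total would scale; it does not describe any real product.

```python
MEMORY_CONTROLLERS = 8    # memory controllers in the TU104, from the text
ROPS_PER_CONTROLLER = 8   # ROPs attached to each controller, from the text


def total_rops(controllers: int, rops_each: int) -> int:
    """Return the chip-wide ROP count: one fixed-size set per controller."""
    return controllers * rops_each


tu104_rops = total_rops(MEMORY_CONTROLLERS, ROPS_PER_CONTROLLER)
print(tu104_rops)  # 64

# Hypothetical configuration with 6 active controllers (illustration only).
print(total_rops(6, ROPS_PER_CONTROLLER))  # 48
```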

For applications that just want to use the GPU to crunch numbers, these special graphics features have no particular importance, other than as an aid in understanding why the hierarchical arrangement of the SMs exists in the first place.

 
©  |   Cornell University    |   Center for Advanced Computing    |   Copyright Statement    |   Inclusivity Statement