Baseline Scaling
A baseline scaling study indicates how well the code performs when distributed across various numbers of MPI ranks, either on a single node (intra-node scaling) or multiple nodes (inter-node scaling). Background information for this segment can be found in the Scalability and MPI topics of the Cornell Virtual Workshop.
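As a rough illustration of how such a study is usually summarized, the sketch below computes speedup and parallel efficiency from wall-clock times measured at several MPI rank counts. The function name `scaling_table` and the timing values are hypothetical and are not taken from the study itself. The baseline is assumed to be the smallest rank count measured.

```python
def scaling_table(timings):
    """Return (ranks, speedup, efficiency) rows relative to the smallest rank count.

    timings maps an MPI rank count to a measured wall-clock time in seconds.
    """
    base_ranks = min(timings)
    base_time = timings[base_ranks]
    rows = []
    for ranks in sorted(timings):
        speedup = base_time / timings[ranks]
        # Ideal speedup grows in proportion to the rank count.
        efficiency = speedup / (ranks / base_ranks)
        rows.append((ranks, speedup, efficiency))
    return rows


# Hypothetical timings, for illustration only (not measurements from the study).
example_timings = {1: 100.0, 2: 50.0, 4: 25.0, 8: 20.0}

for ranks, speedup, efficiency in scaling_table(example_timings):
    print(f"{ranks:4d} ranks  speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

An efficiency near 1.0 means the code scales almost ideally at that rank count. A falling efficiency marks the point where adding ranks stops paying off.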
Key points:
- The intra-node scaling study compares performance on a single KNL node (68 cores @ 1.4 GHz, with 2 x 512-bit vector registers) against performance on a single Sandy Bridge Xeon node (16 cores @ 2.6 GHz, with 2 x 256-bit vector registers).
- As expected, scaling plateaus once the number of MPI ranks exceeds the number of available cores. For small numbers of ranks, the code runs faster on the Sandy Bridge Xeon at 2.6 GHz than on the KNL at 1.4 GHz, although the wider vector registers on KNL offset its clock-speed deficit slightly.
- The substantially larger number of cores on KNL yields better overall performance at larger MPI rank counts.
- Across multiple nodes, maximum performance is achieved with 128 total ranks, distributed as either 4 nodes x 32 ranks per node or 8 nodes x 16 ranks per node. Scaling efficiency degrades beyond 8 nodes, perhaps because of MPI communication overhead or core/thread resource contention.
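The multi-node layouts mentioned above are two ways of placing the same 128 ranks. The sketch below enumerates every nodes x ranks-per-node combination that places a given total without exceeding a per-node cap. The function name `rank_layouts` is hypothetical. The cap of 68 ranks per node is an assumption that borrows the KNL core count from the intra-node comparison; the text does not state which node type the multi-node runs used.

```python
def rank_layouts(total_ranks, max_ranks_per_node):
    """List (nodes, ranks_per_node) pairs that place exactly total_ranks ranks."""
    layouts = []
    for nodes in range(1, total_ranks + 1):
        if total_ranks % nodes != 0:
            continue  # ranks would not divide evenly across the nodes
        ranks_per_node = total_ranks // nodes
        if ranks_per_node <= max_ranks_per_node:
            layouts.append((nodes, ranks_per_node))
    return layouts


# Assumption: at most one rank per core on a 68-core node.
layouts = rank_layouts(128, 68)
for nodes, ranks_per_node in layouts:
    print(f"{nodes:3d} nodes x {ranks_per_node:2d} ranks per node")
```

Timing each candidate layout is one way to check whether an observed drop in efficiency follows the node count or the number of ranks sharing a node.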
© Cornell University | Center for Advanced Computing | Copyright Statement | Access Statement
CVW material development is supported by NSF OAC awards 1854828, 2321040, 2323116 (UT Austin) and 2005506 (Indiana University)