Data Locality Matters

Brandon Barker and Steve Lantz
Cornell Center for Advanced Computing

Revisions: 8/2023, 2/2019, 7/2015, 10/2014 (original)

Data locality is often the single most important factor in improving per-core performance. This topic explains the advantages of doing all the work that can be done on well-aligned sequences of data while they are present in cache, or even in registers, so that deeper levels of the memory hierarchy are accessed infrequently.

Objectives

After you complete this segment, you should be able to:

  • Explain the concept of data locality and its significance
  • List the three principles of good data locality
  • Name the performance advantages of accessing data with unit stride
  • Demonstrate the preferred ordering of nested loops when accessing row- and column-major (C and Fortran) arrays
  • Define the term hoisting
  • Give an example of what might constrain a compiler from moving invariant code out of a loop

Prerequisites

The Code Optimization roadmap assumes only that the reader has some basic familiarity with programming in any language. The HPC languages C and Fortran are used in examples. Necessary concepts are introduced as one progresses through the roadmap.

In parallel programming, the key consideration for large-scale performance is the scalability of a code's algorithm(s). Therefore, readers who are developing parallel applications may want to peruse the roadmap on Scalability first.

More advanced references on performance optimization include the Virtual Workshop roadmaps on Profiling and Debugging and Vectorization. For those interested in programming for advanced HPC architectures, such as TACC's clusters built with Intel processors, the roadmap Case Study: Profiling and Optimization on Advanced Cluster Architectures is relevant. The present roadmap should make a good starting point for diving into any of those.

 
©   Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Inclusivity Statement