Cache Considerations
Brandon Barker, Steve Lantz
Cornell Center for Advanced Computing
Revisions: 8/2023, 2/2019, 7/2015, 10/2014 (original)
The way you organize your source code and its data structures can affect how quickly the data needed for a computation reaches the various cache levels, and how long it stays there. This topic explores the inner workings of caches and what they imply for the way code should be structured.
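As a minimal sketch of how code organization alone can change cache behavior, the C fragment below sums the same two-dimensional array twice, once with the inner loop running along a row and once with the loops interchanged. The array `a`, its size `N`, and the function names are hypothetical, chosen only for illustration. Because C stores arrays in row-major order, the first version walks through memory with unit stride, while the second jumps a full row ahead on every inner iteration; the actual difference in run time depends on the processor and its cache sizes.

```c
#include <stddef.h>

/* Hypothetical array size, chosen only for illustration. */
#define N 512

static double a[N][N];

/* Fill the array with small integers so that both sums are exact in double. */
void init_array(void)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            a[i][j] = (double)((i + j) % 7);
}

/* Inner loop over the last (column) index: C arrays are row-major, so
 * consecutive iterations touch adjacent addresses and every element of
 * each cache line that is brought in gets used. */
double sum_row_order(void)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Same arithmetic with the loops interchanged: the inner loop now strides
 * by N * sizeof(double) bytes, so successive accesses usually fall on
 * different cache lines, which may be evicted before their remaining
 * elements are used. */
double sum_column_order(void)
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    return s;
}
```

Both functions return the same value; only the order of memory accesses differs. Timing each one (for example, after calling `init_array()`) on a given machine is a simple way to see the effect of access order for yourself.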
Objectives
After you complete this segment, you should be able to:
- Define the term latency hiding
- Explain the purpose of prefetching
- Describe how contention affects performance in multithreaded applications
- Describe how cache coherence is maintained in Intel Xeon Scalable processors
- Explain what is meant by false sharing of cache lines
- Rewrite a set of nested loops to implement array blocking
- Explain the importance of cache awareness when writing code
- Distinguish between fully associative, N-way associative, and direct mapped caches
- Describe how the processor's hardware fetches items from main memory into the L1 data cache
- Define the term congruence class
- Explain why cache thrashing occurs and how it can be avoided
Prerequisites
The Code Optimization roadmap assumes only that the reader has some basic familiarity with programming in any language. The HPC languages C and Fortran are used in examples. Necessary concepts are introduced as one progresses through the roadmap.
In parallel programming, the key consideration for optimizing large-scale parallel performance is the scalability of a code's algorithm(s). Therefore, readers who are developing parallel applications may want to peruse the roadmap on Scalability first.
More advanced references on performance optimization include the Virtual Workshop roadmaps on Profiling and Debugging, and on Vectorization. For those interested in programming for advanced HPC architectures, such as TACC's clusters built with Intel processors, the roadmap Case Study: Profiling and Optimization on Advanced Cluster Architectures is relevant. The present roadmap should serve as a good starting point for diving into any of these.