Cache Considerations

Brandon Barker, Steve Lantz
Cornell Center for Advanced Computing

Revisions: 8/2023, 2/2019, 7/2015, 10/2014 (original)

The way you organize your source code and its data structures can affect how quickly the data needed for computations will arrive at the various cache levels and persist there. This topic explores the inner workings of caches and their implications for code structure.

Objectives

After you complete this segment, you should be able to:

  • Define the term latency hiding
  • Explain the purpose of prefetching
  • Describe how contention affects performance in multithreaded applications
  • Describe how cache coherence is maintained in Intel Xeon Scalable Processors
  • Explain what is meant by false sharing of cache lines
  • Rewrite a set of nested loops to implement array blocking (see the brief sketch following this list)
  • Explain the importance of cache awareness when writing code
  • Distinguish between fully associative, N-way associative, and direct mapped caches
  • Describe how the processor's hardware fetches items from main memory into the L1 data cache
  • Define the term congruence class
  • Explain why cache thrashing occurs and how it can be avoided
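
As a preview of the array blocking objective above, the following C sketch shows the transformation applied to an illustrative matrix transpose. The matrix size N, the tile size B, and the function name are hypothetical choices made for this example, not values taken from the roadmap.

    /* Minimal sketch of array blocking (loop tiling), assuming a
       square N x N matrix transpose and a hypothetical tile size B. */
    #define N 4096
    #define B 64    /* tile size; tune so a tile fits in cache */

    void transpose_blocked(double a[N][N], double b[N][N])
    {
        for (int ii = 0; ii < N; ii += B)
            for (int jj = 0; jj < N; jj += B)
                /* work on one B x B tile at a time */
                for (int i = ii; i < ii + B && i < N; i++)
                    for (int j = jj; j < jj + B && j < N; j++)
                        b[j][i] = a[i][j];
    }

The idea is that each B x B tile of both arrays fits in cache, so the cache lines fetched for a tile are reused before they are evicted, rather than being displaced by the long column-wise strides of the unblocked loops.
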
Prerequisites

The Code Optimization roadmap assumes only that the reader has some basic familiarity with programming in any language. The HPC languages C and Fortran are used in the examples. Necessary concepts are introduced as the reader progresses through the roadmap.

The key consideration for optimizing large-scale parallel performance is the scalability of a code's algorithm(s). Therefore, readers who are developing parallel applications may want to peruse the roadmap for Scalability first.

More advanced references on performance optimization include the Virtual Workshop roadmaps on Profiling and Debugging and Vectorization. For those interested in programming for advanced HPC architectures, such as TACC's clusters built with Intel processors, the roadmap Case Study: Profiling and Optimization on Advanced Cluster Architectures is relevant. The present roadmap should make a good starting point for diving into any of those.

 
©   Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Inclusivity Statement