Single-Core Optimization
Brandon Barker, Steve Lantz
Cornell Center for Advanced Computing
Revisions: 8/2023, 2/2019, 7/2015, 10/2014 (original)
An application can gain significant performance benefits if the programmer pays attention to its behavior at the individual processor level. Performance at this low level is strongly influenced by how an algorithm is expressed in a high-level language and by how the compiler converts the high-level program into a specific sequence of operations on data. Accordingly, this topic surveys the common features of computer architectures to see which coding practices might help or hinder good performance.
Objectives
After you complete this segment, you should be able to:
- List factors that affect program performance at the individual processor level
- List the three basic principles of achieving good single-core performance
- Explain how a computer's memory hierarchy provides efficient access to data
- Describe a typical processor's cache memory hierarchy
- Define latency and bandwidth
- State approximate latencies and bandwidths for retrieving data from various memory levels
- Explain the main objective of vectorization
- Describe how pipelining works at the hardware level
- Explain why vectorization and pipelining are examples of "micro-parallelism"
Prerequisites
The Code Optimization roadmap assumes only that the reader has some basic familiarity with programming in any language. The HPC languages C and Fortran are used in examples. Necessary concepts are introduced as one progresses through the roadmap.
In parallel programming, the key consideration for optimizing large-scale parallel performance is the scalability of a code's algorithm(s). Therefore, readers who are developing parallel applications may want to peruse the roadmap for Scalability first.
More advanced references on performance optimization include two other Virtual Workshop roadmaps: Profiling and Debugging, and Vectorization. For those interested in programming for advanced HPC architectures, such as TACC's clusters built with Intel processors, the roadmap Case Study: Profiling and Optimization on Advanced Cluster Architectures is relevant. The present roadmap should make a good starting point for diving into any of those.