On this page and the next we focus on compiler options that are especially useful for code optimization. These options tell the compiler how aggressively it should apply sophisticated "back-end" transformations to the machine-level code in order to make it run faster.

A few compiler options have become standard and will work on most platforms. They are the indispensable "-O" (capital O) flags. The flags are really a kind of shorthand; they conveniently combine a number of specific techniques into a single option so you don't have to memorize a group of 20 separate flags (though you can if you want to). The standard levels are:

  • -O0 for fast compilation without optimization.
  • -O1 for limited optimization that does not increase code size; no inlining (more about this later).
  • -O2 for moderate optimization including vectorization (with the Intel classic compilers). Debugging support is retained if -g is also specified.
  • -O3 for aggressive optimization, resulting in longer compile times, more trading of space for speed, and sometimes only marginal gains. This may change code semantics and, occasionally, results. It also enables the optimizations present in -O2. (GCC compilers traditionally start vectorizing at this level; GCC 12 and later also vectorize at -O2, using a more conservative cost model.)

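The often preferred level is -O2. It is the default for the Intel compilers, whereas GCC defaults to -O0, so with GCC you must request optimization explicitly. This moderate optimization level enables things like: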

  • Instruction scheduling - rearranging instructions to avoid stalls due to lack of data
  • Copy propagation - replacing uses of a variable that merely holds a copy of another value with that value itself
  • Software pipelining - overlapping the stages of successive loop iterations so that several execute simultaneously, as in a round of music
  • Common subexpression elimination - finding identical expressions, calculating them only once
  • Prefetching - explicitly requesting data before it is needed, so it is ready ahead of schedule
  • Loop transformations - fission, fusion, interchange, reversal, tiling, unrolling, hoisting of invariant code out of the loop, etc.

The more aggressive -O3 optimization level does more prefetching and more daring kinds of loop transformations. However, -O2 can already do surprisingly smart things. Consider the code we used earlier to test the effect of loop strides, stride.c. Try making one simple change: remove the variable mean from the printf statement. If you compile and run the modified code, you will find that it runs dramatically faster, with no dependence on stride whatsoever. This is because the compiler has noticed that mean is never used and has eliminated its computation entirely, an optimization known as dead code elimination.

 
© Cornell University | Center for Advanced Computing