In-Depth Tuning
In-depth tuning is the effort that HPC developers devote to key pieces of software, such as performance libraries or codes that are widely used by research communities. Logically, it comes last in our sequence of development stages for a high-performance application.
The general idea of performance tuning is to instrument the code with timing calls to discover its hot spots (the places where most of the computing time is spent) and then to devote special attention to tweaking those hot spots until they run faster. Quite a few run-time tools have been created to assist in this process; they can be as simple as gprof or as complex as Intel VTune. Often the diagnostic tool will reveal a performance problem that is traceable to one of the issues covered in this topic, such as excessive cache misses.
In any event, in-depth tuning is a long, iterative process that is briefly summarized as follows:
- Profile code
- Work on the most time-intensive blocks
- Repeat as long as the benefits are worth your while...
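The profiling step in the list above might look like the following sketch, which uses `cProfile` and `pstats` from Python's standard library rather than gprof or VTune. The functions `cheap`, `expensive`, `workload`, and `profile_top` are hypothetical names made up for illustration. The sketch reads the `stats` dictionary of a `pstats.Stats` object and assumes that ranking by each function's own time (excluding callees) is the measure of interest.

```python
import cProfile
import pstats

def cheap(n):
    # Inexpensive stage: a single built-in reduction.
    return sum(range(n))

def expensive(n):
    # Costly stage: a doubly nested loop, standing in for a real hot spot.
    total = 0
    for i in range(n):
        for j in range(n):
            total += i ^ j
    return total

def workload():
    # The program being tuned: one cheap call and one expensive call.
    cheap(1000)
    return expensive(200)

def profile_top(func, limit=5):
    """Profile func() and return (function name, own seconds) pairs, slowest first."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) to a tuple whose third field
    # (index 2) is the time spent in the function itself, excluding callees.
    rows = [(key[2], value[2]) for key, value in stats.stats.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]

if __name__ == "__main__":
    for name, seconds in profile_top(workload):
        print(f"{name:50s} {seconds:.6f} s")
```

After changing the function at the top of the ranking, rerun the profile to confirm that the change helped and to see which function now dominates. When the remaining gains no longer justify the effort, the loop ends.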
Like any simple algorithm, it can be illustrated in a flowchart:
This process is definitely not a good starting point in your code development. The great computer scientist Donald Knuth is credited with saying, "Premature optimization is the root of all evil." In this topic, we offer you a corollary: don't hand-tune anything until you find out whether someone else has already done it for you, through either a performance library or an optimizing compiler. Remember, human labor is the most precious resource of all! The computer is supposed to be doing work for you, not the other way around.