Profiling a parallel program begins with the same steps as profiling a serial one: compile the program with options that augment it with profiling instructions, then run the resulting binary to generate the output data for gprof to analyze. In a parallel run, a separate copy of the binary is executed for each task, so we would like to aggregate the data from all tasks into profiling data for the entire parallel program. gprof supports this through its -s ("sum") option.

To use gprof -s, we must first ensure that each task writes its profile data to a uniquely named file, especially since the tasks will likely run on a common filesystem, where they would otherwise all write to the same default file, gmon.out. The output file name can be parameterized by the individual task's Unix process ID. To enable this, set the environment variable GMON_OUT_PREFIX before launching the program. For example, setting GMON_OUT_PREFIX to gout will cause the output files to be named gout.<pid>. When the program completes, gprof can aggregate the data in these files by running gprof -s with the name of the executable followed by the list of gout.<pid> files.

The aggregate output file is named gmon.sum by default. This file can be analyzed by running gprof with the name of the executable followed by gmon.sum, in place of the usual gmon.out.

 
© Cornell University | Center for Advanced Computing