Profiling Parallel Programs
Profiling a parallel program begins with the same steps as profiling a serial one: compiling the program with options that augment it with profiling instructions, then running the resulting binary to generate the output data for gprof to analyze. In a parallel run, a separate copy of the binary is executed for each task, so we'd like to aggregate the data from all tasks to produce profiling data for the entire parallel program. gprof supports this through its -s ("sum") option.
To use gprof -s, we must first ensure that each task produces a uniquely named call graph output file, especially since the tasks will likely write to a common filesystem. gprof supports this by allowing you to parameterize the output file name with the individual task's Unix process ID. To enable this, set the environment variable GMON_OUT_PREFIX. For example, setting GMON_OUT_PREFIX to gout will cause the output files to be named gout.<pid>.
When the program completes, gprof can aggregate the data in these files: run gprof -s with the executable and the per-task output files as arguments. The aggregate output file is named gmon.sum by default. This file can then be analyzed by passing it to gprof, together with the executable, in place of the default gmon.out.