Debugging distributed applications (e.g. MPI programs) raises additional challenges compared to debugging a standalone single-threaded or multithreaded application. Aside from the obvious challenge of dealing with N independent processes, each with its own address space, the communication between processes becomes a potential source of errors that may itself need to be debugged.

A distributed debugger may load alternate libraries (e.g. MPI) so that it can capture information about process activity.

To deal with the challenges of distributed debugging, one typically has to use a debugging application specifically designed for such situations. In general, a distributed debugging application wraps each individual process in a single-process debugger, which communicates with a central application. The debugging application may also intercept and filter communications between processes at the library level. This allows it either to simply log those communications or to actively modify them in some way, such as by delaying messages or by stepping through messages one by one.
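The interception idea described above can be illustrated with a small, MPI-free sketch. Real MPI tools commonly interpose on library calls through the MPI standard's profiling interface (PMPI). The Python class below is only a hypothetical stand-in for that layer: the name `InterceptingChannel` and its methods are invented for illustration and do not belong to any real debugger or MPI library. It shows the three behaviors mentioned in the text: logging every message, delaying delivery, and holding messages so they can be released one at a time.

```python
import queue
import time


class InterceptingChannel:
    """Hypothetical stand-in for a message-passing library's send/receive layer."""

    def __init__(self, delay_seconds=0.0, step_mode=False):
        self.delay_seconds = delay_seconds  # artificial delay applied to each delivery
        self.step_mode = step_mode          # if True, hold messages until step() is called
        self.log = []                       # record of every intercepted send
        self._held = queue.Queue()          # messages waiting to be released by the debugger
        self._delivered = queue.Queue()     # messages visible to the receiver

    def send(self, source, dest, payload):
        """Intercept a send: log it, then deliver it now or hold it for stepping."""
        message = (source, dest, payload)
        self.log.append(message)
        if self.step_mode:
            self._held.put(message)
        else:
            self._deliver(message)

    def step(self):
        """Release the oldest held message; return it, or None if nothing is held."""
        try:
            message = self._held.get_nowait()
        except queue.Empty:
            return None
        self._deliver(message)
        return message

    def receive(self):
        """Return the next delivered message, or None if none is available."""
        try:
            return self._delivered.get_nowait()
        except queue.Empty:
            return None

    def _deliver(self, message):
        # Optionally delay the message before making it visible to the receiver.
        if self.delay_seconds > 0:
            time.sleep(self.delay_seconds)
        self._delivered.put(message)


# Usage: in step mode, nothing reaches the receiver until the debugger steps.
channel = InterceptingChannel(step_mode=True)
channel.send(0, 1, "hello")
channel.send(0, 1, "world")
print(channel.receive())  # None: both messages are still held
print(channel.step())     # (0, 1, 'hello'): first message released
print(channel.receive())  # (0, 1, 'hello'): now visible to the receiver
```

Because the wrapper sits between the sender and the receiver, the application code that calls `send` and `receive` would not need to change, which is the main appeal of intercepting at the library level.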

Debuggers attached to each distributed process communicate to the central debugger application.
 
© Cornell University, Center for Advanced Computing