Christopher J. Cameron, Steve Lantz
Cornell Center for Advanced Computing

Revisions: 3/2022, 9/2014 (original)

Distributed computing requires moving data between tasks. This topic introduces strategies for messaging among tasks, focusing on the conceptual differences and trade-offs so you can make choices that are appropriate for your application.

The examples in this sub-topic refer to MPI (the Message Passing Interface), but other message-passing libraries offer equivalent functionality. The examples use a data parallel programming style and assume that you have one or more arrays containing the input data. While we won't go into the detailed syntax of specific MPI functions, you will be equipped to interpret the documentation for the particular message-passing implementation available on your hardware.
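For orientation, here is a minimal sketch (in C, the language used by much MPI documentation) of the data parallel pattern just described: a root task scatters an array to all tasks, each task computes on its own chunk, and a reduction combines the results. The array size, its contents, and the trivial "doubling" computation are placeholders chosen only for illustration; they are not part of the CVW examples.

```c
/* Minimal data parallel sketch with MPI (illustrative only).
 * Compile with mpicc and launch with mpiexec/mpirun. */
#include <mpi.h>
#include <stdio.h>

#define NELEM 8   /* total elements; assumed divisible by the number of tasks */

int main(int argc, char *argv[]) {
    int rank, size;
    double input[NELEM], chunk[NELEM], partial = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n = NELEM / size;                 /* elements handled by each task */
    if (rank == 0)                        /* only the root holds the full input */
        for (int i = 0; i < NELEM; i++) input[i] = (double)i;

    /* Collective: the root scatters equal-sized chunks to every task */
    MPI_Scatter(input, n, MPI_DOUBLE, chunk, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Local work on this task's chunk (placeholder computation) */
    for (int i = 0; i < n; i++) partial += 2.0 * chunk[i];

    /* Collective: combine the partial results back on the root */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("total = %f\n", total);
    MPI_Finalize();
    return 0;
}
```

Later pages examine the communication calls themselves (point-to-point versus collective, blocking versus non-blocking); this sketch simply shows the overall shape such programs take.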

Objectives

After you complete this segment, you should be able to:

  • Distinguish between blocking and non-blocking communication
  • Explain the difference between point-to-point and collective communication
  • Identify data dependencies in a computation
  • Explain how an unmanaged data dependency in a parallel program can produce unexpected outcomes

Prerequisites

This topic assumes a basic understanding of serial (single-threaded) programming and familiarity with computer terminology.

You don't need to know how to write code to understand this topic, but you will get more out of it if you have a specific computation in mind. If you have coded a serial version of your problem, you can use the techniques described in the CVW Profiling and Debugging topic to gather profiling information. Profiling will reveal the time-consuming parts of your program, which should be the targets of your parallelization efforts.

©   Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Inclusivity Statement