Where to Use OpenMP

OpenMP was conceived as an API for expressing shared-memory parallelism. It is, therefore, primarily useful for applications that run on a single node of a large multicore machine. In this situation, there are several possible parallelization strategies:

  • A single executable with a unified virtual memory that uses OpenMP for parallelism.
  • Multiple executables, each with its own virtual memory, that communicate using MPI.
  • A combination of these approaches, called hybrid programming, which is discussed in the Hybrid Programming topic. In hybrid programming, OpenMP is used for the portions of a multinode application that are performed on a single node.
Figure: conceptual representation of the three parallelization strategies described above.
In the single executable strategy (left), all the cores on a node are used by OpenMP threads with access to the same virtual memory. In the pure MPI strategy (right), each core on the node runs an independent task with its own virtual memory. In the hybrid strategy (middle), the cores are divided into groups, with one MPI task and several OpenMP threads per group; memory is shared within each group.

Assuming that you intend to use OpenMP in your application, where should you insert the directives to begin turning your serial code into parallel code? Ideally, parallelization has been part of your plan from the beginning, and your code was designed with it in mind. If not, it is critical to understand where your application spends its time when working on a typical problem. Profiling to find the "hotspots" where the most time is spent is an important step in developing parallel applications. In some cases, a single loop buried far inside the program accounts for more than 95% of the run time. If that is your situation, that loop is the one you should consider putting into a parallel construct, if it is eligible for OpenMP parallelization.

This raises the question: what makes a code section eligible for OpenMP? For loops or program sections to be divided up and computed concurrently in separate threads, there must be few or no data dependencies between them. One common type of HPC application computes a time series for a problem. There are generally many dependencies between the values of variables at each successive step and the values they had at the preceding step, so it is impractical to use OpenMP to compute multiple steps concurrently. However, the work involved in calculating values for the next step generally involves only variables from steps that have already been calculated, so that work can be parallelized. It is important to understand the data dependencies in your particular case.

Other considerations arise from an understanding of parallel programming concepts, which are introduced in the Parallel Programming Concepts topic. Because the overhead of OpenMP is much lower than for MPI, it is practical to use OpenMP to parallelize inner loops which would be impractical to parallelize with MPI. On the other hand, Amdahl's Law says that you get more speedup if the fraction of your code that is parallelized is higher, so it is best to parallelize the outermost loop whose iterations are independent of one another and whose calculations can be done on a single node. If the best loop for parallelization is so big that more than one node is needed, then you can probably parallelize it using hybrid programming.

©  |   Cornell University    |   Center for Advanced Computing    |   Copyright Statement    |   Inclusivity Statement