As we have seen, multithreading with OpenMP works best in situations where there are few data dependencies between threads. Obviously, if there are no dependencies at all, then you need only be concerned with how the iterations of your work-sharing construct are scheduled across an appropriate number of threads. But if there are some data dependencies, what do you do? The answer depends on what kind of dependencies they are:

  • For updates to global variables, you can use lock functions or critical or atomic constructs to ensure that the updates are performed as transactions (see the sketch following this list).
  • If you have reduction variables, they can be computed efficiently by adding a reduction clause to your work-sharing construct (also shown in the sketch below).
  • If you have a loop-carried dependency, that loop is not a good candidate for OpenMP parallelization.
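
The first two cases often appear together in a single parallel loop. Below is a minimal sketch in C, in which the array x, its size N, and the threshold test are placeholders for whatever data your own loop processes: the running sum is handled by a reduction clause, while the shared counter is updated as a transaction with an atomic construct.

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double x[N];
        for (int i = 0; i < N; i++)
            x[i] = 1.0 / (i + 1);    /* placeholder data */

        double sum   = 0.0;          /* reduction variable */
        long   count = 0;            /* shared global counter */

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            sum += x[i];             /* each thread accumulates a private partial sum */
            if (x[i] > 1.0e-3) {
                #pragma omp atomic
                count++;             /* the update to the shared counter is atomic */
            }
        }

        printf("sum = %f, count = %ld\n", sum, count);
        return 0;
    }

Without the reduction clause, every thread would race to update sum; without the atomic construct, increments to count could be lost. A critical construct, or a pair of omp_set_lock/omp_unset_lock calls around the increment, would serve the same purpose as the atomic here, though usually at somewhat higher cost.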

A loop-carried dependency means that each iteration of the loop depends on the result of the previous iteration, so the iterations cannot run concurrently. In that case, it is best to look for a larger loop in the program, one in which the problematic loop is nested and whose own iterations are independent. This may involve putting a much larger amount of your computation into OpenMP parallel constructs, with the attendant problems of maintaining appropriate data integrity for more variables. But the improved concurrency, and hence speedup, should be worth the effort.
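
As a concrete illustration, here is a sketch in which an invented recurrence stands in for a real calculation. The inner loop carries a dependency from one step to the next, so it must run sequentially within a thread; but because the outer loop's iterations are independent, the parallel for directive can be applied there instead.

    #include <omp.h>
    #include <stdio.h>

    #define NSERIES 1000     /* independent calculations */
    #define NSTEPS  10000    /* sequential steps within each calculation */

    int main(void)
    {
        static double result[NSERIES];

        /* Outer loop: iterations are independent, so it parallelizes safely. */
        #pragma omp parallel for
        for (int s = 0; s < NSERIES; s++) {
            double y = (double)(s + 1);
            /* Inner loop: loop-carried dependency -- each step needs the
               previous value of y, so it stays sequential within one thread. */
            for (int t = 1; t < NSTEPS; t++)
                y = 0.5 * (y + (s + 1) / y);
            result[s] = y;
        }

        printf("result[0] = %f\n", result[0]);
        return 0;
    }

Here NSERIES, NSTEPS, and the recurrence itself are stand-ins for your own computation; the point is simply that the work-sharing directive moves to the outer, independent loop, while the dependent inner loop remains serial on each thread.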

Note that parallelizing an outer loop of an application can become especially tricky if it involves a large number of independent calculations whose execution times vary over a wide range, say from milliseconds to hours. Even in this case, however, the main loop over all of the calculations can be parallelized efficiently by using the dynamic scheduling options available in OpenMP.
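
A sketch of that situation follows, with a hypothetical do_calculation function whose cost varies strongly from task to task. The schedule(dynamic, 1) clause hands out iterations one at a time, so a thread that draws a short task immediately picks up another instead of sitting idle while the long tasks finish.

    #include <omp.h>
    #include <stdio.h>

    #define NTASKS 64

    /* Hypothetical stand-in for a calculation whose run time varies widely. */
    double do_calculation(int task)
    {
        double x = 0.0;
        long work = 100000L * (task % 7 + 1) * (task % 7 + 1);   /* uneven cost */
        for (long i = 1; i <= work; i++)
            x += 1.0 / (double)i;
        return x;
    }

    int main(void)
    {
        static double results[NTASKS];

        /* Dynamic scheduling assigns iterations to threads on demand,
           which balances the load when iteration costs are unpredictable. */
        #pragma omp parallel for schedule(dynamic, 1)
        for (int t = 0; t < NTASKS; t++)
            results[t] = do_calculation(t);

        printf("results[0] = %f\n", results[0]);
        return 0;
    }

With the default static schedule, a thread that happened to be assigned several long tasks would become the bottleneck; schedule(dynamic, 1) trades a little scheduling overhead for much better load balance, and schedule(guided) is another option when the number of iterations is large.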

From the above discussion, you can perhaps understand why the various kinds of OpenMP constructs and clauses were invented in the first place!

 