Data Dependencies
Earlier, we conceptualized the vectorization of a given loop in terms of loop unrolling. The compiler effectively unrolls the loop by N for a vector length of N, and the unrolled group of N iterations is evaluated in parallel rather than sequentially. However, the original ordering of the loop changes due to this transformation, and a problem arises when an operation in one iteration depends upon the result of a previous iteration. In this case, automatic vectorization might lead to incorrect results.
This situation is known as a data dependency. The compiler needs to prove to itself that no harmful dependencies exist in the loop, so that computing the results N iterations at a time produces exactly the same results as sequential (non-vectorized) execution. When it can't prove the correctness of a vectorized loop, it won't vectorize that loop.
The following list describes several major kinds of data dependencies commonly encountered in scientific code, along with commentary explaining why a particular kind of data dependency is or is not vectorizable.
Read after write ("flow" or "RAW") dependency
This kind of dependency is not vectorizable. It occurs when the values of variables involved in a particular loop iteration (the "read") are determined in a previous loop iteration (the "write"). In other words, a variable is read (used as an operand to a mathematical operation) after its value has been modified by a previous loop iteration.
For example, consider the following C code:
Applying each operation in the loop sequentially:
a[1] = a[0] + b[1] → a[1] = 0 + 6 → a[1] = 6 a[2] = a[1] + b[2] → a[2] = 6 + 7 → a[2] = 13 a[3] = a[2] + b[3] → a[3] = 13 + 8 → a[3] = 21 a[4] = a[3] + b[4] → a[4] = 21 + 9 → a[4] = 30 a = {0, 6, 13, 21, 30}
In this example, the first loop iteration performs an operation and writes a value to a[1]
. The second iteration then reads the value of
a[1]
as input to an operation. The red numbers indicate all the values that are written by an iteration, then later read by the next. Now, let's suppose
instead that all four loop iterations are resolved simultaneously via vector hardware:
a[i-1] = {0,1,2,3} (load) b[i] = {6,7,8,9} (load) {0,1,2,3} + {6,7,8,9} = {6,8,10,12} (operate) a[i] = {6,8,10,12} (store) a = {0, 6, 8, 10, 12} ≠ {0, 6, 13, 21, 30} NOT VECTORIZABLE
As is evident, the value of a
resulting from sequential computation, {0,6,13,21,30}
is different from the value of a
computed
using vectors, {0,6,8,10,12}
. This demonstrates why read after write dependencies are not vectorizable. Compilers will refuse to vectorize loops with
this kind of dependency, as computation on vector hardware would result in incorrect results.
Write after read ("anti" or "WAR") dependency
This kind of dependency is vectorizable. This dependency is the opposite of read after write: a value is used as the input to an operation in an earlier iteration, then written to in a later iteration. Let's modify our previous loop slightly to illustrate this kind of dependency:
Again applying each operation in the loop sequentially:
a[1] = a[0] + b[1] → a[1] = 1 + 5 → a[1] = 6 a[2] = a[1] + b[2] → a[2] = 2 + 6 → a[2] = 8 a[3] = a[2] + b[3] → a[3] = 3 + 7 → a[3] = 10 a[4] = a[3] + b[4] → a[4] = 4 + 8 → a[4] = 12 a = {6, 8, 10, 12, 4}
Notice that there are no red numbers. Although a[1]
is involved in both the first and second iterations, its value is only read in the
first iteration, then written in the second. Executing the same loop using vectors:
a[i+1] = {1,2,3,4} (load) b[i] = {5,6,7,8} (load) {1,2,3,4} + {5,6,7,8} = {6,8,10,12} (operate) a[i] = {6,8,10,12} (store) a = {6, 8, 10, 12, 4} VECTORIZABLE
The value of a
is the same, {6,8,10,12,4}
, regardless of sequential or vector execution. This illustrates that while
write-after-read is a data dependency, it is not one that produces incorrect results when vectorized.
Write after write ("output") dependency
This occurs when multiple loop iterations can alter the value of a single variable. In other words, a value is written by one loop iteration, then written again by another. In a vector operation, if an identical address exists for any two (or more) store operations, the result to be stored is indeterminate. For this reason, write after write is non-vectorizable. Below is an example of a loop with such a dependency:
Unroll this loop yourself, solve it sequentially and using vectors, and see what happens.
Read after read (non-)dependency
Finally, let's look at the remaining read/write combination to see if there is any dependency that prevents vectorization. In the following example there is a "read after read" within the loop:
From glancing at the indices of the arrays, it appears that b
might have a dependency between iterations. If you think about it, though, nothing about
b
changes from iteration to iteration; it is only involved in reads, never in writes. So there is not really a dependency here. The loop could be vectorized,
in principle. However, the compiler might find it tricky to load b[i%2]
using vector instructions.