As we have seen, compilers must prove during vectorization that there are no data dependencies that will affect the correctness of the result. This task turns out to be surprisingly difficult. For instance, inside a function, the compiler might be uncertain about the content and origins of all the input data to a loop. There are situations where the programmer who is working with the code can be more aware of context than the compiler. For this reason, most compilers allow the programmer to supply hints (or guarantees) that influence the vectorization process.

Be aware that compilers are getting better and better at identifying where vectorization is possible and beneficial. For the simplest loops, providing guidance to the compiler is generally unnecessary and may actually be unwise. Consider the following code:
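The original listing is not reproduced here, but a minimal sketch of a loop of this kind might look like the following (the function name `offset_add` and the variable names are illustrative, not taken from the original):

```c
/* Illustrative sketch: each iteration reads a[i+M] and writes a[i].
 * For M >= 0 this is at worst a harmless write-after-read dependency;
 * for small negative M, a[i+M] was written by an earlier iteration,
 * creating a read-after-write dependency that forbids naive vectorization.
 * The caller must ensure a[i+M] stays within bounds. */
void offset_add(float *a, const float *b, int M, int N) {
    for (int i = 0; i < N; i++) {
        a[i] = a[i + M] + b[i];
    }
}
```

Because the sign of M is unknown at compile time, the compiler cannot tell from this code alone whether vectorization is safe.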

In this case, the Intel compiler is able to recognize that certain negative values of M will result in a read-after-write dependency. But the compiler is not entirely prevented from vectorizing the loop; instead, it produces "multiversioned" code that can branch to either a vectorized or an unvectorized loop, depending on the value of M! In many ways this outcome is better than if we had forced the compiler to vectorize the loop through the compiler directives presented below. Why take the risk of generating incorrect results through the heavy-handed use of directives?

Nevertheless, there are other instances where the compiler cannot so easily distinguish between safe and unsafe cases, and it may decline to vectorize a loop that is only slightly more complex than the one above. In such instances, compiler hints can be one way to ensure a vector speedup. Usually these hints or directives appear as special comments in the code; in C/C++, such directives are called pragmas.

ivdep

The Intel and GCC compilers respect ivdep (Ignore Vector DEPendencies) annotations in the source code. This annotation implies (or guarantees) that any apparent data dependencies are safe to ignore. The pragma for GCC is #pragma GCC ivdep; for Intel, just leave out the GCC. The Intel syntax is as follows:
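As a sketch (the loop body is a hypothetical example, not the original listing), the pragma is placed immediately before the loop it applies to:

```c
/* Intel syntax: #pragma ivdep applies to the loop that follows.
 * (With GCC, the equivalent line would be "#pragma GCC ivdep".) */
void ignore_dep(float *a, const float *b, int m, int n) {
#pragma ivdep
    for (int i = 0; i < n; i++)
        a[i] = a[i + m] + b[i];
}
```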

Note that for Intel Fortran, the analogous code annotation is likewise a special comment, !DEC$ IVDEP.

Either way, the ivdep annotation tells the compiler to assume there are no vector dependencies in a given block of code. If a certain loop fails to vectorize due to a suspected dependency, then adding this annotation will allow the compiler to set aside its suspicions and vectorize the loop.

This directive should be used only in circumstances where the programmer is absolutely confident there truly is no dependency. If the programmer is wrong, and the loop actually does have a "bad" dependency like read after write, the program will likely produce incorrect results due to the effect we explored earlier. Let's look more closely at the full version of the C function above:
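A hedged reconstruction of such a function follows (the name `vec_dep` and the array names are assumed, not authoritative):

```c
/* Sketch of the full function: the pragma asserts that the apparent
 * dependency between a[i] and a[i+M] can safely be ignored. That
 * assertion holds only if callers never pass -W < M < 0, where W is
 * the vector width. */
void vec_dep(float *a, const float *b, int M, int N) {
#pragma ivdep
    for (int i = 0; i < N; i++)
        a[i] = a[i + M] + b[i];
}
```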

There is a harmless write-after-read dependency when M ≥ 0, but a read-after-write problem could arise if M is negative. However, if we happened to know that the function is never called with M < 0, then we could confidently insert the #pragma ivdep as shown, which allows the compiler to vectorize the loop. (Interestingly, the vectorized code also works correctly when M ≤ -W where W is the vector width, as this pushes the dependencies far enough apart. Thus, the excluded values are actually -W < M < 0.)

Tip: Vectorization is not forced by this annotation

Note that this annotation does not require the compiler to vectorize the loop. The Intel compiler generally weighs the benefits of vectorized execution against the costs from arranging the data into vectors (overhead, latency, etc.). If the costs are too high, the compiler will not vectorize the loop. Such decisions are noted in the optimization report.

vector always, novector, simd

The Intel compiler provides other annotations to promote vectorization, though not specifically related to data dependencies. For instance, #pragma vector always instructs the compiler to vectorize a loop if it is technically possible to do so. Essentially, this overrides any cost metrics the Intel compiler may use to determine if a vectorized version of a loop will be faster than a non-vectorized version. The annotation is "safe" in that the compiler will never produce incorrect results with its use.
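A hypothetical case where this matters is a gather loop with indirect loads, which the cost model might otherwise judge unprofitable (the function and array names below are illustrative):

```c
/* "vector always" overrides the Intel compiler's cost model; it can
 * never cause incorrect results, only possibly slower ones. */
void gather(float *y, const float *x, const int *ind, int n) {
#pragma vector always
    for (int i = 0; i < n; i++)
        y[i] = x[ind[i]];
}
```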

Another potentially useful Intel-only annotation is #pragma novector. This is helpful during testing or debugging, in order to investigate the effect of vectorizing a specific loop. A loop with the novector annotation will not be vectorized.
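For example, one might compile two versions of a loop kernel, with and without the annotation, to measure the speedup that vectorization actually provides (this saxpy-style loop is an illustrative sketch):

```c
/* novector: force scalar execution of this loop, e.g., to establish
 * a scalar baseline for timing comparisons. */
void saxpy_scalar(float *y, const float *x, float a, int n) {
#pragma novector
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```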

The #pragma simd directive is even more insistent than vector always: the compiler issues a warning if it determines that the annotated loop cannot be vectorized safely. Many clauses are available to fine-tune the pragma's behavior.
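One such clause is reduction, shown in this hypothetical sketch of a vectorized sum (the function name is illustrative):

```c
/* The reduction clause tells the compiler that "sum" accumulates
 * across iterations, so it can keep partial sums in vector lanes
 * and combine them at the end. */
float sum_simd(const float *x, int n) {
    float sum = 0.0f;
#pragma simd reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```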

Intel's simd directive has largely been rolled into OpenMP 4.0, allowing greater portability. The OpenMP syntax is #pragma omp simd, and it works with both Intel and GCC. However, an additional option must be provided to the compiler: for Intel, this option is -qopenmp, or at a minimum, -qopenmp-simd; for GCC, the corresponding options are -fopenmp and -fopenmp-simd.
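The portable OpenMP form looks like this (the loop body is an illustrative example):

```c
/* #pragma omp simd works with both Intel and GCC. Compile with
 * -qopenmp-simd (Intel) or -fopenmp-simd (GCC); without such a flag
 * the pragma is ignored and the loop simply runs scalar. */
void square_omp(float *y, const float *x, int n) {
#pragma omp simd
    for (int i = 0; i < n; i++)
        y[i] = x[i] * x[i];
}
```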

Further discussion of how to use OpenMP directives as an aid to vectorization is beyond our present scope. However, it is worth mentioning that GCC in particular may fail to vectorize certain loops unless this pragma and some of its OpenMP clauses are present. For example, GCC needs #pragma omp for simd in order to vectorize the multithreaded OpenMP code below, whereas the Intel compiler vectorizes it without the simd clause:
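The original listing is not reproduced here; a hypothetical loop of this kind, combining worksharing with the SIMD directive, might look like the following (function and array names are assumptions):

```c
/* "omp for simd" both distributes iterations across the threads of
 * the enclosing parallel region and asks each thread to vectorize
 * its chunk of iterations. Compile with -fopenmp (GCC) or
 * -qopenmp (Intel). */
void add_arrays(float *a, const float *b, const float *c, int n) {
#pragma omp parallel
    {
#pragma omp for simd
        for (int i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }
}
```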

 
©   Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Inclusivity Statement