Pointer aliasing is a hidden kind of data dependency that can occur in C, C++, or any other language that uses pointers for array addresses in arithmetic operations. Array data identified by pointers in C can overlap, because the C language puts very few restrictions on pointers. Two seemingly different pointers may point to storage locations in the same array (aliasing). As a result, data dependencies can arise when performing loop-based computations using pointers, as the pointers may potentially point to overlapping regions in memory.

However, for vectorization to take place, the compiler needs to prove to itself that no data dependencies (overlaps) are possible. This is difficult to do when using pointers, which means that the compiler is often forced to be conservative about vectorization. (Fortran 90/95 is less vulnerable to this issue because a pointer must be explicitly associated to a target variable of known size before it can be used.)

As an example, consider the following C function. Is it safe to vectorize the for loop?

At first glance, this appears to be a straightforward, easily vectorizable loop with no apparent data dependencies. However, what if we invoked it as follows:

This is valid in C. In our compute() function, pointers a and b both refer to overlapping regions of p. In fact, b is just p offset by one array index. The ith element of a corresponds to the ith element of p. The ith element of b corresponds to (and serves as an alias for) the (i-1)th element of p. As a result, we could substitute these values and construct an equivalent loop:

It is now evident that the loop has a read after write dependency. As we have seen before, this means the loop is not vectorizable.

Modern compilers are often clever for simple loops like the one above. They may query the starting address and ending address (e.g., a[1] and a[N-1], and b[1] and b[N-1], etc.) and vectorize the loop if there is no overlap. The Intel compiler can even produce "multiversioned" code containing both vectorized and unvectorized versions of the loop, enabling the correct variant to be chosen at run time based on input values! However, if there are too many manipulations in the indexing (e.g., a[i+j]), compilers will not try to take the analysis further, and will not vectorize the loop.

restrict (C99 keyword)

Pointers clearly make the compiler's job of deducing the presence of potential dependencies much harder due to the aliasing effect. To address the problems associated with aliasing, the C99 standard introduced a new restrict modifier to variables that implies that they have no overlap (alias) with any other variables having that annotation. Thus, marking up the code with the restrict keyword assures the compiler that pointer arguments never refer to overlapping memory regions. Here is an example of its use:

By default, this keyword is ignored by the Intel compiler unless the -restrict flag is used when compiling. As restrict is part of the C99 standard, it may also be necessary to compile in C99 mode with the flag -std=c99 for some compilers, such as gcc.

 
©  |   Cornell University    |   Center for Advanced Computing    |   Copyright Statement    |   Inclusivity Statement