A programmer seeking to adapt serial code to run on multiple cores can employ two different approaches. One approach splits the original computation over the additional cores to reduce the program run time. The second approach uses the additional cores to solve a scaled-up version of the original problem. This section presents two theoretical results that describe the conditions under which each approach will be successful. Amdahl's Law implies a limit on how much faster you can solve the original problem by using additional cores. In contrast, Gustafson's Law shows that scaling up the problem is primarily limited by the number of available cores (\(N\)). The unparallelizable portion of the program limits performance in both cases, but it is more detrimental when the problem size cannot scale with the number of cores.

Amdahl's Law

Amdahl's Law shows that a program's serial parts limit the potential speedup from parallelizing code. Amdahl conceptualizes a program as consisting of two parts: the completely serial sections and the completely parallelizable sections. When the program executes on a single core, its execution time (\(T_1\)) is the sum of the time spent executing the serial sections (\(F_s\)) and the time spent executing the parallelizable sections (\(F_p\)), so \(T_1 = F_s + F_p\).

When the program executes on multiple cores, the computations in the parallelizable code sections are divided among the available cores. The serial sections cannot use the additional cores; therefore, the expected execution time for running a program on \(N\) cores is \(T_N = F_s + F_p / N\). As the number of cores (\(N\)) increases, the total execution time approaches that of the serial portion of the code.
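
As a quick illustration, the Python sketch below (the timing values are made up for this example) predicts the multi-core run time from the measured serial and parallelizable times:

```python
def predicted_time(serial_time, parallel_time, n_cores):
    """Predicted run time on n_cores: T_N = F_s + F_p / N.

    serial_time   -- time spent in the serial sections (F_s)
    parallel_time -- time spent in the parallelizable sections (F_p)
    """
    return serial_time + parallel_time / n_cores

# Made-up example: 2 s of serial work and 18 s of parallelizable work.
print(predicted_time(2.0, 18.0, 1))   # T_1 = 20.0 s
print(predicted_time(2.0, 18.0, 8))   # T_8 = 2 + 18/8 = 4.25 s
```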

To simplify later expressions, we can define the single-core execution time as one unit of time, so \(F_s + F_p = 1\). Expressed in terms of speedup, \(S\), Amdahl's Law says that:

\[ \begin{align} S & = {\text{serial run time} \over \text{parallel run time}} \\[1ex] & = {T_1 \over T_N} \\[1ex] & = {F_s + F_p \over F_s + F_p / N} \\[1ex] & = {1 \over F_s + F_p / N} \end{align} \]
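
A minimal Python sketch of this formula, assuming the normalized fractions \(F_s + F_p = 1\) and an illustrative 10% serial fraction, might look like this:

```python
def amdahl_speedup(f_s, n_cores):
    """Amdahl's Law: S = 1 / (F_s + F_p / N), with F_s + F_p = 1."""
    f_p = 1.0 - f_s
    return 1.0 / (f_s + f_p / n_cores)

# Made-up example: a program that is 10% serial, run on 16 cores.
print(amdahl_speedup(0.10, 16))   # 1 / (0.1 + 0.9/16) = 6.4
```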

As the number of cores increases, the serial portion dominates the execution time:

\[ \begin{align} \lim_{N \to \infty} S & = {F_s + F_p \over F_s} \\[1ex] & = {1 \over F_s} \end{align} \]

As long as \(F_p / N\) is large compared to \(F_s\), the speedup grows roughly in proportion to the number of cores. It is therefore only when \(F_s\) is much smaller than \(F_p\) that adding cores pays off. As \(N\) increases, the execution time approaches \(F_s\); once \(F_p / N\) becomes small compared to \(F_s\), adding more cores yields essentially no additional speedup.
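
The plateau is easy to see numerically. The sketch below (again using an assumed, illustrative serial fraction) tabulates the predicted speedup for a program that is 5% serial; past a few hundred cores the speedup is effectively stuck just below \(1/F_s = 20\):

```python
f_s = 0.05                        # assume a 5% serial fraction, so 1/F_s = 20
for n in (1, 2, 4, 16, 64, 256, 1024):
    speedup = 1.0 / (f_s + (1.0 - f_s) / n)
    print(f"N = {n:5d}   speedup = {speedup:6.2f}")
# The speedup climbs quickly at first, then flattens out just below 20.
```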

Gustafson's Law

Gustafson (1988) observed that scientists and programmers tend to scale up their research ambitions and programs to match the available computing power. Rather than perform the same analyses in less time, researchers who gained access to additional cores tended to do more computation in about the same time. In other words, \(F_p\) tends to scale with \(N\). While Amdahl's Law was derived with the assumption of fixed problem size, Gustafson argued that deriving a measure based on the assumption of fixed run time would better express the practical speedup offered by parallel computation.

Under the assumptions of Gustafson's Law, the parallel portion of the program scales with the number of cores available: instead of focusing on reducing the execution time, we increase the amount of computation performed within a given time interval. Starting with a parallel program running on multiple cores, we can then ask how long the same computation would take to run on a single core.

If we scale up the workload in the parallel part of the computation to match the number of cores, the total core time spent in the parallel portion of the program is \(N \cdot F_p\). Assuming the core time for the serial portion stays constant while the parallel portion scales up, the scaled speedup is

\[ \begin{align} \text{Scaled speedup} & = {\text{serial run time} \over \text{parallel run time}} \\[1ex] & = {F_s + N \cdot F_p \over F_s + N \cdot F_p / N} \\[1ex] & = {F_s + N \cdot F_p \over F_s + F_p} \end{align} \]

Defining the time unit so \(F_s + F_p = 1\), as we did with Amdahl's Law above, allows several insightful expressions of Gustafson's Law:

\[ \begin{align} \text{Scaled speedup} & = N \cdot F_p + F_s \\[1ex] & = N + (1-N) \cdot F_s \\[1ex] & = (N-1) \cdot F_p + 1 \end{align} \]
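
A short Python sketch, using the same normalization and an illustrative 10% serial fraction, confirms that the three forms agree:

```python
def gustafson_speedup(f_s, n_cores):
    """Gustafson's Law: scaled speedup = F_s + N * F_p, with F_s + F_p = 1."""
    f_p = 1.0 - f_s
    return f_s + n_cores * f_p

# Made-up example: a program that is 10% serial, run on 16 cores.
n, f_s = 16, 0.10
print(gustafson_speedup(f_s, n))      # 0.1 + 16 * 0.9 = 14.5
print(n + (1 - n) * f_s)              # equivalent form: N + (1 - N) * F_s
print((n - 1) * (1 - f_s) + 1)        # equivalent form: (N - 1) * F_p + 1
```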

Evidently, the scaled speedup grows linearly with the number of cores (\(N\)), with a slope equal to the parallel fraction \(F_p\). This is in contrast to Amdahl's Law, where the speedup is bounded by a constant, \(1/F_s\), determined by the proportion of serial computation.
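
To make the contrast concrete, the sketch below prints both predictions side by side for the same illustrative 10% serial fraction: Amdahl's speedup levels off near 10, while Gustafson's scaled speedup keeps growing with \(N\):

```python
f_s = 0.10                            # assume a 10% serial fraction
for n in (2, 8, 32, 128, 512):
    amdahl    = 1.0 / (f_s + (1.0 - f_s) / n)   # fixed problem size
    gustafson = f_s + n * (1.0 - f_s)           # problem scales with N
    print(f"N = {n:4d}   Amdahl = {amdahl:5.2f}   Gustafson = {gustafson:6.1f}")
```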

Even though Amdahl's Law identifies a limit on the possible speedup for a specific program, researchers can take advantage of available cores by scaling up parallel programs to explore their questions in higher resolution or at a larger scale. With increases in computational power, researchers can be increasingly ambitious about the scale and complexity of their programs. Gustafson's Law captures the empirical observation that the amount of time programs spend in parallel computation tends to scale with the number of cores available.

If you are parallelizing a serial program, consider how you might scale up your research problem to utilize the available cores. You might find that higher resolution or larger scale results are more valuable than reduced run times.

 