High performance computing is concerned with computing at scale. Scaling involves increasing the size of the problem to be solved or the number of parallel tasks involved in solving the problem. The scalability of a program describes how the time-to-results changes with the problem size and the number of workers. In reality, this is a complex relationship that depends on the characteristics of the problem, the choice of algorithms, and the specifics of the underlying hardware and network.

The scaling properties of a program can be depicted as a surface in three dimensions: the x axis is the number of parallel tasks, the z axis is the size of the input problem, and the y axis is the dependent variable, wall clock time. All axes are logarithmic. The relationship between problem size, number of workers, and time-to-results can thus be visualized as a three-dimensional surface, which in practice may be complex due to the effects of other factors.

In HPC, there are two main ways of scaling a program. The first approach is strong scaling, where the problem size is constant and the number of tasks assigned to solve the problem increases. The second scaling approach is weak scaling, where the problem size increases along with the number of tasks, so the computation per task remains constant.

Strong Scaling
Strong scaling corresponds to a slice of the scaling surface where the problem size is fixed. This two-dimensional plot of that slice shows how the example surface compares to the ideal case: adding new tasks does not reduce run time substantially, so the program does not have good strong scaling performance.
Ideally, time-to-result would decrease linearly on a log-log plot as the number of tasks increases; if the parallelization were perfect, the slope would be -1. In this example, the program does not have good strong scaling performance, but we haven't tried it with enough tasks to find the point where adding tasks yields no additional speedup. We should keep experimenting with more tasks and try to understand why adding them is not significantly decreasing the time to solution (e.g., memory contention). A code profiler might aid in this investigation.
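As a concrete way to examine strong scaling measurements, the short sketch below computes speedup and parallel efficiency from a set of wall clock times taken at a fixed problem size; the task counts and timings are hypothetical values chosen only for illustration.

```python
# Strong scaling analysis: fixed problem size, increasing task counts.
# The task counts and wall clock times below are hypothetical.
tasks = [1, 2, 4, 8, 16, 32]
wall_time = [100.0, 52.0, 28.0, 16.0, 11.0, 9.5]   # seconds

t1 = wall_time[0]
for p, t in zip(tasks, wall_time):
    speedup = t1 / t              # ideal speedup would equal p
    efficiency = speedup / p      # ideal efficiency would equal 1.0
    print(f"{p:3d} tasks: time {t:6.1f} s, "
          f"speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

On a log-log plot of these times against the task count, the ideal case is a straight line of slope -1; flattening of the measured curve signals the loss of strong scaling efficiency.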

Strong scaling performance is affected by the ratio of time spent communicating to the time spent computing. The larger that ratio, the worse the strong scaling behavior will be. If the program has good strong scaling performance, the wall clock time will decrease rapidly as you add more tasks (and typically nodes) to your parallel job. This will improve your turnaround dramatically. On the other hand, if your code doesn't get faster under strong scaling, you may already be using too many tasks for the amount of work you are doing. You should consider using fewer tasks to get better processor utilization and more computing done within your allocation.
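One way to see why the communication-to-computation ratio matters is with a toy performance model. In the sketch below, the compute portion of the time is divided among the tasks while a fixed communication cost is not; the numbers and the model itself are illustrative assumptions, not measurements of any real code.

```python
# Toy strong scaling model (illustrative only):
# total time = compute time shared among p tasks + roughly constant communication time.
def model_time(p, t_compute=100.0, t_comm=5.0):
    """Estimated wall time on p tasks for a fixed-size problem."""
    return t_compute / p + t_comm

for p in [1, 2, 4, 8, 16, 32, 64]:
    print(f"{p:3d} tasks: {model_time(p):6.2f} s")
# As p grows, the t_comm term dominates and adding tasks stops helping.
```

Increasing t_comm relative to t_compute in this model flattens the scaling curve sooner, which is the behavior described above.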

Weak Scaling
Weak scaling is concerned with a slice of the scaling surface where the work per task is constant.
This two-dimensional projection of the weak scaling slice shows how the example surface compares to the ideal case: scaling the problem size and the number of workers simultaneously does not increase the time-to-solution, so this program has good performance under weak scaling.
Ideally, time-to-result would remain constant if we simultaneously and proportionally scale the problem size and the number of tasks. In this example, weak scaling does not increase the time-to-solution, so this program has good weak scaling performance. This means that a larger problem can be solved in about the same amount of time as a smaller problem as long as the number of tasks increases proportionally.

If you run your parallel program on a relatively small example and have plans to run it on larger problems, you will be interested in its weak scaling performance. Well-written parallel programs will usually have good weak scaling performance. One of the major factors affecting the weak scaling performance is synchronization overhead. If your program has good weak scaling performance, you can run a problem that is twice as large on twice as many nodes in the same wallclock time.
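As an illustration, the sketch below computes a weak scaling efficiency, T(1)/T(N), from timings in which the problem size grows in proportion to the task count; as before, the task counts and times are hypothetical.

```python
# Weak scaling analysis: work per task held constant as tasks are added.
# Task counts and wall clock times are hypothetical.
tasks = [1, 4, 16, 64, 256]
wall_time = [100.0, 102.0, 105.0, 109.0, 115.0]   # seconds

t1 = wall_time[0]
for p, t in zip(tasks, wall_time):
    efficiency = t1 / t           # ideal weak scaling efficiency is 1.0
    print(f"{p:4d} tasks: time {t:6.1f} s, weak scaling efficiency {efficiency:4.2f}")
```

An efficiency near 1.0 across the range of task counts indicates that synchronization and other overheads are growing slowly, which is the hallmark of good weak scaling.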

Is it possible to achieve ideal scaling by eliminating all forms of parallel overhead due to communication or synchronization? Not necessarily, because the limits to scaling would then be set by any serial sections of code. In particular, strong scaling is ultimately limited by Amdahl's Law, while weak scaling is described by Gustafson's Law.
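In their standard formulations, writing s for the serial fraction of the work and p for the number of tasks (notation introduced here for illustration), the two laws predict the following speedups:

```latex
% Amdahl's Law: predicted speedup at a fixed problem size (strong scaling),
% where s is the serial fraction and p is the number of tasks
S_{\mathrm{strong}}(p) = \frac{1}{s + \frac{1 - s}{p}}

% Gustafson's Law: predicted speedup when the problem size grows with p (weak scaling)
S_{\mathrm{weak}}(p) = s + p\,(1 - s)
```

Amdahl's Law caps the strong scaling speedup at 1/s no matter how many tasks are used, while Gustafson's Law shows that the achievable speedup keeps growing when the problem size grows along with the task count.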

 