High performance computing is concerned with computing at scale. Scaling involves increasing the size of the problem to be solved or the number of parallel tasks involved in solving the problem. The scalability of a program describes how the time-to-results changes with the problem size and the number of workers. In reality, this is a complex relationship that depends on the characteristics of the problem, the choice of algorithms, and the specifics of the underlying hardware and network.

The scaling properties of a program can be depicted as a surface in three dimensions: the x axis is the number of parallel tasks, the z axis is the size of the input problem, and the y axis is the dependent variable, wall clock time. All axes are logarithmic. The relationship between problem size, number of workers, and time-to-results can thus be visualized as a three-dimensional surface, which in practice may be complex due to the effects of other factors.

In HPC, there are two main ways of scaling a program. The first approach is strong scaling, where the problem size is constant and the number of tasks assigned to solve the problem increases. The second scaling approach is weak scaling, where the problem size increases along with the number of tasks, so the computation per task remains constant.

Strong Scaling
Strong scaling corresponds to a slice of the scaling surface where the problem size is fixed. This two-dimensional plot of that slice shows how the example surface compares to the ideal case: adding new tasks does not reduce run time substantially, so the program does not have good strong scaling performance.
Ideally, time-to-result would decrease linearly on a log-log plot as the number of tasks increases; if the parallelization were perfect, the slope would be -1. In this example, the program does not have good strong scaling performance, but we haven't tried it with enough tasks to find the point where adding tasks yields no additional speedup. We should keep experimenting with more tasks and try to understand why adding them is not significantly decreasing the time to solution (e.g., memory contention). A code profiler might aid in this investigation.
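As a concrete way to examine strong scaling measurements, the short sketch below computes speedup and parallel efficiency from a set of wall clock times taken at a fixed problem size; the task counts and timings are hypothetical values chosen only for illustration.

```python
# Strong scaling analysis: fixed problem size, increasing task counts.
# The task counts and wall clock times below are hypothetical.
tasks = [1, 2, 4, 8, 16, 32]
wall_time = [100.0, 52.0, 28.0, 16.0, 11.0, 9.5]   # seconds

t1 = wall_time[0]
for p, t in zip(tasks, wall_time):
    speedup = t1 / t              # ideal speedup would equal p
    efficiency = speedup / p      # ideal efficiency would equal 1.0
    print(f"{p:3d} tasks: time {t:6.1f} s, "
          f"speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

On a log-log plot of these times against the task count, the ideal case is a straight line of slope -1; flattening of the measured curve signals the loss of strong scaling efficiency.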

Strong scaling performance is affected by the ratio of time spent communicating to the time spent computing. The larger that ratio, the worse the strong scaling behavior will be. If the program has good strong scaling performance, the wall clock time will decrease rapidly as you add more tasks (and typically nodes) to your parallel job. This will improve your turnaround dramatically. On the other hand, if your code doesn't get faster under strong scaling, you may already be using too many tasks for the amount of work you are doing. You should consider using fewer tasks to get better processor utilization and more computing done within your allocation.
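One way to see why the communication-to-computation ratio matters is with a toy performance model. In the sketch below, the compute portion of the time is divided among the tasks while a fixed communication cost is not; the numbers and the model itself are illustrative assumptions, not measurements of any real code.

```python
# Toy strong scaling model (illustrative only):
# total time = compute time shared among p tasks + roughly constant communication time.
def model_time(p, t_compute=100.0, t_comm=5.0):
    """Estimated wall time on p tasks for a fixed-size problem."""
    return t_compute / p + t_comm

for p in [1, 2, 4, 8, 16, 32, 64]:
    print(f"{p:3d} tasks: {model_time(p):6.2f} s")
# As p grows, the t_comm term dominates and adding tasks stops helping.
```

Increasing t_comm relative to t_compute in this model flattens the scaling curve sooner, which is the behavior described above.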

Weak Scaling
Weak scaling is concerned with a slice of the scaling surface where the work per task is constant.
This two-dimensional projection of the weak scaling slice shows how the example surface compares to the ideal case: scaling the problem size and the number of workers simultaneously does not increase the time-to-solution, so this program has good performance under weak scaling.
Ideally, time-to-result would remain constant if we simultaneously and proportionally scale the problem size and the number of tasks. In this example, weak scaling does not increase the time-to-solution, so this program has good weak scaling performance. This means that a larger problem can be solved in about the same amount of time as a smaller problem as long as the number of tasks increases proportionally.

If you run your parallel program on a relatively small example and have plans to run it on larger problems, you will be interested in its weak scaling performance. Well-written parallel programs will usually have good weak scaling performance. One of the major factors affecting the weak scaling performance is synchronization overhead. If your program has good weak scaling performance, you can run a problem that is twice as large on twice as many nodes in the same wallclock time.
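As an illustration, the sketch below computes a weak scaling efficiency, T(1)/T(N), from timings in which the problem size grows in proportion to the task count; as before, the task counts and times are hypothetical.

```python
# Weak scaling analysis: work per task held constant as tasks are added.
# Task counts and wall clock times are hypothetical.
tasks = [1, 4, 16, 64, 256]
wall_time = [100.0, 102.0, 105.0, 109.0, 115.0]   # seconds

t1 = wall_time[0]
for p, t in zip(tasks, wall_time):
    efficiency = t1 / t           # ideal weak scaling efficiency is 1.0
    print(f"{p:4d} tasks: time {t:6.1f} s, weak scaling efficiency {efficiency:4.2f}")
```

An efficiency near 1.0 across the range of task counts indicates that synchronization and other overheads are growing slowly, which is the hallmark of good weak scaling.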

Is it possible to achieve ideal scaling by eliminating all forms of parallel overhead due to communication or synchronization? Not necessarily, because the limits to scaling would then be set by any serial sections of code. In particular, strong scaling is ultimately limited by Amdahl's Law, while weak scaling is described by Gustafson's Law.
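In their standard formulations, writing s for the serial fraction of the work and p for the number of tasks (notation introduced here for illustration), the two laws predict the following speedups:

```latex
% Amdahl's Law: predicted speedup at a fixed problem size (strong scaling),
% where s is the serial fraction and p is the number of tasks
S_{\mathrm{strong}}(p) = \frac{1}{s + \frac{1 - s}{p}}

% Gustafson's Law: predicted speedup when the problem size grows with p (weak scaling)
S_{\mathrm{weak}}(p) = s + p\,(1 - s)
```

Amdahl's Law caps the strong scaling speedup at 1/s no matter how many tasks are used, while Gustafson's Law shows that the achievable speedup keeps growing when the problem size grows along with the task count.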

 