This roadmap explains parallel programming concepts, how parallel programming relates to high performance computing, and how to design programs that can effectively use many cores. It introduces the concepts and considerations necessary for designing, writing, and optimizing parallel programs, without delving into the specifics of particular programs.

Parallel programming is increasingly relevant for all computing platforms; most personal computers (and even cell phones) sold today include multiple processing cores and require parallel programs to yield the best performance. Though parallel programming requires more time and effort than serial programming, parallel computation is the only way to leverage the power of supercomputer installations like Stampede2.

You may be motivated to develop a parallel program because the serial version of your program runs too slowly. Parallelization can spread the computation across the cores of your personal workstation, reducing run times from days to hours — or it can spread your computation across the nodes of a supercomputer, reducing the runtime from years to hours. Stampede2's most important feature is that it has over 350,000 cores for concurrent computation. By leveraging parallel computation on a machine like Stampede2, a person can complete many months' worth of serial computation in days.

As of March 2022, Stampede2 has:

  • 3,752 Intel Xeon Phi 7250 "Knights Landing" (KNL) nodes on Stampede2; each KNL node has 68 cores, 96GB DDR4 RAM, and 16GB high-speed MCDRAM.
  • 1,736 Xeon Platinum 8160 "Skylake" (SKX) larger-memory nodes with 48 cores and 192GB DDR4 RAM.
  • 224 two-socket Intel Xeon Platinum 8380 "Ice Lake" (ICX) nodes with 80 cores and 256GB DDR4 RAM.

Another reason you might parallelize your code is to use more memory than is available on a single machine. Parallel processing allows programs to use the massive amounts of memory available on supercomputing clusters.


After you complete this roadmap, you should be able to:

  • Define parallel programming terminology
  • Explain the ways in which parallelization can enable high performance computing
  • Identify the issues involved in developing a parallel application
  • Decide on an approach for developing a parallel version of your application

This roadmap assumes a basic understanding of serial (single-threaded) programming and familiarity with computer terminology.

You don't need to know how to write code to understand this topic, but you will get more out of this topic if you have a specific computation in mind. If you have coded a serial example of your problem, you can use the techniques described in the CVW Profiling and Debugging topic to gather profiling information. Profiling will reveal the time-consuming parts of your program that should be the target of your parallelization efforts.

System requirements include:
©   Cornell University  |  Center for Advanced Computing  |  Copyright Statement  |  Inclusivity Statement