Stampede's most important feature is that it has more than one hundred thousand cores available for concurrent computation. There are 6400 "regular" nodes on Stampede; each node has two Xeon E5-2680 processors with eight cores per processor, for 102,400 host cores in total. In addition, each of these nodes has an Intel Xeon Phi SE10P coprocessor with 61 cores and 8 GB of memory. There are also 16 large-memory nodes and 128 GPU nodes. This module is about how to design programs that can use many cores effectively.
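The core counts quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures from the preceding paragraph; the constant and function names are illustrative, and it assumes exactly one coprocessor per regular node. The large-memory and GPU nodes are left out because their core counts are not given here.

```python
# Figures for the "regular" nodes, taken from the paragraph above.
REGULAR_NODES = 6400
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 8
PHI_CORES_PER_NODE = 61  # assumption: one Xeon Phi coprocessor per regular node


def host_cores(nodes=REGULAR_NODES):
    """Xeon E5-2680 cores across the given number of regular nodes."""
    return nodes * PROCESSORS_PER_NODE * CORES_PER_PROCESSOR


def coprocessor_cores(nodes=REGULAR_NODES):
    """Xeon Phi cores across the given number of regular nodes."""
    return nodes * PHI_CORES_PER_NODE


if __name__ == "__main__":
    print("host cores:       ", host_cores())          # 6400 * 2 * 8 = 102400
    print("coprocessor cores:", coprocessor_cores())   # 6400 * 61 = 390400
    print("combined:         ", host_cores() + coprocessor_cores())
```

The host cores alone account for the "over a hundred thousand" figure; counting the coprocessor cores as well raises the total considerably, although the two kinds of core are not interchangeable.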

Most people develop parallel versions of their programs because the serial versions take too long to finish. This is not the only reason for parallel processing, though. Some problems require more working memory than a single machine provides, so they are run on a distributed cluster like Stampede to take advantage of the combined memory of many nodes.

Parallel programs are expensive, both in the programmer's time and effort and in the machine resources they consume. Nevertheless, high-performance computing (HPC) in parallel on a machine like Stampede makes it possible to complete computations that would not finish within a person's lifetime if run serially.

Many of the considerations involved in designing, writing, and optimizing parallel programs are common to most such programs. This module introduces those concepts without getting into the specifics of particular programs.

Cornell Center for Advanced Computing

Revised and updated by Adam Brazier
September 2014