The Xeon Phi coprocessor is a system on a PCIe card designed to deliver high floating-point performance for highly parallel HPC code. Its architecture, known as Many Integrated Core (MIC), is built around a processor containing a large number of simplified x86-64 cores, each with a wide vector unit. The design is optimized for aggregate floating-point throughput at the expense of single-thread performance.

The MIC architecture is source-compatible, but not binary-compatible, with code that runs on a traditional multi-core CPU. As a result, it supports many traditional HPC programming paradigms and tools, such as MPI and OpenMP. Code generally does not need to be written specifically for the MIC or altered to run on it: in most cases, existing code can simply be recompiled for the MIC architecture and be expected to run. For code to run well on the MIC, however, it must be highly parallel and floating-point intensive.
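
As a rough illustration of the kind of code this paragraph has in mind, the following is a minimal sketch of two OpenMP loops in C that are both highly parallel and floating-point intensive. The function names saxpy and dot are hypothetical and chosen only for this example; they do not come from the module. Nothing in the source is specific to the MIC: the same file could be compiled for a multi-core CPU or, with the appropriate toolchain-specific options (not shown here), for the coprocessor.

```c
#include <stddef.h>

/* Hypothetical kernel for illustration: y[i] = a * x[i] + y[i].
 * Each iteration is independent, so the loop can be split across
 * threads and, within each thread, across vector lanes. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    /* If OpenMP is not enabled at compile time, the pragma is ignored
     * and the loop simply runs serially with the same result. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical kernel for illustration: dot product of x and y.
 * The reduction clause gives each thread a private partial sum and
 * combines the partial sums when the loop finishes. */
double dot(size_t n, const float *x, const float *y)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        sum += (double)x[i] * (double)y[i];
    }
    return sum;
}
```

Loops like these contain no target-specific code, which is what makes recompilation sufficient for them to run. Whether they run well depends on the problem size being large enough to keep many cores and their vector units busy.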

The Stampede supercomputer comprises more than 6,400 compute nodes and offers nearly 10 petaflops (PF) of aggregate floating-point throughput. Roughly 2 PF come from traditional multi-core CPUs, in the form of the dual Intel Xeon E5 processors present on each node. The remaining 8 PF, a large majority of Stampede's overall floating-point throughput, are provided by Xeon Phi coprocessors installed within the compute nodes.

This module describes the MIC architecture behind the Xeon Phi, its performance characteristics, and how and when to run code on Stampede's coprocessors in order to take best advantage of the available resources.

Aaron Birkland
Cornell Center for Advanced Computing

With contributions from:
Texas Advanced Computing Center
Intel Corporation

June 2013