
Clusters

A cluster is a collection of machines (each of which is called a node) that function in some way as a single resource. The nodes may be administered as a unit, provide a uniform environment for tasks running on the cluster, or work together to provide fault-tolerant access to file storage.

On a compute cluster like Stampede2, the software installed on each node is usually identical, and access from each cluster node to external resources (login node or fileserver) is approximately uniform.

Nodes of a cluster are normally assigned to users by a scheduler. An assignment of a set of nodes for exclusive use by a user for a certain amount of time is called a job.
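The idea of a job as an exclusive, time-limited assignment of nodes can be illustrated with a small sketch. Everything here is hypothetical: the `Job` and `ToyScheduler` names, the node names, and the first-come allocation policy are assumptions made for illustration, and they do not describe how the scheduler on Stampede2 or any other real cluster works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A set of nodes granted to one user for a limited time (hypothetical model)."""
    user: str
    nodes: tuple
    walltime_minutes: int


class ToyScheduler:
    """Hands out free nodes exclusively; a node belongs to at most one job."""

    def __init__(self, node_names):
        self.free = list(node_names)
        self.running = []

    def submit(self, user, num_nodes, walltime_minutes):
        # Refuse the request if not enough nodes are free right now.
        if num_nodes > len(self.free):
            return None
        granted = tuple(self.free[:num_nodes])
        self.free = self.free[num_nodes:]
        job = Job(user, granted, walltime_minutes)
        self.running.append(job)
        return job

    def finish(self, job):
        # Return the job's nodes to the free pool when it ends.
        self.running.remove(job)
        self.free.extend(job.nodes)


if __name__ == "__main__":
    sched = ToyScheduler(["n0", "n1", "n2", "n3"])
    first = sched.submit("alice", 3, 60)
    print(first)
    print(sched.submit("bob", 2, 30))  # not enough free nodes yet
```

A real scheduler would typically queue the second request rather than reject it, and would reclaim nodes when the requested time expires; the sketch leaves both out to keep the exclusive-use idea visible.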

Clusters have one or more interconnection networks that can be used for communication between nodes. Some networks may be for administrative use only, but at least one interconnect is available for communication between the tasks that are part of the user's job. Each cluster will have a particular interconnection topology designed to support specific types of computation. Common topologies include grid, torus, and fat tree networks.

Figure: A torus interconnect connects sixteen nodes into a cluster. The sixteen nodes are arranged in a 4 by 4 grid. Nodes in the grid interior are connected to their four nearest grid neighbors. The nodes on the edges are connected to nodes on the opposite edges, creating a torus topology.
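The wraparound connections in a 4 by 4 torus can be expressed with modular arithmetic. The following is a minimal sketch: the function names and the row-major node numbering (0 through 15) are assumptions chosen for illustration, not part of any cluster's actual configuration.

```python
def node_id(row, col, cols=4):
    """Number nodes row by row: (0, 0) is node 0, (0, 1) is node 1, and so on."""
    return row * cols + col


def torus_neighbors(row, col, rows=4, cols=4):
    """Return the four neighbors of (row, col) on a rows x cols torus.

    The modulo operation wraps an edge node around to the opposite edge.
    """
    return [
        ((row - 1) % rows, col),  # up
        ((row + 1) % rows, col),  # down
        (row, (col - 1) % cols),  # left
        (row, (col + 1) % cols),  # right
    ]


if __name__ == "__main__":
    for r, c in [(0, 0), (1, 2)]:
        ids = sorted(node_id(nr, nc) for nr, nc in torus_neighbors(r, c))
        print(f"node {node_id(r, c)} -> neighbors {ids}")
    # prints:
    # node 0 -> neighbors [1, 3, 4, 12]
    # node 6 -> neighbors [2, 5, 7, 10]
```

Corner node 0 still has four neighbors because its missing "up" and "left" links wrap around to nodes 12 and 3. Without the modulo, the same code would describe a plain grid in which edge nodes have fewer links.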

Figure: A network with a fat tree topology, connecting a small number of core switches to a larger number of leaf switches, each of which connects to a group of compute nodes (such as a half-rack, as on the Stampede2 and Frontera systems). Image and caption from CVW's Introduction to Advanced Cluster Architectures.
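One consequence of the two-level layout in the caption is that the number of links a message crosses depends on whether both nodes share a leaf switch. The sketch below assumes a simplified model: nodes are numbered consecutively, each leaf switch serves a fixed group of `nodes_per_leaf` nodes (4 here, an arbitrary value), and every leaf switch connects directly to a core switch. These are illustrative assumptions, not the actual Stampede2 or Frontera parameters.

```python
def leaf_switch(node, nodes_per_leaf=4):
    """Index of the leaf switch serving a node (hypothetical consecutive numbering)."""
    return node // nodes_per_leaf


def link_hops(a, b, nodes_per_leaf=4):
    """Count the links crossed between nodes a and b in a two-level fat tree."""
    if a == b:
        return 0
    if leaf_switch(a, nodes_per_leaf) == leaf_switch(b, nodes_per_leaf):
        # node -> leaf switch -> node
        return 2
    # node -> leaf switch -> core switch -> leaf switch -> node
    return 4


if __name__ == "__main__":
    print(link_hops(0, 3))  # same leaf switch: 2
    print(link_hops(0, 4))  # different leaf switches: 4
```

In this model, tasks that communicate heavily would cross fewer links if they were placed on nodes under the same leaf switch.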
