Clusters
A cluster is a collection of machines (each of which is called a node) that function in some way as a single resource. They may be administered as a unit, provide a uniform environment for tasks running on the cluster, or work together to provide fault-tolerant access to file storage.
On a compute cluster like Stampede2, the software installed on each node is usually identical, and access from each cluster node to external resources (such as the login nodes or a file server) is approximately uniform.
Nodes of a cluster are normally assigned to users by a scheduler. An assignment of a set of nodes for exclusive use by a user for a certain amount of time is called a job.
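To make the idea of a job concrete, here is a minimal sketch of a toy first-come, first-served allocator. The names `Job`, `ToyScheduler`, `submit`, and `finish` are hypothetical and are not the interface of any real scheduler. Production schedulers add queues, priorities, backfill, and time-limit enforcement. The only point illustrated is that a job is an exclusive grant of a set of nodes to one user for a stated amount of time.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record: which user holds which nodes, and for how long."""
    user: str
    nodes: list
    walltime_hours: float


class ToyScheduler:
    """Illustrative first-come, first-served allocator (not a real scheduler's API)."""

    def __init__(self, node_names):
        # Every node starts out free; running jobs hold nodes exclusively.
        self.free = list(node_names)
        self.running = []

    def submit(self, user, num_nodes, walltime_hours):
        """Grant num_nodes nodes exclusively to user, or return None if too few are free."""
        if num_nodes > len(self.free):
            return None  # a real scheduler would queue the request instead
        granted, self.free = self.free[:num_nodes], self.free[num_nodes:]
        job = Job(user, granted, walltime_hours)
        self.running.append(job)
        return job

    def finish(self, job):
        """Release a finished job's nodes back to the free pool."""
        self.running.remove(job)
        self.free.extend(job.nodes)


if __name__ == "__main__":
    sched = ToyScheduler([f"c{i:03d}" for i in range(4)])
    a = sched.submit("alice", 3, 2.0)
    print(a.nodes)                      # ['c000', 'c001', 'c002']
    print(sched.submit("bob", 2, 1.0))  # None: only one node is free
    sched.finish(a)
    print(len(sched.free))              # 4
```

Because nodes are removed from the free pool while a job holds them, no two jobs can share a node. This is the "exclusive use" property described above.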
Clusters have one or more interconnection networks that can be used for communication between nodes. Some networks may be reserved for administrative use, but at least one interconnect is available for communication between the tasks that make up the user's job. Each cluster has a particular interconnection topology designed to support specific types of computation. Common topologies include grid, torus, and fat tree networks.
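The difference between a grid and a torus can be shown in a few lines. The following is a minimal sketch that assumes nodes are labeled by integer coordinates in an n-dimensional array of size `dims`. The functions `neighbors` and `hops` are hypothetical helpers, not part of any cluster software. In a grid, nodes on an edge have no link past the boundary. A torus adds wraparound links that join opposite edges, which shortens the worst-case path. Fat trees are not modeled here.

```python
def neighbors(coord, dims, torus):
    """Nodes directly linked to `coord` in a grid (torus=False) or torus (torus=True)."""
    result = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            c = list(coord)
            c[axis] += step
            if torus:
                c[axis] %= size  # wraparound link joins opposite edges
            elif not 0 <= c[axis] < size:
                continue  # grid edge: no link beyond the boundary
            t = tuple(c)
            # Skip self-links and duplicates that arise on axes of size 1 or 2.
            if t != tuple(coord) and t not in result:
                result.append(t)
    return result


def hops(a, b, dims, torus):
    """Shortest-path hop count between nodes a and b."""
    total = 0
    for x, y, size in zip(a, b, dims):
        d = abs(x - y)
        # On a torus, each axis can be traversed in either direction.
        total += min(d, size - d) if torus else d
    return total


if __name__ == "__main__":
    print(neighbors((0, 0), (4, 4), torus=False))  # [(1, 0), (0, 1)]
    print(neighbors((0, 0), (4, 4), torus=True))   # [(3, 0), (1, 0), (0, 3), (0, 1)]
    print(hops((0, 0), (3, 3), (4, 4), torus=False))  # 6
    print(hops((0, 0), (3, 3), (4, 4), torus=True))   # 2
```

On a 4 x 4 array, opposite corners are 6 hops apart in a grid but only 2 hops apart in a torus. This is one reason wraparound links are attractive for nearest-neighbor and global communication patterns.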