MapReduce is a simple programming model for analyzing data in parallel across distributed machines. Popular implementations such as Apache Hadoop apply user-provided functions to very large datasets across multiple compute nodes and tolerate the failure of individual nodes. This module describes the basic MapReduce paradigm, the Hadoop MapReduce framework, and distributed filesystems.
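
As a rough illustration of what "user-provided functions" means here, the sketch below counts words with a map function and a reduce function. It runs in a single Python process and is not Hadoop code: the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and the small driver only imitates the grouping step that a real framework would perform across many nodes.

```python
from collections import defaultdict


def map_words(line):
    """Map function: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce function: combine all values emitted for one key into a total."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Toy single-process driver (an assumption for illustration only).

    A real framework would run mappers and reducers on different nodes and
    move the intermediate pairs between them; here it all happens in memory.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # group intermediate values by key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


lines = ["the quick brown fox", "the lazy dog", "The fox"]
result = run_mapreduce(lines, map_words, reduce_counts)
print(result)
```

The user writes only the two small functions; splitting the input, grouping by key, and recovering from failures are left to the framework.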

Originally developed June 2012
Last updated October 2014

Aaron Birkland (original author)
Cornell Center for Advanced Computing