Hadoop is an open-source framework for storing and processing large data sets on commodity hardware. Together with the languages and tools (like Apache Pig) built on top of it, it provides the dominant platform for working with data at scale.
Since we don’t all have a bunch of old Linux machines in our basements to wire together, Amazon provides on-demand hardware for Hadoop processing in the form of Elastic MapReduce. These machine clusters are provisioned when needed, billed at an hourly rate, and can be shut down at any time. This lets you run a cluster of exactly the size you need, for exactly as long as you need it.