Hadoop is at the heart of the Big Data movement. Derived from the designs described in Google's white papers on MapReduce and the Google File System, Hadoop can scale beyond petabytes of data and provides the backbone for fast and effective data analysis.
Pentaho was one of the first companies to provide support for Hadoop and has open-sourced those capabilities, along with steps for other Big Data sources.
Note
There are a lot of great tutorials and videos on Pentaho's Big Data wiki available at http://wiki.pentaho.com/display/BAD/Pentaho+Big+Data+Community+Home.
Before we actually try to connect to Hadoop, we have to set up an appropriate environment. Companies such as Hortonworks and Cloudera have been at the forefront of adding new features and functionality to the Hadoop ecosystem, including Sandbox environments for learning about the various tools. We will be using Hortonworks' Sandbox environment for this chapter's Hadoop recipes.