So far we've run everything in the single-server mode. This is good for experimenting and for working out the bugs in our process, but it's not very good for actually analyzing data. So for this recipe, we'll use Pallet (http://palletops.com/) to provision a cluster of EC2 instances to run one of the previous recipes.
Pallet is a platform for provisioning cloud systems. We define the hardware, OS, and software for nodes and groups of nodes. We connect those to a service provider, such as EC2, and use the Clojure REPL to spin up those nodes and bring them back down. One of the nice things about Pallet is that, unlike many other provisioning systems, nothing has to be installed on the systems being created. Everything is done via SSH.
Sam Ritchie (https://github.com/sritchie) and the Pallet team have put together a library and a sample project that shows how to use Pallet to provision a Hadoop cluster on EC2. For this recipe, we'll follow...