Hadoop was developed by Yahoo to implement Google's MapReduce algorithm and then open-sourced. Since then, it's become one of the most widely-tested and used systems for creating distributed processing.
The central part of this ecosystem is Hadoop, but it's also complemented by a range of other tools, including the Hadoop Distributed File System (HDFS) and Pig, a language for writing jobs to run on Hadoop.
One tool to make the working with Hadoop easier is Cascading. This provides a workflow-like layer on top of Hadoop that can make expressing some tasks much easier. Cascalog is a Clojure-idiomatic interface for Cascading and, ultimately, for Hadoop.
This recipe will show us how to access and query data in Clojure sequences using Cascalog.