Clojure Data Analysis Cookbook - Second Edition
Hadoop grew out of the open source Apache Nutch project as an implementation of Google's MapReduce programming model, and much of its early development took place at Yahoo!. Since then, it has become one of the most widely tested and used systems for distributed data processing.
Hadoop sits at the center of an ecosystem of complementary tools. These include the Hadoop Distributed File System (HDFS), which stores the data, and Pig, whose language, Pig Latin, is used to write jobs that run on Hadoop.
One tool that makes working with Hadoop easier is Cascading. This provides a workflow-like layer on top of Hadoop that can make the expression of some data processing and analysis tasks much easier. Cascalog is a Clojure-idiomatic interface to Cascading and, ultimately, Hadoop.
This recipe will show you how to access and query data in Clojure sequences using Cascalog.
First, we have to list our dependencies in the Leiningen project.clj file:
(defproject distrib-data "0.1.0...