Clojure Data Analysis Cookbook - Second Edition

By: Eric Richard Rochester
Defining new Cascalog operators


Cascalog comes with a number of operators; however, you'll often need to define your own, as we saw in the Aggregating data in Cascalog recipe.

Cascalog defines several categories of operators, each with different properties. Some run in the map phase of processing, and others run in the reduce phase. Map-phase operators can take advantage of a number of extra optimizations, so if you can push some of your processing into that stage, you'll generally get better performance. In this recipe, you'll see which categories of operators are on the map side and which are on the reduce side. We'll also provide an example of each and see how they fit into the larger processing model.
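To make the map-side and reduce-side split concrete, here is a minimal sketch rather than the recipe's own code. It assumes Cascalog 2.x, where the defining macros are named defmapfn, deffilterfn, and defbufferfn (Cascalog 1.x used names ending in "op" instead). The namespace, the operator names, and the inline names data are all hypothetical stand-ins for the companion data.

```clojure
(ns cascalog-operators.sketch
  (:require [cascalog.api :refer [defmapfn deffilterfn defbufferfn ?<- stdout]]
            [clojure.string :as str]))

;; Map-side: produces one output tuple for each input tuple.
(defmapfn upper-case [s]
  (str/upper-case s))

;; Map-side: keeps only the tuples for which the body is truthy.
(deffilterfn short-name? [s]
  (< (count s) 6))

;; Reduce-side: called once per group with all of that group's tuples.
(defbufferfn count-group [tuples]
  [(count tuples)])

;; Hypothetical inline data (an assumption for this sketch); a vector of
;; tuples can act as a generator in a query.
(def names [["rose"] ["martha"] ["donna"] ["amy"]])

(defn map-side-query
  "Filters and transforms each tuple independently of the others."
  []
  (?<- (stdout)
       [?upper]
       (names ?name)
       (short-name? ?name)
       (upper-case ?name :> ?upper)))

(defn reduce-side-query
  "Counts the tuples; the buffer has to see the whole group at once."
  []
  (?<- (stdout)
       [?n]
       (names ?name)
       (count-group ?name :> ?n)))
```

Calling (map-side-query) or (reduce-side-query) at the REPL runs the corresponding query and writes its result tuples to standard output. The two map-side operators never need to see more than one tuple at a time, which is what lets them run before the data is grouped. The buffer depends on grouping, so it has to wait for the reduce phase.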

Getting ready

For this recipe, we'll use the same dependencies and inclusions that we did in the Initializing Cascalog and Hadoop for distributed processing recipe. We'll also use the Doctor Who companion data from that recipe.

How to do it…

As I mentioned, Cascalog allows...