Clojure Data Analysis Cookbook - Second Edition

By: Eric Richard Rochester

Aggregating data with Cascalog


So far, the Cascalog queries you've seen have all returned tables of results. However, sometimes you'll want to aggregate a table, either boiling it down to a single value or producing a smaller table in which each group of rows from the original data is summarized.

Cascalog makes this easy to do, and it includes a number of aggregate functions. For this recipe, we'll only use two, cascalog.logic.ops/distinct-count and cascalog.logic.ops/sum, but you can easily find more in the API documentation on the Cascalog website (http://nathanmarz.github.io/cascalog/cascalog.logic.ops.html).
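
As a rough illustration of what a distinct count and a sum compute, both over a whole table and per group, here is a minimal sketch in plain Clojure. It does not use Cascalog, and Cascalog's own implementation differs. The sample rows and the helper names (rows, distinct-count-of, sum-of, aggregate-by) are hypothetical and are not part of this recipe's data or of any library.

```clojure
;; Plain Clojure only; no Cascalog dependency is needed to run this.
;; The rows are made-up sample data, not the flights data set.
(def rows
  [{:carrier "AA" :origin "JFK" :delay 10}
   {:carrier "AA" :origin "LAX" :delay 5}
   {:carrier "UA" :origin "JFK" :delay 20}
   {:carrier "UA" :origin "JFK" :delay 0}])

(defn distinct-count-of
  "Number of distinct values found under key k across rows."
  [k rows]
  (count (distinct (map k rows))))

(defn sum-of
  "Sum of the values found under key k across rows."
  [k rows]
  (reduce + (map k rows)))

(defn aggregate-by
  "Groups rows by group-key and applies agg-fn to each group's rows.
  Returns a map from group value to aggregated value."
  [group-key agg-fn rows]
  (into {}
        (for [[group-value group-rows] (group-by group-key rows)]
          [group-value (agg-fn group-rows)])))

;; Whole-table aggregates: each reduces the table to a single value.
(def total-delay (sum-of :delay rows))
(def origin-count (distinct-count-of :origin rows))

;; Grouped aggregates: one value per carrier.
(def delay-by-carrier
  (aggregate-by :carrier #(sum-of :delay %) rows))
(def origins-by-carrier
  (aggregate-by :carrier #(distinct-count-of :origin %) rows))
```

In a Cascalog query, the grouping is not spelled out this way: as a general rule, the non-aggregated output variables of the query act as the grouping keys, and the aggregate is applied within each group.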

Getting ready

We'll use the same dependencies and imports as we did in Parsing CSV Files with Cascalog. We'll also use the same data that we defined in that recipe.

How to do it…

We'll take a look at a couple of examples of how to aggregate data with the count function:

  1. First, we'll query how many:

    user=> (?<- (stdout)
         [?count]
         ((hfs-text-delim "data/16285/flights_with_colnames.csv"
        ...