Clojure Data Analysis Cookbook - Second Edition

By: Eric Richard Rochester

Grouping data with $group-by


Datasets often come with an inherent structure. Two or more rows might have the same value in one column, and we might want to leverage that by grouping those rows together in our analysis.
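To make the idea concrete, here's a minimal sketch in plain Clojure rather than Incanter. It uses `clojure.core/group-by` on a small vector of row maps. The rows, the column names (`:state`, `:county`, `:pop`), and the numbers are all hypothetical, made up only for illustration. Rows that share a `:state` value end up in the same group, and each group can then be summarized on its own.

```clojure
;; Hypothetical toy rows standing in for a dataset; each row is a map
;; from column name to value. These are NOT the census data.
(def rows
  [{:state "Virginia" :county "A" :pop 100}
   {:state "Virginia" :county "B" :pop 250}
   {:state "Maryland" :county "C" :pop 75}])

;; Group rows by the value in the :state column. The result is a map
;; from each distinct :state value to a vector of the matching rows.
(def by-state (group-by :state rows))

;; Summarize each group: total the :pop column per state.
(def pop-by-state
  (into {}
        (for [[state group-rows] by-state]
          [state (reduce + (map :pop group-rows))])))
```

Here `(get pop-by-state "Virginia")` returns `350`, the sum of the two Virginia rows. Incanter works on datasets rather than raw vectors of maps, but the underlying idea of grouping rows by a shared column value is the same.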

Getting ready

First, we'll need to declare a dependency on Incanter in the project.clj file:

(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]
                 [org.clojure/data.csv "0.1.2"]])

Next, we'll require Incanter's core and I/O namespaces (incanter.core and incanter.io) in our script or REPL:

(require '[incanter.core :as i]
         '[incanter.io :as i-io])

For data, we'll use the census race data for all the states. You can download it from http://www.ericrochester.com/clj-data-analysis/data/all_160.P3.csv.

These lines load the data (assuming you saved it as data/all_160.P3.csv) and bind the resulting dataset to the name race-data:

(def data-file "data/all_160.P3.csv")
(def race-data (i-io/read-dataset data-file :header true))

How to do it…

Incanter lets you group rows for further analysis or to summarize them with the $group...