Datasets often come with inherent structure. Two or more rows may have the same value in one column, and we may want to leverage that by grouping those rows together in our analysis.
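To make the idea concrete before bringing in Incanter, here is a minimal sketch in plain Clojure. It assumes rows are represented as maps; the `sample-rows` data and the `:state`, `:place`, and `:pop` keys are made up for illustration and are not taken from the census file.

```clojure
;; Toy rows, invented for this illustration only.
(def sample-rows
  [{:state "VA" :place "a" :pop 10}
   {:state "VA" :place "b" :pop 20}
   {:state "NC" :place "c" :pop 5}])

;; Rows that share the same :state value end up in the same group.
(def rows-by-state
  (group-by :state sample-rows))

;; With the rows grouped, we can summarize each group, here by summing :pop.
(def pop-by-state
  (into {}
        (for [[state rows] rows-by-state]
          [state (reduce + (map :pop rows))])))
```

Here `rows-by-state` maps each state to a vector of its rows, and `pop-by-state` reduces each group to a single number.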
First, we'll need to declare a dependency on Incanter in the project.clj file:
:dependencies [[org.clojure/clojure "1.4.0"]
               [incanter "1.4.1"]]
Next, we'll include Incanter core and io in our script or REPL:
(use '(incanter core io))
For data, we'll use the census race data for all states. We first saw this dataset in the Selecting columns with $ recipe, and we can download it from http://www.ericrochester.com/clj-data-analysis/data/all_160.P3.csv. Once it's saved locally, we read it into a dataset:
(def data-file "data/all_160.P3.csv")
(def race-data (read-dataset data-file :header true))