Datasets often come with an inherent structure. Two or more rows might have the same value in one column, and we might want to leverage that by grouping those rows together in our analysis.
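As a rough illustration of the idea, here is a minimal sketch that groups rows using only clojure.core, before Incanter enters the picture. The rows, the column names (:state, :place, :pop), and the numbers are all made up for the example; they are not taken from the census file.

```clojure
;; Hypothetical rows: names and numbers are invented for illustration only.
(def rows
  [{:state "VA" :place "Alpha" :pop 100}
   {:state "VA" :place "Beta"  :pop 250}
   {:state "NC" :place "Gamma" :pop 75}])

;; group-by returns a map from each distinct :state value
;; to a vector of the rows that share it.
(def by-state (group-by :state rows))

;; Summarize each group, here by totaling the :pop column.
(def pop-by-state
  (into {}
        (for [[state group] by-state]
          [state (reduce + (map :pop group))])))
```

Here by-state has one entry per state, and pop-by-state reduces each group to a single number. Grouping an Incanter dataset follows the same shape: split on a column, then work with each group.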
First, we'll need to declare a dependency on Incanter in the project.clj file:
(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]
                 [org.clojure/data.csv "0.1.2"]])
Next, we'll include Incanter core and io in our script or REPL:
(require '[incanter.core :as i]
         '[incanter.io :as i-io])
For data, we'll use the census race data for all the states. You can download it from http://www.ericrochester.com/clj-data-analysis/data/all_160.P3.csv.
These lines will load the data and bind it to the name race-data:
(def data-file "data/all_160.P3.csv")
(def race-data (i-io/read-dataset data-file :header true))