Another data format that's becoming increasingly popular is JavaScript Object Notation (JSON, http://json.org/). Like CSV, this is a plain-text format, so it's easy for programs to work with. It provides more information about the data than CSV does, but at the cost of being more verbose. It also allows the data to be structured in more complicated ways, such as hierarchies or sequences of hierarchies.
Because JSON has a much richer data model than CSV, we may need to transform the data first. In that case, we can pull out just the information we're interested in and flatten the nested maps before we pass the result to Incanter. In this recipe, however, we'll just work with fairly simple data structures.
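The flattening step mentioned above is not part of this recipe, but a minimal sketch of it might look like the following. The function name flatten-map, the underscore-joined key names, and the nested sample record are all assumptions made for illustration; they do not come from the data file used here.

```clojure
;; A hypothetical helper: collapse nested maps into one level so that each
;; record can become a single dataset row.
(defn flatten-map
  "Flattens nested maps into a single-level map, joining nested keys with
  underscores. Non-map values are kept as they are."
  ([m] (flatten-map nil m))
  ([prefix m]
   (reduce-kv
     (fn [acc k v]
       (let [full-key (if prefix
                        (keyword (str (name prefix) "_" (name k)))
                        k)]
         (if (map? v)
           ;; Recurse into nested maps, carrying the key path as a prefix.
           (merge acc (flatten-map full-key v))
           (assoc acc full-key v))))
     {}
     m)))

;; An invented nested record, for illustration only.
(def nested-row
  {:name {:given "Gomez" :surname "Addams"}
   :relation "father"})

;; Flatten the record, then keep only the fields we care about.
(def flat-row (flatten-map nested-row))
(def slim-row (select-keys flat-row [:name_given :relation]))
```

Mapping flatten-map over a vector of such records would give a sequence of flat maps, which is the shape this recipe works with.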
First, include these dependencies in the Leiningen project.clj file:
:dependencies [[org.clojure/clojure "1.4.0"]
               [incanter/incanter-core "1.4.1"]
               [org.clojure/data.json "0.2.1"]]
Then load these libraries in our REPL interpreter or in our program:
(use 'incanter.core 'clojure.data.json)
We also need some data. For this, I have a file named data/small-sample.json that looks like the following:
[{"given_name": "Gomez", "surname": "Addams", "relation": "father"}, {"given_name": "Morticia", "surname": "Addams", "relation": "mother"}, … ]
You can download this data file from http://www.ericrochester.com/clj-data-analysis/data/small-sample.json.
Once everything's in place, this is just a one-liner, which we can execute at the REPL interpreter:
user=> (to-dataset (read-json (slurp "data/small-sample.json")))
[:given_name :surname :relation]
["Gomez" "Addams" "father"]
["Morticia" "Addams" "mother"]
["Pugsley" "Addams" "brother"]
…
Like all Lisps, Clojure is usually read from the inside out, from right to left. Let's break it down.
clojure.core/slurp reads in the contents of the file and returns it as a string. Because it loads the whole file into memory at once, this is a bad idea for very large files, but for small ones it's handy.
clojure.data.json/read-json takes the string from slurp, parses it as JSON, and returns native Clojure data structures. In this case, it returns a vector of maps with keyword keys.
incanter.core/to-dataset takes a sequence of maps and returns an Incanter dataset. It uses the keys in the maps as column names and converts the data values into a matrix. Actually, to-dataset can accept many different data structures. Try (doc to-dataset) in the REPL interpreter or see the Incanter documentation at http://data-sorcery.org/contents/ for more information.
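To see each stage separately, the one-liner can be unrolled into named steps. This is only a sketch: it assumes the same dependencies as above, and it uses an inline string holding the two records shown in the sample in place of the file, so the names json-text, rows, and people are illustrative rather than part of the recipe.

```clojure
(use 'incanter.core 'clojure.data.json)

;; Stand-in for what slurp would return: the two records shown in the
;; sample above, written as an inline string instead of read from disk.
(def json-text
  "[{\"given_name\": \"Gomez\", \"surname\": \"Addams\", \"relation\": \"father\"},
    {\"given_name\": \"Morticia\", \"surname\": \"Addams\", \"relation\": \"mother\"}]")

;; Parse the JSON text into a vector of maps with keyword keys.
(def rows (read-json json-text))

;; Convert the sequence of maps into an Incanter dataset.
(def people (to-dataset rows))

;; Inspect the result.
(nrow people)       ; number of rows in the dataset
(col-names people)  ; column names, taken from the map keys
```

Naming the intermediate values makes it easy to check the parsed maps at the REPL before handing them to to-dataset, which helps when the JSON turns out to be more nested than expected.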