Clojure for Data Science

By: Henry Garner


Parse the data

Since the data will all fit in main memory, for convenience we'll define several functions that load the ratings into Clojure data structures. The line->rating function takes a line, splits it into fields at each tab character, converts each field to a long, and then uses zipmap to pair the fields with the supplied keys, producing a map:

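;; The s and io aliases used below refer to these namespaces.
(require '[clojure.string :as s]
         '[clojure.java.io :as io])

(defn to-long [s]
  (Long/parseLong s))

;; Parse one tab-separated line into a rating map.
(defn line->rating [line]
  (->> (s/split line #"\t")
       (map to-long)
       (zipmap [:user :item :rating])))

;; Read every line of a classpath resource into a vector of rating maps.
(defn load-ratings [file]
  (with-open [rdr (io/reader (io/resource file))]
    (->> (line-seq rdr)
         (map line->rating)
         (into []))))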

(defn ex-7-3 []
  (->> (load-ratings "ua.base")
       (first)))

;; {:rating 5, :item 1, :user 1}
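
To see what line->rating does to a single line without needing the data file, the following is a minimal sketch that runs the same parsing steps on in-memory strings. The names sample-line, sample-data, parsed and ratings are hypothetical, and the sample values, including the fourth field on each line, are assumptions made for illustration rather than data taken from the text. The StringReader stands in for the file reader that load-ratings opens.

```clojure
(require '[clojure.string :as s])
(import '(java.io BufferedReader StringReader))

;; Copied from the text above so that the sketch is self-contained.
(defn to-long [s]
  (Long/parseLong s))

(defn line->rating [line]
  (->> (s/split line #"\t")
       (map to-long)
       (zipmap [:user :item :rating])))

;; Hypothetical sample line: user, item, rating, plus an assumed fourth field.
(def sample-line "1\t1\t5\t874965758")

;; zipmap stops when the shorter of its two arguments runs out,
;; so the fourth field is silently dropped.
(def parsed (line->rating sample-line))

;; Hypothetical in-memory stand-in for the ratings file.
(def sample-data "1\t1\t5\t874965758\n1\t2\t3\t876893171")

;; line-seq is lazy, so (into []) forces every line to be read
;; before with-open closes the reader.
(def ratings
  (with-open [rdr (BufferedReader. (StringReader. sample-data))]
    (->> (line-seq rdr)
         (map line->rating)
         (into []))))
```

Evaluating parsed at the REPL should give a map with :user 1, :item 1 and :rating 5, and ratings should be a vector of two such maps. Without the final (into []), the lazy sequence could be consumed after the reader has already been closed, which is the likely reason load-ratings realizes the whole result inside with-open.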

Let's write a function to parse the u.items file as well, so that we know what the movie names are:

(defn line->item-tuple [line]
  (let [[id name] (s/split line #"\|")]
    (vector (to-long...