Parse the data
Since the data will all fit in main memory, for convenience we'll define several functions that load the ratings into Clojure data structures. The line->rating function takes a line, splits it into fields wherever a tab character is found, converts each field to a long, and then uses zipmap to turn the resulting sequence into a map with the supplied keys:
;; s and io are assumed to be namespace aliases for
;; clojure.string and clojure.java.io respectively.

(defn to-long [s]
  (Long/parseLong s))

(defn line->rating [line]
  (->> (s/split line #"\t")
       (map to-long)
       (zipmap [:user :item :rating])))

(defn load-ratings [file]
  (with-open [rdr (io/reader (io/resource file))]
    (->> (line-seq rdr)
         (map line->rating)
         (into []))))

(defn ex-7-3 []
  (->> (load-ratings "ua.base")
       (first)))

;; {:rating 5, :item 1, :user 1}
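To see what each stage of line->rating produces, the following sketch walks a single line through the same three steps at the REPL. The sample line and the names sample-line, fields, parsed, and rating are made up for illustration; only to-long and the key vector come from the code above.

```clojure
(require '[clojure.string :as s])

(defn to-long [s]
  (Long/parseLong s))

;; Hypothetical input: one tab-separated line of user, item, rating.
(def sample-line "1\t1\t5")

;; Step 1: split the line on tab characters.
(def fields (s/split sample-line #"\t"))
;; fields is ["1" "1" "5"]

;; Step 2: convert each string field to a long.
(def parsed (map to-long fields))
;; parsed is (1 1 5)

;; Step 3: pair the keys with the parsed values.
(def rating (zipmap [:user :item :rating] parsed))
;; rating is equal to {:user 1, :item 1, :rating 5}
```

Because zipmap stops as soon as either the keys or the values run out, any numeric fields beyond the third on a line would still be parsed by to-long but would not appear in the resulting map.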
Let's also write a function to parse the u.item file, so that we know what the movie names are:
(defn line->item-tuple [line]
  (let [[id name] (s/split line #"\|")]
    (vector (to-long...