One of Clojure's most useful features is that most of its sequence-processing functions are lazy. This lets us handle very large datasets with very little effort, since items are only computed as they are consumed. However, when laziness is combined with reading from files and other I/O, there are several things that you need to watch out for.
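As a quick illustration of what laziness buys us, here is a minimal sketch that uses only core functions. The names squares and first-five are made up for this example.

```clojure
;; (range) with no arguments is an infinite lazy sequence, so nothing
;; is computed when squares is defined.
(def squares (map #(* % %) (range)))

;; Only the items that are actually consumed get realized.
(def first-five (doall (take 5 squares)))
;; first-five => (0 1 4 9 16)
```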
In this recipe, we'll take a look at several ways to safely and lazily read a CSV file. By default, clojure.data.csv/read-csv is lazy, so how do you keep that laziness while still closing the file at the right time?
We'll use a project.clj file that includes a dependency on the Clojure CSV library:
(defproject cleaning-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [org.clojure/data.csv "0.1.2"]])
We need to load the libraries that we're going to use into the REPL:
(require '[clojure.data.csv :as csv] '[clojure.java.io :as io])
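To make the problem concrete, here is a minimal sketch of what can go wrong and of the simplest workaround. The temporary sample file and the function names read-csv-unsafe and read-csv-eager are hypothetical and exist only for this illustration. Note that the eager version gives up laziness entirely, so it is only suitable when the data fits in memory.

```clojure
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

;; Hypothetical sample file, created only for this illustration.
(def sample-file (java.io.File/createTempFile "lazy-csv" ".csv"))
(spit sample-file "id,name\n1,alpha\n2,beta\n")

(defn read-csv-unsafe
  "Returns the lazy sequence as-is. with-open closes the reader when
  its body exits, which is before any row has been read."
  [file]
  (with-open [r (io/reader file)]
    (csv/read-csv r)))

(defn read-csv-eager
  "Forces every row with doall while the reader is still open."
  [file]
  (with-open [r (io/reader file)]
    (doall (csv/read-csv r))))

;; Realizing the unsafe sequence should fail with an IOException,
;; because the underlying reader has already been closed.
(def unsafe-result
  (try
    (doall (read-csv-unsafe sample-file))
    (catch java.io.IOException e
      :reader-already-closed)))

(def rows (read-csv-eager sample-file))
;; rows => (["id" "name"] ["1" "alpha"] ["2" "beta"])
```

The trade-off is between safety and memory: doall guarantees that the file is closed at a predictable point, but it holds every row in memory at once.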