Clojure Data Analysis Cookbook - Second Edition

By: Eric Richard Rochester

Lazily processing very large data sets


One of Clojure's strengths is that most of its sequence-processing functions are lazy. This lets us work through very large datasets with little effort, because items are produced only as they are needed rather than all at once. However, when laziness is combined with reading from files and other I/O, there are several things that you need to watch out for.
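As a small illustration of that laziness, here is a minimal sketch that uses only clojure.core. The names `realized-count`, `squares`, and `first-five` are made up for this example. It counts how many elements are actually computed when we take a few items from a pipeline over an infinite sequence.

```clojure
;; Hypothetical names for illustration; only clojure.core is used.
(def realized-count (atom 0))

;; `iterate` produces an unchunked lazy sequence, so `map` computes
;; one element at a time, and only when something asks for it.
(def squares
  (map (fn [n]
         (swap! realized-count inc)   ; record that this element was computed
         (* n n))
       (iterate inc 0)))              ; infinite: 0, 1, 2, ...

;; Only the five requested elements are ever computed.
(def first-five (doall (take 5 squares)))
```

Evaluating `first-five` gives `(0 1 4 9 16)`, and `@realized-count` is 5 even though the source sequence never ends. With a chunked source such as a vector, Clojure may compute elements in batches, so the count can be higher than the number of items requested.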

In this recipe, we'll take a look at several ways to read a CSV file both safely and lazily. By default, clojure.data.csv/read-csv is lazy, so how do you keep that laziness while still closing the file at the right time?
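To make the problem concrete, the following sketch reproduces the hazard with `line-seq` from clojure.core instead of the CSV library, so that it runs without extra dependencies. The same concern applies to any lazy reader. The names `sample-text`, `lazy-lines-unsafe`, and `lines-eager` are hypothetical, and a `StringReader` stands in for a real file.

```clojure
;; Stand-in for a file on disk, so the sketch is self-contained.
(def sample-text "a,b\n1,2\n3,4\n")

(defn lazy-lines-unsafe
  "Returns a lazy seq of lines, but `with-open` closes the reader
  before the caller has consumed the sequence."
  [text]
  (with-open [r (java.io.BufferedReader. (java.io.StringReader. text))]
    (line-seq r)))

(defn lines-eager
  "Forces every line while the reader is still open. Safe, but no
  longer lazy: the whole input is held in memory."
  [text]
  (with-open [r (java.io.BufferedReader. (java.io.StringReader. text))]
    (doall (line-seq r))))

;; Realizing the unsafe sequence after the reader is closed fails,
;; because the remaining lines are read on demand.
(def unsafe-result
  (try
    (doall (lazy-lines-unsafe sample-text))
    (catch java.io.IOException _
      :stream-closed)))
```

Here `unsafe-result` ends up as `:stream-closed`, while `(lines-eager sample-text)` returns all three lines. Wrapping the sequence in `doall` is the simplest fix, but it gives up the laziness that makes very large inputs manageable, which is the trade-off this recipe explores.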

Getting ready

We'll use a project.clj file that includes a dependency on the Clojure CSV library:

(defproject cleaning-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [org.clojure/data.csv "0.1.2"]])

We need to load the libraries that we're going to use into the REPL:

(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

How to do it…

We'll try several solutions and consider their strengths and weaknesses:

  1. Let...