Clojure for Data Science

By: Henry Garner

Downloading the code and data


This chapter uses data on individual income by zip code, provided by the U.S. Internal Revenue Service (IRS). The data contains selected income and tax items classified by state, zip code, and income class.

The file is 100 MB in size and can be downloaded from http://www.irs.gov/pub/irs-soi/12zpallagi.csv into the example code's data directory. Because it contains the IRS Statistics of Income (SoI), we've renamed it to soi.csv for the examples.

Note

The example code for this chapter is available from Packt Publishing's website or from https://github.com/clojuredatascience/ch5-big-data.

As usual, a script has been provided to download and rename the data for you. It can be run on the command line from within the project directory with:

script/download-data.sh

If you run this, the file will be downloaded and renamed automatically.
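
The contents of the provided script are not reproduced here, so the following is only a minimal sketch of what a download-and-rename step like this might look like. It assumes that curl is installed and that the data directory is named data, relative to the project directory. The function name download_data and the temporary .part suffix are hypothetical and chosen for illustration. Only the source URL and the soi.csv file name come from the text above.

```shell
#!/bin/sh
# Hypothetical sketch of a download-and-rename step; the real
# script/download-data.sh shipped with the example code may differ.
set -eu

# URL given in the text above.
SOURCE_URL="http://www.irs.gov/pub/irs-soi/12zpallagi.csv"
# Assumed directory name, relative to the project directory.
DATA_DIR="data"
# The renamed file used by the examples.
TARGET="$DATA_DIR/soi.csv"

download_data() {
  mkdir -p "$DATA_DIR"

  # Skip the 100 MB download if the renamed file is already present.
  if [ -f "$TARGET" ]; then
    echo "$TARGET already exists; skipping download"
    return 0
  fi

  # -f: fail on HTTP errors, -L: follow redirects, -o: write to a file.
  # Download to a temporary name so that an interrupted transfer never
  # leaves a truncated soi.csv behind.
  curl -fL -o "$TARGET.part" "$SOURCE_URL"
  mv "$TARGET.part" "$TARGET"
}
```

The sketch only defines the function. Adding a download_data call as the last line and running the file from within the project directory would perform the download and leave the result at data/soi.csv.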

Inspecting the data

Once you've downloaded the data, take a look at the column headings in the first line of the file. One way...