Clojure for Data Science

By: Henry Garner

Downloading the data


This chapter makes use of the Reuters-21578 dataset: a venerable collection of articles that were published on the Reuters newswire in 1987. It is one of the most widely used datasets for testing the categorization and classification of text. The copyright for the text of articles and annotations in the Reuters-21578 collection resides with Reuters Ltd. Reuters Ltd. and Carnegie Group, Inc. have agreed to allow the free distribution of this data for research purposes only.

Note

You can download the example code for this chapter from Packt Publishing's website or from https://github.com/clojuredatascience/ch6-clustering.

As usual, the sample code includes a script that downloads the files and unzips them into the data directory. You can run it from within the project directory with the following command:

script/download-data.sh
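The contents of script/download-data.sh are not reproduced here. As a rough, hypothetical sketch of the same two steps (download, then unpack into the data directory), assuming curl and tar are installed and that the files belong in a data directory under the project root, a shell function along these lines would do the job. The helper name download_reuters is illustrative only, and the URL is the one given in this section, which may no longer resolve.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a manual download; NOT the contents of script/download-data.sh.
# Assumes curl and tar are on the PATH and that the archive belongs in ./data.

# Archive location quoted in the text ("at the time of writing"); it may have moved since.
REUTERS_URL="http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz"

# download_reuters is an illustrative helper name, not part of the book's code.
# Usage: download_reuters [target-dir]   (target-dir defaults to "data")
download_reuters() {
  local target_dir="${1:-data}"
  local archive="$target_dir/reuters21578.tar.gz"

  # Create the target directory, fetch the archive (-f fails on HTTP errors,
  # -L follows redirects), then unpack it in place. Stops at the first failure.
  mkdir -p "$target_dir" &&
    curl -fL -o "$archive" "$REUTERS_URL" &&
    tar -xzf "$archive" -C "$target_dir"
}
```

Calling download_reuters from the project directory would fetch the archive into data and unpack it there; chaining the steps with && means a failed download never reaches the tar step.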

Alternatively, at the time of writing, the Reuters dataset can be downloaded from http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.tar.gz. The rest of...