Clojure Data Analysis Cookbook - Second Edition

By: Eric Richard Rochester

Loading CSV and ARFF files into Weka


Weka is most comfortable when using its own file format: the Attribute-Relation File Format (ARFF). An ARFF file declares the type of data in each column and carries other information that allows it to be loaded incrementally, and both of these can be important features. Because the column types are declared rather than inferred, Weka can load ARFF data more reliably. However, Weka can still import CSV files, and when it does, it attempts to guess the type of data in each column.
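To make the difference concrete, here is a minimal sketch of what an ARFF file can look like. The relation name, attribute names, and rows are hypothetical and exist only for illustration. The point is that the header states each column's type explicitly (nominal values in braces, or `numeric`), which is the information a CSV file lacks.

```
% Hypothetical example data, for illustration only.
@relation weather-sketch

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

@data
sunny, 85, no
overcast, 83, yes
rainy, 70, yes
```

The same rows stored as CSV would carry only a header line of column names, leaving Weka to infer that `temperature` is numeric and that the other two columns are nominal.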

In this recipe, we'll see what's necessary to load data from a CSV file and an ARFF file.
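As a rough preview, the two loading styles might be sketched as follows. This is not necessarily the form the recipe itself uses. The function names `load-csv-sketch` and `count-arff-rows-sketch` are hypothetical, and the sketch assumes Weka is already on the classpath. It relies only on the loaders' `setSource`, `getDataSet`, `getStructure`, and `getNextInstance` methods.

```clojure
(import [weka.core.converters ArffLoader CSVLoader]
        [java.io File])

;; Hypothetical helper: batch-load a whole CSV file.
;; Weka guesses each column's type from the data it reads.
(defn load-csv-sketch [filename]
  (let [loader (doto (CSVLoader.)
                 (.setSource (File. filename)))]
    (.getDataSet loader)))

;; Hypothetical helper: read an ARFF file incrementally.
;; The header (structure) is read first, then rows are pulled one
;; at a time until getNextInstance returns nil.
(defn count-arff-rows-sketch [filename]
  (let [loader (doto (ArffLoader.)
                 (.setSource (File. filename)))
        structure (.getStructure loader)]
    (loop [n 0]
      (if (.getNextInstance loader structure)
        (recur (inc n))
        n))))
```

Reading row by row keeps only the header and the current row in memory, which is why the incremental style suits large ARFF files. The batch call returns the complete dataset at once.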

Getting ready

First, we'll need to add Weka to the dependencies in our Leiningen project.clj file:

(defproject d-mining "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [nz.ac.waikato.cms.weka/weka-dev "3.7.11"]])

Then we'll import the right classes into our script or REPL:

(import [weka.core.converters ArffLoader CSVLoader]
        [java.io File])

Finally, we'll need to have a CSV file to import. In this recipe, I'll...