Weka is most comfortable using its own file format, ARFF. This format records the type of data in each column, along with other information that allows the data to be loaded incrementally. However, Weka can still import CSV files, and when it does, it attempts to guess the type of data in each column.
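To make the difference from CSV concrete, here is a minimal sketch of what a small ARFF file looks like. The relation name, attribute names, and data rows are hypothetical and are not taken from the dataset used in this recipe. The header declares each column's type, so Weka doesn't have to guess it:

```
% Hypothetical example; the relation, attributes, and rows are illustrative only.
@relation census-sample

@attribute name string
@attribute population numeric
@attribute region {north, south}

@data
'Sample Town', 1200, north
'Other Town', 845, south
```

The equivalent CSV file would hold only a header row of column names and the data rows, leaving the column types to be inferred.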
In this recipe, we'll see what's necessary to load data from a CSV file and an ARFF file.
First, we'll need to add Weka to the dependencies in our Leiningen project.clj file:
:dependencies [[org.clojure/clojure "1.4.0"]
               [nz.ac.waikato.cms.weka/weka-dev "3.7.7"]]
Then, we'll import the right classes into our script or REPL:
(import [weka.core.converters ArffLoader CSVLoader] [java.io File])
Finally, we'll need to have a CSV file to import. In this recipe, I'll use the dataset of racial census data that we compiled for the Grouping data with $group-by recipe in Chapter 6, Working with Incanter Datasets. It's in the file named data/all_160...