Generally, data won't be quite in the form we'll need for our analyses. We spent a lot of time transforming data in Clojure in Chapter 2, Cleaning and Validating Data. Weka contains several methods for renaming columns and filtering the ones that will make it into the dataset.
Most datasets have one or more columns that will throw off clustering (row identifiers or name fields, for instance), so we must filter the columns in the datasets before we perform any analysis. We'll see a lot of examples of this in the recipes to come.
We'll use the same dependencies, imports, and data files that we used in the Loading CSV and ARFF files into Weka recipe, along with the dataset that we loaded there. We'll also need access to a different set of Weka classes, as well as the clojure.string library:
(import [weka.filters Filter]
        [weka.filters.unsupervised.attribute Remove])
(require '[clojure.string :as str])
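To give a sense of how these pieces fit together, here is a minimal sketch of dropping columns with the Remove filter. It assumes Weka is on the classpath, as in the earlier recipe. The helper names index-list and drop-columns are hypothetical, introduced only for illustration, and this is not necessarily how the recipe itself structures the code.

```clojure
(import [weka.filters Filter]
        [weka.filters.unsupervised.attribute Remove])
(require '[clojure.string :as str])

(defn index-list
  "Hypothetical helper: turns zero-based column positions into the
  one-based, comma-separated string that Remove expects."
  [positions]
  (str/join "," (map inc positions)))

(defn drop-columns
  "Hypothetical helper: returns a copy of dataset without the columns
  at the given zero-based positions."
  [dataset positions]
  (let [remove-filter (doto (Remove.)
                        ;; The indices must be set before the input format.
                        (.setAttributeIndices (index-list positions))
                        (.setInputFormat dataset))]
    ;; useFilter runs the dataset through the filter and returns
    ;; a new Instances object; the original is left unchanged.
    (Filter/useFilter dataset remove-filter)))
```

If the dataset from the earlier recipe were bound to a var named data (an assumed name), then (drop-columns data [0]) would return a copy without its first column, which is often a row identifier. The clojure.string library is used here only to build the index string.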