Generally, the data won't be quite in the form we'll need for our analyses. Weka contains several methods for renaming columns and filtering which ones will make it into the dataset.
Most datasets have one or more columns that will throw off clustering—row identifiers or name fields, for instance—so we must filter the columns in the datasets before we perform any analysis. We'll see a lot of examples of this in the recipes to come.
We'll use the same dependencies, imports, and data files that we used in the Loading CSV and ARFF files into Weka recipe. We'll also use the dataset that we loaded in that recipe.
We'll also need access to a different set of Weka classes, as well as the clojure.string library.
(import [weka.filters Filter]
        [weka.filters.unsupervised.attribute Remove])
(require '[clojure.string :as str])
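To give a rough idea of how these classes fit together, here is a minimal sketch of dropping columns by name with the Remove filter. The function name remove-columns-by-name is our own invention for illustration, not part of Weka, and the sketch assumes that every name passed in actually exists in the dataset.

```clojure
(import [weka.core Instances]
        [weka.filters Filter]
        [weka.filters.unsupervised.attribute Remove])
(require '[clojure.string :as str])

;; Hypothetical helper (not part of Weka): returns a new dataset
;; without the named attributes. Names may be keywords or strings.
;; Assumes every name exists; a missing name would throw a
;; NullPointerException, because .attribute returns nil for it.
(defn remove-columns-by-name
  [^Instances dataset names]
  (let [;; Attribute.index is 0-based, but Remove expects a
        ;; comma-separated string of 1-based indices.
        indices (map #(inc (.index (.attribute dataset (name %)))) names)
        remover (doto (Remove.)
                  ;; Options must be set before setInputFormat.
                  (.setAttributeIndices (str/join "," indices))
                  (.setInputFormat dataset))]
    ;; useFilter returns a filtered copy; the input is left unchanged.
    (Filter/useFilter dataset remover)))
```

With a dataset bound to data, a call such as (remove-columns-by-name data [:id :name]) would return a copy without those two columns. The column names here are only placeholders.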