Clojure Data Analysis Cookbook - Second Edition

By: Eric Richard Rochester


Filtering, renaming, and deleting columns in Weka datasets


Generally, data won't be quite in the form we'll need for our analyses. We spent a lot of time transforming data in Clojure in Chapter 2, Cleaning and Validating Data. Weka contains several methods for renaming columns and filtering the ones that will make it into the dataset.

Most datasets have one or more columns that will throw off clustering, such as row identifiers or name fields, so we must filter the columns in the datasets before we perform any analysis. We'll see a lot of examples of this in the recipes to come.

Getting ready

We'll use the same dependencies, imports, and data files that we used in the Loading CSV and ARFF files into Weka recipe, along with the dataset that we loaded there. We'll also need a different set of Weka classes, as well as the clojure.string library:

(import [weka.filters Filter]
        [weka.filters.unsupervised.attribute Remove])
(require '[clojure.string :as str])
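
To check that these classes resolve before working with the real dataset, here is a minimal sketch that applies Remove to a tiny in-memory dataset. The names toy-data, remove-columns, and cleaned, and the three example columns, are hypothetical and are not part of the recipe. The sketch also assumes Weka 3.7 or later, since it relies on DenseInstance and the ArrayList-based Instances constructor.

```clojure
;; Assumption: Weka 3.7+ is on the classpath.
(import [java.util ArrayList]
        [weka.core Attribute DenseInstance Instances]
        [weka.filters Filter]
        [weka.filters.unsupervised.attribute Remove])

;; Hypothetical dataset: a row identifier plus two numeric measurements.
(def toy-data
  (let [attrs (ArrayList. [(Attribute. "row-id")
                           (Attribute. "height")
                           (Attribute. "weight")])
        data  (Instances. "toy" attrs (int 2))]
    (.add data (DenseInstance. 1.0 (double-array [1.0 170.0 65.0])))
    (.add data (DenseInstance. 1.0 (double-array [2.0 182.0 80.0])))
    data))

(defn remove-columns
  "Returns a filtered copy of data without the attributes listed in
  indices, a 1-based range string such as \"1\" or \"1,3-4\"."
  [^Instances data ^String indices]
  (let [f (doto (Remove.)
            ;; Options must be set before setInputFormat is called.
            (.setAttributeIndices indices)
            (.setInputFormat data))]
    (Filter/useFilter data f)))

;; Drop the identifier column; toy-data itself is left unchanged.
(def cleaned (remove-columns toy-data "1"))

(.numAttributes cleaned)          ; => 2
(.name (.attribute cleaned 0))    ; => "height"
```

Remove takes 1-based attribute indices as a range string, and Filter/useFilter returns a new Instances object rather than modifying its input, so the original dataset stays available for comparison.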

How to do it…

In this...