Mastering Clojure Data Analysis

By: Eric Richard Rochester

Running the experiment


Earlier, we defined functions that break a sequence of tokens into features of various sorts: unigrams, bigrams, trigrams, and POS-tagged unigrams. We can now automatically test both classifiers against each of these feature types. Let's see how.

First, we'll define two top-level maps that associate keyword labels with the functions that we want to test at each point in the process (that is, the classifiers and the feature generators):

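;; Map each classifier label to its k-fold cross-validation function.
(def classifiers
  {:naive-bayes a/k-fold-naive-bayes
   :maxent      a/k-fold-logistic})

;; Map each feature label to a function from tokens to features.
(def feature-factories
  {:unigram t/unigrams
   :bigram  t/bigrams
   :trigram t/trigrams
   ;; Load the POS model once and close over it in the returned function.
   :pos     (let [pos-model (t/read-me-tagger "data/en-pos-maxent.bin")]
              (fn [ts] (t/with-pos pos-model ts)))})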
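
To illustrate the pattern of pairing two keyword-to-function maps, here is a minimal, self-contained sketch. It does not use the `a/` and `t/` namespaces from this chapter. The names `toy-feature-factories`, `toy-evaluators`, and `run-all` are hypothetical stand-ins, and the "evaluators" merely count features instead of training a classifier.

```clojure
(require '[clojure.string :as str])

;; Hypothetical feature generators: each takes a sequence of tokens
;; and returns a sequence of features.
(def toy-feature-factories
  {:unigram (fn [tokens] tokens)
   :bigram  (fn [tokens]
              (map #(str/join " " %) (partition 2 1 tokens)))})

;; Hypothetical stand-ins for the classifiers: each takes a sequence
;; of features and returns a number describing it.
(def toy-evaluators
  {:count    count
   :distinct (comp count distinct)})

(defn run-all
  "Applies every evaluator to the output of every feature factory and
  returns one result map per (feature, evaluator) pair."
  [tokens]
  (for [[feature-key feature-fn] toy-feature-factories
        [eval-key eval-fn]       toy-evaluators]
    {:feature   feature-key
     :evaluator eval-key
     :result    (eval-fn (feature-fn tokens))}))
```

Because `for` with two bindings walks every combination, adding an entry to either map extends the experiment without changing `run-all`. Each result carries its keyword labels, so the output stays readable however many combinations are run.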

We can now iterate over both of these hash maps and cross-validate these classifiers on these features. We'll average the error information (the precision and recall) for all of them and return the averages. Once we've executed...