Clojure Data Analysis Cookbook - Second Edition

By: Eric Richard Rochester

Performing naïve Bayesian classification with MALLET


MALLET is best known as a library for topic modeling, but it also implements a number of other algorithms.

One popular algorithm that MALLET implements is naïve Bayesian classification. If you have documents that are already divided into categories, you can train a classifier to categorize new documents into those same categories. Often, this works surprisingly well.

A common application is spam e-mail detection, which we'll use as our example here.
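To make the idea concrete before bringing in MALLET, here is a minimal sketch of naïve Bayesian classification in plain Clojure. It is not MALLET's implementation or API. The toy training data and the names `training`, `tokenize`, `train`, `log-score`, and `classify` are all hypothetical, and the add-one smoothing is an assumption made for this illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical toy training data: [category text] pairs.
(def training
  [[:spam "win cash now"]
   [:spam "cheap cash offer"]
   [:ham  "lunch meeting tomorrow"]
   [:ham  "project meeting notes"]])

(defn tokenize
  "Lowercases the text and splits it into runs of letters."
  [text]
  (re-seq #"[a-z]+" (str/lower-case text)))

(defn train
  "Counts documents per category, token occurrences per category,
  total tokens per category, and the overall vocabulary."
  [examples]
  (reduce (fn [model [category text]]
            (reduce (fn [m token]
                      (-> m
                          (update-in [:tokens category token] (fnil inc 0))
                          (update-in [:totals category] (fnil inc 0))
                          (update-in [:vocab] (fnil conj #{}) token)))
                    (update-in model [:docs category] (fnil inc 0))
                    (tokenize text)))
          {}
          examples))

(defn log-score
  "Returns log P(category) plus the sum of log P(token | category),
  using add-one smoothing (an assumption of this sketch)."
  [model category tokens]
  (let [doc-count  (reduce + (vals (:docs model)))
        prior      (/ (get-in model [:docs category]) (double doc-count))
        total      (get-in model [:totals category] 0)
        vocab-size (count (:vocab model))]
    (reduce (fn [score token]
              (+ score
                 (Math/log (/ (inc (get-in model [:tokens category token] 0))
                              (double (+ total vocab-size))))))
            (Math/log prior)
            tokens)))

(defn classify
  "Picks the category with the highest log score for the text."
  [model text]
  (let [tokens (tokenize text)]
    (apply max-key #(log-score model % tokens) (keys (:docs model)))))

(def model (train training))

(classify model "cash offer now") ;; => :spam
(classify model "meeting notes")  ;; => :ham
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together, and the smoothing keeps a word that was never seen in a category from forcing that category's probability to zero.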

Getting ready

We'll need to have MALLET included in our project.clj file:

(defproject com.ericrochester/text-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [cc.mallet/mallet "2.0.7"]])

As in the Performing topic modeling with MALLET recipe, the list of classes to import is fairly long, but most of them are for the processing pipeline, as shown here:

(require '[clojure.java.io :as io])
(import [cc.mallet.util.*] ; no-op: Clojure's import does not expand wildcards
        [cc.mallet...