Bayesian classification is a way of updating your estimate of the probability that an item belongs to a given category, based on what you already know about that item. A Naïve Bayesian system makes the simplifying assumption that all features are independent of one another, given the category. The algorithm has proved useful in a number of areas, such as spam detection in e-mails, automatic language detection, and document classification.
In this recipe, we'll apply it to the mushroom dataset that we looked at in the Classifying data with decision trees recipe.
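To make the updating step concrete, here is a minimal sketch of the calculation in plain Clojure, without Weka. The names priors, likelihoods, and posterior are hypothetical, and the probabilities are made-up numbers for illustration, not values estimated from the mushroom dataset. Each class's prior is multiplied by the likelihood of every observed feature (this product is where the independence assumption comes in), and the scores are then normalized so that they sum to one.

```clojure
;; Illustrative only: these priors and likelihoods are invented numbers,
;; not estimates from the mushroom dataset. Ratios keep the arithmetic exact.
(def priors
  {:edible 1/2, :poisonous 1/2})

;; P(feature | class), one map of feature likelihoods per class.
(def likelihoods
  {:edible    {:odor-foul 1/10, :cap-red 1/5}
   :poisonous {:odor-foul 3/5,  :cap-red 3/10}})

(defn posterior
  "Returns a map of class -> P(class | features), assuming the features
  are independent of one another given the class."
  [priors likelihoods features]
  (let [;; Unnormalized score: prior times the product of the likelihoods.
        scores (into {}
                     (for [[cls prior] priors]
                       [cls (reduce * prior
                                    (map #(get-in likelihoods [cls %])
                                         features))]))
        ;; Normalizing constant, so the posteriors sum to 1.
        total  (reduce + (vals scores))]
    (into {} (for [[cls score] scores]
               [cls (/ score total)]))))

(posterior priors likelihoods [:odor-foul :cap-red])
;; => {:edible 1/10, :poisonous 9/10}
```

With these invented numbers, observing both features moves the estimate for :poisonous from the prior of 1/2 to 9/10. A real classifier such as Weka's NaiveBayes estimates the priors and likelihoods from training data rather than taking them as given.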
First, we'll need to use the dependencies that we specified in the project.clj file in the Loading CSV and ARFF data into Weka recipe. We'll also need the following import in our script or REPL:
(import [weka.classifiers.bayes NaiveBayes])
For data, we'll use the mushroom dataset that we used in the Classifying data with decision trees recipe. You can download it from http://www.ericrochester.com...