Apache Mahout Essentials

By: Jayani Withanawasam

The Naïve Bayes algorithm


Naïve Bayes is a probabilistic classifier based on Bayes' theorem. It makes a strong (naive) assumption: given the class, the features are independent of one another.
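The independence assumption can be written as a formula. As a sketch, let C denote the class and x_1, ..., x_n the feature values (these symbols are introduced here for illustration and are not taken from the text above). The posterior probability of the class is then proportional to the class prior multiplied by one independent likelihood term per feature:

```latex
\[
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
\]
```

The classifier predicts the class C for which this product is largest.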

As long as the features are neither correlated nor repetitive, Naïve Bayes and logistic regression perform similarly. When features are correlated or repeated, however, Naïve Bayes behaves differently because of its conditional independence assumption: it treats each correlated feature as separate evidence, so the same information is effectively counted more than once.
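The effect of a repeated feature can be seen in a small calculation. The following is a minimal sketch, not Mahout code: the function name and all probabilities are hypothetical values chosen for illustration. It computes the Naïve Bayes posterior for a two-class problem, once with a single feature and once with the same feature supplied twice.

```python
def naive_bayes_posterior(prior_pos, likelihoods_pos, likelihoods_neg):
    """Posterior P(pos | features) for two classes, assuming the
    features are conditionally independent given the class."""
    score_pos = prior_pos
    score_neg = 1.0 - prior_pos
    # Multiply in one likelihood term per feature (the naive assumption).
    for p_pos, p_neg in zip(likelihoods_pos, likelihoods_neg):
        score_pos *= p_pos
        score_neg *= p_neg
    # Normalize so the two class scores sum to 1.
    return score_pos / (score_pos + score_neg)


# Hypothetical numbers: prior P(pos) = 0.2,
# P(feature | pos) = 0.5, P(feature | neg) = 0.05.
single = naive_bayes_posterior(0.2, [0.5], [0.05])

# The same feature repeated: no new information, but it is counted twice.
duplicated = naive_bayes_posterior(0.2, [0.5, 0.5], [0.05, 0.05])

print(round(single, 3), round(duplicated, 3))  # prints: 0.714 0.962
```

The duplicated feature adds no information, yet the posterior rises from about 0.71 to about 0.96. This is the kind of overconfidence that the independence assumption can produce on correlated features.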

Bayes' theorem

Bayes' theorem is expressed by the following equation:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Here, A and B are events:

  • P(A) and P(B) are the marginal probabilities of A and B, that is, the probability of each event considered without regard to the other

  • P(A|B), a conditional probability, is the probability of A given that B is true

  • P(B|A), also a conditional probability, is the probability of B given that A is true
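To make these quantities concrete, here is a minimal worked sketch in Python. The function name and the numbers are hypothetical: A stands for "the message is spam" and B for "the message contains a particular word". P(B) is obtained from the law of total probability before Bayes' theorem is applied.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical values, chosen only for illustration.
p_a = 0.3              # P(A): prior probability that a message is spam
p_b_given_a = 0.6      # P(B|A): the word appears in a spam message
p_b_given_not_a = 0.1  # P(B|not A): the word appears in a non-spam message

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(round(p_b, 2), round(posterior, 2))  # prints: 0.25 0.72
```

Under these assumed numbers, observing the word raises the probability that the message is spam from the prior of 0.3 to a posterior of 0.72.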

Text classification

Text classification is the task of assigning documents to categories based on their content, that is, the words they contain. The best-known text classification problem today is e-mail spam filtering.
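As an illustration of classifying documents by the words they contain, the following is a minimal sketch of a multinomial Naïve Bayes text classifier in plain Python. It is not Mahout's implementation. The function names, the add-one (Laplace) smoothing, the whitespace tokenization, and the four training messages are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count documents per class and word occurrences per class.

    docs is a list of (text, label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Start from the log prior, log P(class).
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
training_docs = [
    ("win a free prize now", "spam"),
    ("free offer click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training_docs)

print(classify(model, "free prize offer"))        # prints: spam
print(classify(model, "team meeting on monday"))  # prints: ham
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids numeric underflow on longer documents. A real spam filter would need far more training data and better tokenization than this toy example.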

Note

Did you know?

Spam filtering...