Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Document classification using Mahout Naive Bayes Classifier


Classification assigns documents or data items to one of a predefined set of classes whose properties are already known. Document classification, or categorization, applies this to text: each document is assigned to one or more categories. This is a frequent use case in information retrieval as well as in library science.

The Classification using the naïve Bayes classifier recipe in Chapter 9, Classifications, Recommendations, and Finding Relationships, describes classification use cases in more detail and gives an overview of the Naive Bayes classifier algorithm. This recipe focuses on the classification support that Apache Mahout provides for text documents.
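To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. It is not Mahout's implementation and does not use any Mahout API. The function names `train_naive_bayes` and `classify` are hypothetical, and the four training sentences are toy data invented for illustration. Only the two labels are taken from the 20 Newsgroups category names.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-class document and word counts from (label, text) pairs."""
    class_doc_counts = Counter()        # number of training documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocabulary = set()                  # every word seen during training
    for label, text in labeled_docs:
        tokens = text.lower().split()   # naive whitespace tokenizer (assumption)
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior for the given text."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_doc_counts):
        # Start from the log prior: log P(class)
        score = math.log(class_doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothed log likelihood: log P(word | class)
            numerator = word_counts[label][token] + 1
            denominator = total_words + len(vocabulary)
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (invented for illustration, not from the 20news dataset)
training = [
    ("rec.autos", "the engine and brakes of this car"),
    ("rec.autos", "car dealers and engine repair"),
    ("sci.space", "the shuttle launch reached orbit"),
    ("sci.space", "nasa orbit and launch schedule"),
]
model = train_naive_bayes(training)
print(classify("engine trouble in my car", model))  # prints rec.autos
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a word that never appeared in a class from driving that class's probability to zero.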

Getting ready

  • Install Apache Mahout on your machine using your Hadoop distribution, or install the latest Apache Mahout version manually.

How to do it...

The following steps use the Apache Mahout Naive Bayes algorithm to classify the 20news dataset:

  1. Refer to the Creating...