Natural Language Processing with Java

By: Richard M. Reese

Text classifying techniques


Classification takes a specific document and determines which of several document groups it fits into. There are two basic techniques for classifying text:

  • Rule-based

  • Supervised Machine Learning

Rule-based classification uses a combination of words and other attributes organized around expert-crafted rules. These rules can be very effective, but creating them is a time-consuming process.
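To make the idea concrete, here is a minimal sketch of a rule-based classifier. It assumes the simplest possible kind of rule, a hand-picked keyword list per category. The class name `RuleBasedClassifier`, the categories, and the keywords are all hypothetical and chosen only for illustration; real rule sets usually combine many more attributes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical example: the categories and keywords are made up for illustration.
class RuleBasedClassifier {
    // Each rule maps a category to keywords that an expert chose by hand.
    private static final Map<String, List<String>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("sports", Arrays.asList("goal", "match", "team"));
        RULES.put("finance", Arrays.asList("stock", "market", "dividend"));
    }

    // Returns the category whose keywords match the most tokens,
    // or "unknown" when no rule matches at all.
    static String classify(String document) {
        String[] tokens = document.toLowerCase(Locale.ROOT).split("\\W+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, List<String>> rule : RULES.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (rule.getValue().contains(token)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = rule.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(classify("The team scored a late goal in the match"));
        System.out.println(classify("The stock market fell sharply"));
    }
}
```

Even in this toy form, the cost mentioned above is visible: every new category or edge case means another rule that someone has to write and maintain by hand.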

Supervised Machine Learning (SML) uses a collection of annotated training documents to create a model, which is normally called the classifier. There are many different machine learning techniques, including Naive Bayes, Support Vector Machine (SVM), and k-nearest neighbors.
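As a rough illustration of the train-then-classify workflow, the following sketch implements a small multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The class name `NaiveBayesSketch`, its methods, and the sample training documents are assumptions made for this example; it is not the API of any particular NLP library.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of multinomial Naive Bayes with add-one smoothing.
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }

    // Adds one annotated training document to the model.
    void train(String label, String document) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(document)) {
            if (token.isEmpty()) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Returns the label with the highest log-probability, or null if untrained.
    String classify(String document) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            // Log prior: fraction of training documents carrying this label.
            double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String token : tokenize(document)) {
                if (token.isEmpty()) {
                    continue;
                }
                // Add-one smoothing keeps unseen words from zeroing the score.
                int count = counts.getOrDefault(token, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch classifier = new NaiveBayesSketch();
        classifier.train("sports", "the team won the match");
        classifier.train("sports", "a great goal by the team");
        classifier.train("finance", "the stock market fell");
        classifier.train("finance", "investors sold stock");
        System.out.println(classifier.classify("the team scored a goal"));
    }
}
```

The annotated documents passed to `train` play the role of the training collection, and the populated object is the classifier. Unlike the rule-based approach, adding a category here only requires more labeled examples, not new hand-written rules.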

We are not concerned with how these approaches work, but the interested reader will find many sources that expand upon these and other techniques.