Natural Language Processing with Java

By: Richard M. Reese

Chapter 6. Classifying Texts and Documents

In this chapter, we will demonstrate how to use various NLP APIs to perform text classification. This is not to be confused with text clustering. Clustering groups similar texts together without the use of predefined categories. Classification, in contrast, assigns text to predefined categories. We will focus on text classification, where tags are assigned to text to specify its type.

The general approach used to perform text classification starts with the training of a model. The model is validated and then used to classify documents. We will focus on the training and usage steps.
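To make the train-then-classify flow concrete, here is a minimal sketch in plain Java. It is not taken from any of the NLP APIs covered in this chapter: the class name `TinyNaiveBayes`, its `train` and `classify` methods, and the sample sentences are all hypothetical, and the technique swapped in for illustration is a multinomial Naive Bayes classifier with add-one smoothing. The validation step is omitted for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of any NLP library.
public class TinyNaiveBayes {
    // category -> (word -> number of occurrences in that category)
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    // category -> number of training documents
    private final Map<String, Integer> docCounts = new HashMap<>();
    // category -> total number of words seen in that category
    private final Map<String, Integer> totalWords = new HashMap<>();
    // every distinct word seen during training
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Very crude tokenizer: lowercase, split on anything that is not a-z.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z]+");
    }

    // Training step: record word frequencies for a labeled document.
    public void train(String category, String text) {
        docCounts.merge(category, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            if (word.isEmpty()) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(category, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    // Usage step: return the category with the highest log-probability,
    // or null if the model has not been trained.
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // Prior: fraction of training documents in this category.
            double score = Math.log(docCounts.get(category) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = totalWords.getOrDefault(category, 0);
            for (String word : tokenize(text)) {
                if (word.isEmpty()) {
                    continue;
                }
                int count = counts.getOrDefault(word, 0);
                // Add-one smoothing keeps unseen words from zeroing the score.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TinyNaiveBayes model = new TinyNaiveBayes();
        model.train("sports", "the team won the match with a late goal");
        model.train("sports", "the coach praised the players after the game");
        model.train("finance", "the bank raised interest rates again");
        model.train("finance", "stock markets fell as investors sold shares");

        System.out.println(model.classify("the players scored a goal"));
        System.out.println(model.classify("investors watch interest rates"));
    }
}
```

With these four training sentences, the first query is labeled `sports` and the second `finance`. A real classifier would be trained on far more data and validated on documents held out from training before being used.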

Documents can be classified according to any number of attributes, such as their subject, document type, time of publication, author, language used, and reading level. Some classification approaches require humans to label sample data.

Sentiment analysis is a type of classification. It is concerned with determining what text is trying to convey to a reader, usually in the...