Natural Language Processing with Java

By: Richard M. Reese

How classification is used


Text classification is used for a number of purposes:

  • Spam detection

  • Authorship attribution

  • Sentiment analysis

  • Age and gender identification

  • Determining the subject of a document

  • Language identification

Spam is an unfortunate reality for most e-mail users. If an e-mail can be classified as spam, it can be moved to a spam folder. The text of a message can be analyzed, and certain attributes can be used to designate the e-mail as spam. These attributes can include misspellings, the lack of an appropriate e-mail address for recipients, and a non-standard URL.
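
To make these attributes concrete, the following is a minimal sketch of how they might be checked in plain Java. It is not the approach of any particular library, and every name in it is hypothetical: the `SpamHeuristics` class, the tiny `KNOWN_WORDS` vocabulary, the 0.25 misspelling threshold, and the choice to treat a URL with a raw IP address as "non-standard" are all assumptions made for illustration. A real spam filter would typically feed such attributes into a trained classifier rather than simply count them.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpamHeuristics {
    // Hypothetical toy vocabulary; a real system would use a full dictionary.
    private static final Set<String> KNOWN_WORDS = Set.of(
        "claim", "your", "free", "prize", "now", "at",
        "meeting", "moved", "to", "noon");

    private static final Pattern URL = Pattern.compile("https?://\\S+");
    // Assumption: a URL whose host is a raw IPv4 address counts as non-standard.
    private static final Pattern RAW_IP_URL =
        Pattern.compile("https?://\\d{1,3}(?:\\.\\d{1,3}){3}(?=[:/\\s]|$)");
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Fraction of words (outside URLs) that are not in the vocabulary.
    static double misspellingRatio(String body) {
        String text = URL.matcher(body).replaceAll(" ");
        Matcher m = WORD.matcher(text);
        int total = 0;
        int unknown = 0;
        while (m.find()) {
            total++;
            if (!KNOWN_WORDS.contains(m.group().toLowerCase(Locale.ROOT))) {
                unknown++;
            }
        }
        return total == 0 ? 0.0 : (double) unknown / total;
    }

    // Crude check: true when no recipient entry looks like an address at all.
    static boolean lacksRecipientAddress(List<String> recipients) {
        return recipients.stream().noneMatch(r -> r.contains("@"));
    }

    static boolean hasRawIpUrl(String body) {
        return RAW_IP_URL.matcher(body).find();
    }

    // Counts how many of the three attributes are present (0 to 3).
    static int score(List<String> recipients, String body) {
        int flags = 0;
        if (misspellingRatio(body) > 0.25) flags++;
        if (lacksRecipientAddress(recipients)) flags++;
        if (hasRawIpUrl(body)) flags++;
        return flags;
    }

    public static void main(String[] args) {
        int suspicious = score(
            List.of("undisclosed-recipients"),
            "Claim yuor freee prize now at http://192.168.10.5/win");
        int ordinary = score(
            List.of("alice@example.com"),
            "Meeting moved to noon");
        System.out.println("suspicious message flags: " + suspicious);
        System.out.println("ordinary message flags: " + ordinary);
    }
}
```

In this sketch the first message raises all three flags and the second raises none. Counting flags keeps the example short; the same three values could just as well serve as input features for a statistical classifier.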

Classification has been used to determine the authorship of documents. This has been done for historical documents such as The Federalist Papers and for the book Primary Colors, whose authors have been identified.

Sentiment analysis is a technique that determines the attitude expressed in a text. Movie reviews have been a popular domain, but it can be applied to almost any product review. This helps companies better assess how...