A case-restoring spell corrector, also called a truecasing corrector, restores only the case of the text and changes nothing else; in particular, it does not correct spelling errors. This is useful for low-quality text such as transcriptions, automatic speech-recognition output, and chat logs, where case is often missing or inconsistent (all uppercase, all lowercase, or mixed). We typically want to clean up such text before using it to build rule-based or machine-learning systems. For example, news and video transcriptions (such as closed captions) often contain case errors, which makes the data harder to use for training named-entity recognition (NER) models, because capitalization is a strong cue for names. Case restoration can also serve as a normalization step across different data sources, so that all the data is cased consistently.
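To make the "restore case, change nothing else" contract concrete, here is a minimal, library-free Java sketch of the simplest truecasing baseline. It is not the cookbook's LingPipe implementation and uses no LingPipe classes. It assumes you have some correctly cased training text. Each word is assigned the casing it appeared with most often in that text, and every non-word character is copied through unchanged. The class `Truecaser` and its methods `train`, `restore`, and `bestCasing` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical unigram truecaser (a simplified sketch, not LingPipe's API):
 * each word gets the casing it appeared with most often in cased training
 * text. Only letter case is changed; all other characters are kept as-is.
 */
public class Truecaser {
    // A "word" is a run of letters or digits; everything else is copied verbatim.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // lowercased word -> (observed surface form -> count); TreeMap makes ties deterministic
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records how each word is cased in correctly cased text. */
    public void train(String casedText) {
        Matcher m = WORD.matcher(casedText);
        while (m.find()) {
            String form = m.group();
            counts.computeIfAbsent(form.toLowerCase(Locale.ROOT), k -> new TreeMap<>())
                  .merge(form, 1, Integer::sum);
        }
    }

    /** Returns the most frequent training casing of a word, or the word itself if unseen. */
    private String bestCasing(String word) {
        Map<String, Integer> forms = counts.get(word.toLowerCase(Locale.ROOT));
        if (forms == null) {
            return word; // unseen word: leave it untouched
        }
        String best = word;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : forms.entrySet()) {
            if (e.getValue() > bestCount) { // strict '>' keeps the first (sorted) form on ties
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Restores case word by word, preserving whitespace and punctuation exactly. */
    public String restore(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start()); // copy the non-word gap unchanged
            out.append(bestCasing(m.group()));
            last = m.end();
        }
        out.append(text, last, text.length()); // trailing punctuation or whitespace
        return out.toString();
    }

    public static void main(String[] args) {
        Truecaser tc = new Truecaser();
        tc.train("the president met John Smith in New York "
                + "and the press followed John to the White House");
        // Prints: the president met John Smith in New York.
        System.out.println(tc.restore("THE PRESIDENT MET JOHN SMITH IN NEW YORK."));
    }
}
```

Because every output word is one of the casings seen for the same lowercased word, the result differs from the input only in case, which matches the definition above. The example output also shows the main weakness of this baseline: it ignores context, so the sentence-initial "the" stays lowercase. Context-aware approaches, such as scoring candidate casings with a language model over neighboring words, are needed to handle cases like that.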
Natural Language Processing with Java and LingPipe Cookbook