Natural Language Processing with Java and LingPipe Cookbook

Chapter 2. Finding and Working with Words

In this chapter, we cover the following recipes:

  • Introduction to tokenizer factories – finding words in a character stream

  • Combining tokenizers – lowercase tokenizer

  • Combining tokenizers – stop word tokenizers

  • Using Lucene/Solr tokenizers

  • Using Lucene/Solr tokenizers with LingPipe

  • Evaluating tokenizers with unit tests

  • Modifying tokenizer factories

  • Finding words for languages without white spaces