Natural Language Processing with Java and LingPipe Cookbook

Introduction


This chapter shows how to work with spans of text that typically cover one or more words or tokens. The LingPipe API represents such a unit of text as a chunk. Chunkers are the components that produce chunkings, which are sets of chunks over a piece of text. The following is some text with its character offsets marked beneath it (units digits on the first line, tens digits on the second):

LingPipe is an API. It is written in Java.
012345678901234567890123456789012345678901
          1         2         3         4           

Chunking the preceding text into sentences gives the following output, where end is the offset of the last character in each span:

Sentence start=0, end=18
Sentence start=20, end=41
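A minimal sketch of a sentence chunker for this example, assuming the naive rule that a sentence ends at '.', '!' or '?'. The names NaiveSentenceChunker and sentences are hypothetical and not part of LingPipe. The sketch uses Java's usual end-exclusive convention, which, to our understanding, is also how LingPipe's Chunk.end() is defined (one past the last character). It therefore reports end=19 and end=42 where the listing above, which gives the position of the last character, shows 18 and 41.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical naive sentence chunker: a sentence runs from its first
 * non-whitespace character up to and including '.', '!' or '?'.
 * Offsets are start-inclusive, end-exclusive.
 */
class NaiveSentenceChunker {

    /** A sentence span; end is one past the last character. */
    record Span(int start, int end) {}

    static List<Span> sentences(String text) {
        List<Span> spans = new ArrayList<>();
        int start = -1;                                   // -1 means "not inside a sentence"
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (start < 0 && !Character.isWhitespace(c)) {
                start = i;                                // first character of a new sentence
            }
            if (start >= 0 && (c == '.' || c == '!' || c == '?')) {
                spans.add(new Span(start, i + 1));        // include the terminator
                start = -1;
            }
        }
        if (start >= 0) {                                 // trailing text with no terminator
            int end = text.length();
            while (end > start && Character.isWhitespace(text.charAt(end - 1))) end--;
            spans.add(new Span(start, end));
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "LingPipe is an API. It is written in Java.";
        for (Span s : sentences(text)) {
            System.out.println("Sentence start=" + s.start() + ", end=" + s.end()
                    + " \"" + text.substring(s.start(), s.end()) + "\"");
        }
    }
}
```

This rule breaks on abbreviations such as "e.g." or "Dr.", which is why LingPipe provides sentence models (for example, IndoEuropeanSentenceModel used with a SentenceChunker) rather than a simple split on periods.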

Adding a chunking for named entities picks out LingPipe and Java as organizations:

Organization start=0, end=7
Organization start=37, end=40
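For this short example, the named-entity chunks can be reproduced with an exact-match dictionary lookup. This is a hypothetical sketch: DictionaryEntityChunker and its two-entry dictionary are illustrative assumptions. It is not how a trained named-entity chunker works, and it is not LingPipe's own ExactDictionaryChunker. It matches whole words only and reports end-exclusive offsets, so Java comes out as start=37, end=41 rather than the inclusive end=40 shown above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical exact-match dictionary chunker: every whole-word occurrence
 * of a dictionary phrase becomes a chunk of that phrase's type.
 * Offsets are start-inclusive, end-exclusive.
 */
class DictionaryEntityChunker {

    record Entity(int start, int end, String type) {}

    /** True if position i is outside the text or holds a non-alphanumeric char. */
    private static boolean isBoundary(String text, int i) {
        return i < 0 || i >= text.length() || !Character.isLetterOrDigit(text.charAt(i));
    }

    static List<Entity> chunk(String text, Map<String, String> dictionary) {
        List<Entity> found = new ArrayList<>();
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String phrase = entry.getKey();
            int from = 0;
            int at;
            while ((at = text.indexOf(phrase, from)) >= 0) {
                int end = at + phrase.length();
                // Skip matches inside longer words, e.g. "Java" in "JavaScript".
                if (isBoundary(text, at - 1) && isBoundary(text, end)) {
                    found.add(new Entity(at, end, entry.getValue()));
                }
                from = at + 1;
            }
        }
        found.sort(Comparator.comparingInt(Entity::start));   // report in text order
        return found;
    }

    public static void main(String[] args) {
        String text = "LingPipe is an API. It is written in Java.";
        Map<String, String> dict = Map.of("LingPipe", "Organization", "Java", "Organization");
        for (Entity e : chunk(text, dict)) {
            System.out.println(e.type() + " start=" + e.start() + ", end=" + e.end());
        }
    }
}
```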

We can also define the named-entity chunkings by their offsets relative to the sentences that contain them. This makes no difference for LingPipe, because its sentence starts at offset 0, but Java, whose sentence starts at offset 20, becomes:

Organization start=17, end=20
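Converting to sentence-relative offsets is just a matter of subtracting the containing sentence's start offset. The following is a minimal sketch with hypothetical names (RelativeOffsets, relativeTo, absolute) using end-exclusive spans. Java at [37, 41) inside the second sentence [20, 42) becomes [17, 21), which matches start=17 above. The listing's end=20 is the inclusive form of the same span.

```java
/**
 * Hypothetical helper: re-express a chunk's document offsets relative to
 * the sentence that contains it, and map them back again.
 * Spans are start-inclusive, end-exclusive.
 */
class RelativeOffsets {

    record Span(int start, int end) {
        boolean contains(Span other) {
            return start <= other.start() && other.end() <= end;
        }
    }

    /** Returns the chunk's offsets relative to sentence.start(). */
    static Span relativeTo(Span sentence, Span chunk) {
        if (!sentence.contains(chunk)) {
            throw new IllegalArgumentException("chunk " + chunk + " is not inside sentence " + sentence);
        }
        return new Span(chunk.start() - sentence.start(), chunk.end() - sentence.start());
    }

    /** Maps sentence-relative offsets back to document offsets. */
    static Span absolute(Span sentence, Span relative) {
        return new Span(relative.start() + sentence.start(), relative.end() + sentence.start());
    }

    public static void main(String[] args) {
        String text = "LingPipe is an API. It is written in Java.";
        Span sentence2 = new Span(20, 42);   // "It is written in Java."
        Span java = new Span(37, 41);        // "Java" in document offsets
        Span rel = relativeTo(sentence2, java);
        String sentenceText = text.substring(sentence2.start(), sentence2.end());
        System.out.println("Organization start=" + rel.start() + ", end=" + rel.end()
                + " \"" + sentenceText.substring(rel.start(), rel.end()) + "\"");
    }
}
```

Keeping the sentence's start offset alongside the relative spans is what makes the conversion reversible, which the absolute method relies on.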

This is the basic idea of chunks. There are many ways to create them.
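The pieces named in this section (a chunk, a chunking, and a chunker) can be modeled in a few lines of plain Java. The sketch below uses hypothetical types (SpanChunk, SpanChunking, SpanChunker, TokenChunker) rather than LingPipe's own interfaces in com.aliasi.chunk. Its toy chunker simply marks each whitespace-delimited token as a chunk, with start-inclusive, end-exclusive offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins for the chunk/chunking/chunker idea;
// these are NOT LingPipe's com.aliasi.chunk classes.

/** A typed span of text: start is inclusive, end is exclusive. */
record SpanChunk(int start, int end, String type) {}

/** A piece of text together with the chunks found in it. */
record SpanChunking(CharSequence text, List<SpanChunk> chunks) {}

/** Anything that can turn text into a chunking. */
interface SpanChunker {
    SpanChunking chunk(CharSequence text);
}

/** Toy chunker: every maximal run of non-whitespace becomes a "TOKEN" chunk. */
class TokenChunker implements SpanChunker {
    private static final Pattern NON_SPACE = Pattern.compile("\\S+");

    @Override
    public SpanChunking chunk(CharSequence text) {
        List<SpanChunk> chunks = new ArrayList<>();
        Matcher m = NON_SPACE.matcher(text);
        while (m.find()) {
            chunks.add(new SpanChunk(m.start(), m.end(), "TOKEN"));
        }
        return new SpanChunking(text, List.copyOf(chunks));
    }

    public static void main(String[] args) {
        SpanChunking chunking = new TokenChunker().chunk("LingPipe is an API.");
        for (SpanChunk c : chunking.chunks()) {
            String covered = chunking.text().subSequence(c.start(), c.end()).toString();
            System.out.println(c.type() + " start=" + c.start() + ", end=" + c.end() + " \"" + covered + "\"");
        }
    }
}
```

Run on "LingPipe is an API.", this prints four TOKEN chunks, the first with start=0, end=8 and the last with start=15, end=19. The period stays attached to "API" because only whitespace separates chunks. Any other chunker, such as a sentence or named-entity chunker, would fit the same interface and differ only in how it decides where spans begin and end.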