Book Image

Natural Language Processing with Java and LingPipe Cookbook

Book Image

Natural Language Processing with Java and LingPipe Cookbook

Overview of this book

Table of Contents (14 chapters)
Natural Language Processing with Java and LingPipe Cookbook
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Foreground- or background-driven interesting phrase detection


Like the previous recipe, this recipe finds interesting phrases, but it uses another language model to determine what is interesting. Amazon's statistically improbable phrases (SIP) work this way. You can get a clear view from their website at http://www.amazon.com/gp/search-inside/sipshelp.html:

"Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book.

SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!."

The foreground model will be the book being processed, and the background model will be all the other books in Amazon's Search Inside...