Natural Language Processing with Java

By: Richard M. Reese

Techniques for name recognition


Several NER techniques are available. Some use regular expressions, while others rely on a predefined dictionary. Regular expressions are expressive enough to isolate entities that follow a recognizable pattern. A dictionary of entity names can be compared against the tokens of a text to find matches.
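
To make the two approaches concrete, the following is a minimal sketch that uses only the Java standard library. The class name `SimpleNer`, the email pattern, and the small dictionary of names are all illustrative assumptions rather than anything taken from the book, and the tokenization is deliberately crude.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class SimpleNer {
    // Assumed pattern: matches simple addresses such as user@example.com.
    // Real-world email addresses can be more varied than this.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Assumed dictionary of known entity names (case-sensitive).
    private static final Set<String> NAMES =
            new HashSet<>(Arrays.asList("Joe", "Sally", "Boston"));

    // Regular expression approach: collect every substring that fits the pattern.
    static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // Dictionary approach: split the text into tokens and keep those
    // that appear in the dictionary. Splitting on non-letters is a
    // crude stand-in for a real tokenizer.
    static List<String> findDictionaryNames(String text) {
        List<String> found = new ArrayList<>();
        for (String token : text.split("[^A-Za-z]+")) {
            if (NAMES.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }
}
```

Calling `SimpleNer.findEmails` on a sentence returns the substrings that fit the pattern, while `SimpleNer.findDictionaryNames` returns only tokens listed in the dictionary. The dictionary lookup here handles single-token names only; multiword names would need matching over token sequences.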

Another common NER approach uses trained models to detect the presence of entities. These models depend on the type of entity we are looking for and on the target language. A model that works well for one domain, such as web pages, may not work well for a different domain, such as medical journals.

A model is trained on an annotated block of text that identifies the entities of interest. Several measures are used to evaluate how well a model has been trained:

  • Precision: The percentage of entities found that exactly match the spans in the evaluation data

  • Recall: The percentage of entities defined in the corpus that were found in the same location

  • Performance...
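
The precision and recall definitions above can be sketched in a few lines of Java. This is a rough illustration, assuming entities are represented as exact spans; the `NerScore` class and its `Span` type are hypothetical names introduced here, and the methods return a fraction between 0 and 1 rather than a percentage.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical scoring helper, for illustration only.
final class NerScore {
    // Assumed representation of an entity: token offsets [start, end) plus a type.
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Span)) return false;
            Span s = (Span) other;
            return start == s.start && end == s.end && type.equals(s.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(start, end, type);
        }
    }

    // Number of predicted spans that exactly match an annotated span.
    private static int exactMatches(Set<Span> predicted, Set<Span> annotated) {
        Set<Span> hits = new HashSet<>(predicted);
        hits.retainAll(annotated);
        return hits.size();
    }

    // Fraction of predicted entities that exactly match the evaluation data.
    static double precision(Set<Span> predicted, Set<Span> annotated) {
        if (predicted.isEmpty()) return 0.0;
        return (double) exactMatches(predicted, annotated) / predicted.size();
    }

    // Fraction of annotated entities that were found at the same location.
    static double recall(Set<Span> predicted, Set<Span> annotated) {
        if (annotated.isEmpty()) return 0.0;
        return (double) exactMatches(predicted, annotated) / annotated.size();
    }
}
```

For example, if a model predicts two entities and only one of them exactly matches one of three annotated entities, precision is 1/2 and recall is 1/3. Returning 0.0 for an empty set is a convention chosen for this sketch.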