Natural Language Processing with Java and LingPipe Cookbook

Train a little, learn a little – active learning


Active learning is a superpower for developing classifiers quickly. It has saved many a real-world project. The idea is simple and breaks down as follows:

  1. Assemble a packet of raw data that is way bigger than you can annotate manually.

  2. Annotate an embarrassingly small amount of the raw data.

  3. Train the classifier on the embarrassingly small amount of training data.

  4. Run the trained classifier on all the data.

  5. Put the classifier output into a .csv file ranked by confidence of best category.

  6. Correct another embarrassingly small amount of data, starting with the most confident classifications.

  7. Evaluate the performance.

  8. Repeat the process until the performance is acceptable, or you run out of data.

  9. If successful, be sure to evaluate/threshold on fresh data, because the active learning process can introduce biases to the evaluation.
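
Steps 4 to 6 above can be sketched in plain Java. This is a minimal illustration, not the book's code and not the LingPipe API: the `ConfidenceRanking` class, the `Scored` holder, and the column layout (`confidence,guess,corrected,text`) are hypothetical names chosen for this example. It assumes the classifier has already produced a best category and a confidence for each item, and it only shows the ranking and the CSV formatting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not part of LingPipe): ranks classifier output for annotation.
class ConfidenceRanking {

    // One classified item: the raw text, the best category, and its confidence.
    static final class Scored {
        final String text;
        final String bestCategory;
        final double confidence;

        Scored(String text, String bestCategory, double confidence) {
            this.text = text;
            this.bestCategory = bestCategory;
            this.confidence = confidence;
        }
    }

    // Returns a new list sorted from most to least confident; the input is left unchanged.
    static List<Scored> rankByConfidence(List<Scored> items) {
        List<Scored> ranked = new ArrayList<>(items);
        ranked.sort(Comparator.comparingDouble((Scored s) -> s.confidence).reversed());
        return ranked;
    }

    // Quotes a field for CSV and doubles any embedded quote characters.
    static String csvField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Formats one row; the empty third column is where the annotator types a correction.
    static String toCsvRow(Scored s) {
        return s.confidence + "," + csvField(s.bestCategory) + ",," + csvField(s.text);
    }

    public static void main(String[] args) {
        // Made-up classifier output, for illustration only.
        List<Scored> output = new ArrayList<>();
        output.add(new Scored("Great service, will return", "pos", 0.62));
        output.add(new Scored("Worst meal I have had", "neg", 0.97));
        output.add(new Scored("It was \"fine\", I guess", "pos", 0.51));

        System.out.println("confidence,guess,corrected,text");
        for (Scored s : rankByConfidence(output)) {
            System.out.println(toCsvRow(s));
        }
    }
}
```

Opening the resulting file in a spreadsheet puts the most confident classifications at the top, so the annotator can work downward from there and fill in the `corrected` column only where the guess is wrong.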

What this process does is help the classifier distinguish the cases where it is making higher confidence mistakes...