On many websites and blogs, and certainly on web forums, you might see highlighted keywords that link to pages where you can buy a product. Similarly, news websites provide topic pages for people, places, and trending events, such as the one at http://www.nytimes.com/pages/topics/.
Much of this is fully automated and is easy to do with a dictionary-based chunker. It is straightforward to compile lists of entity names and their types. An exact dictionary chunker extracts chunks based on exact matches of tokenized dictionary entries.
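To make the idea of matching tokenized dictionary entries concrete, here is a minimal sketch in plain Java. It is not LingPipe's implementation: the class `SimpleDictionaryChunker`, its methods, and the whitespace-and-lowercase tokenizer are all hypothetical and chosen only for illustration. It tries every token window up to the longest entry, which is the naive approach rather than an efficient one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is NOT LingPipe's ExactDictionaryChunker.
class SimpleDictionaryChunker {
    // A matched span: token offsets [start, end), the matched phrase, and its type.
    static final class Chunk {
        final int start;
        final int end;
        final String phrase;
        final String type;

        Chunk(int start, int end, String phrase, String type) {
            this.start = start;
            this.end = end;
            this.phrase = phrase;
            this.type = type;
        }
    }

    private final Map<String, String> phraseToType = new HashMap<>();
    private int maxTokens = 0; // token length of the longest dictionary entry

    // Assumed tokenizer: lowercase and split on whitespace. A real tokenizer
    // would also separate punctuation.
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    void addEntry(String phrase, String type) {
        String[] tokens = tokenize(phrase);
        phraseToType.put(String.join(" ", tokens), type);
        maxTokens = Math.max(maxTokens, tokens.length);
    }

    // Returns every token window that exactly matches a dictionary entry,
    // including overlapping matches.
    List<Chunk> chunk(String text) {
        String[] tokens = tokenize(text);
        List<Chunk> chunks = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            int limit = Math.min(tokens.length, start + maxTokens);
            for (int end = start + 1; end <= limit; end++) {
                String candidate =
                    String.join(" ", Arrays.copyOfRange(tokens, start, end));
                String type = phraseToType.get(candidate);
                if (type != null) {
                    chunks.add(new Chunk(start, end, candidate, type));
                }
            }
        }
        return chunks;
    }
}
```

Because matching happens on tokens rather than raw characters, an entry such as "New York" cannot match inside a longer word, and differences in spacing between the dictionary and the text do not matter.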
The implementation of the dictionary-based chunker in LingPipe is based on the Aho-Corasick algorithm, which finds all matches against a dictionary in time linear in the length of the input text plus the number of matches reported, regardless of the size of the dictionary. This makes it much more efficient than the naïve approach of running a separate substring search or regular expression for each dictionary entry.
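The following is a compact sketch of the Aho-Corasick idea in plain Java, included only to show why a single pass over the text suffices. It is not LingPipe's code: the class `AhoCorasickSketch` and its methods are hypothetical, and for brevity it matches over characters, whereas LingPipe's chunker matches over tokens. The dictionary is stored as a trie, and each node gets a failure link to the longest proper suffix of its path that is also a path in the trie.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration of Aho-Corasick; not LingPipe's implementation.
class AhoCorasickSketch {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final List<String> outputs = new ArrayList<>(); // patterns ending here
        Node fail; // longest proper suffix of this path that is also in the trie
    }

    private final Node root = new Node();

    // Insert one dictionary entry into the trie.
    void add(String pattern) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("empty pattern");
        }
        Node node = root;
        for (char ch : pattern.toCharArray()) {
            node = node.next.computeIfAbsent(ch, k -> new Node());
        }
        node.outputs.add(pattern);
    }

    // Build failure links breadth-first. Must be called after the last add()
    // and before findAll().
    void compile() {
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                Node child = e.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(e.getKey())) {
                    f = f.fail;
                }
                Node target = f.next.get(e.getKey());
                child.fail = (target != null) ? target : root;
                // Inherit patterns that end at the suffix state.
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Scan the text once; each hit is reported as "startIndex:pattern".
    List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            while (node != root && !node.next.containsKey(ch)) {
                node = node.fail; // fall back instead of rescanning the text
            }
            Node target = node.next.get(ch);
            node = (target != null) ? target : root;
            for (String p : node.outputs) {
                hits.add((i - p.length() + 1) + ":" + p);
            }
        }
        return hits;
    }
}
```

The scan never moves backwards in the text. When a character does not extend the current match, the automaton follows failure links rather than restarting, which is where the linear-time behavior comes from. Copying outputs along failure links keeps the sketch short at the cost of extra memory.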