Named entity recognition is a specific kind of chunk extraction that uses entity tags instead of, or in addition to, chunk tags. Common entity tags include PERSON, ORGANIZATION, and LOCATION. Part-of-speech tagged sentences are parsed into chunk trees as with normal chunking, but the nodes of the trees can be entity tags instead of chunk phrase tags.
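To make the tree structure concrete, here is a minimal sketch of a chunk tree whose nodes are entity tags. The tree is built by hand rather than produced by a chunker; the sentence, its part-of-speech tags, and the helper name entities() are made up for illustration.

```python
from nltk.tree import Tree

# Hand-built chunk tree (illustrative only, not chunker output).
# Entity subtrees wrap (word, pos_tag) leaves; other tokens stay at the root.
sentence = Tree('S', [
    Tree('PERSON', [('Alice', 'NNP')]),
    ('flew', 'VBD'),
    ('to', 'TO'),
    Tree('LOCATION', [('Paris', 'NNP')]),
])


def entities(tree):
    """Return (entity_tag, text) pairs for subtrees directly under the root."""
    found = []
    for child in tree:
        # Plain tokens are tuples; only entity chunks are Tree instances.
        if isinstance(child, Tree):
            words = ' '.join(word for word, tag in child.leaves())
            found.append((child.label(), words))
    return found


print(entities(sentence))
# [('PERSON', 'Alice'), ('LOCATION', 'Paris')]
```

The same loop should work on any tree with this shape, since an entity chunk differs from an ordinary chunk only in the label on its node.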
NLTK comes with a pre-trained named entity chunker. This chunker has been trained on data from the ACE program, a NIST (National Institute of Standards and Technology) sponsored program for Automatic Content Extraction, which you can read more about here: http://www.itl.nist.gov/iad/894.01/tests/ace/. Unfortunately, this data is not included in the NLTK corpora, but the trained chunker is. This chunker can be used through the ne_chunk() function in the nltk.chunk module. ne_chunk() takes a single part-of-speech tagged sentence and chunks it into a Tree. The following is an example using ne_chunk() on the first tagged sentence of the...