Using the included names corpus, we can create a simple tagger for tagging names as proper nouns. The NamesTagger class is a subclass of SequentialBackoffTagger, as it's probably only useful near the end of a backoff chain. At initialization, we create a set of all names in the names corpus, lower-casing each name to make lookup easier. Then we implement the choose_tag() method, which simply checks whether the current word is in the name_set set. If it is, we return the NNP tag (the tag for proper nouns). If it isn't, we return None, so the next tagger in the chain can tag the word. The following code can be found in taggers.py:
```python
from nltk.tag import SequentialBackoffTagger
from nltk.corpus import names

class NamesTagger(SequentialBackoffTagger):
    def __init__(self, *args, **kwargs):
        SequentialBackoffTagger.__init__(self, *args, **kwargs)
        # Lower-case every name once, up front, to make lookup easier
        self.name_set = set([n.lower() for n in names.words()])

    def choose_tag(self, tokens, index,...
```
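The listing above depends on NLTK and its downloadable names corpus. The Python below is a minimal, self-contained sketch of the same backoff idea that runs without NLTK. It makes the following assumptions:

- BackoffTagger and FixedTagger are hypothetical stand-ins, not NLTK classes. They loosely mirror SequentialBackoffTagger and a last-resort default tagger.
- SAMPLE_NAMES is a hypothetical three-name substitute for the names corpus.
- This NamesTagger takes its names as a constructor argument rather than reading the corpus.

The choose_tag() logic follows the description above: return NNP for a known name, otherwise return None so the backoff tagger gets a turn.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical stand-in for names.words(); the real corpus is much larger.
SAMPLE_NAMES = ["Jacob", "Alice", "Mary"]


class BackoffTagger:
    """Hypothetical base class mimicking sequential backoff (NOT NLTK's class).

    A tagger whose choose_tag() returns None hands the word to its backoff.
    """

    def __init__(self, backoff: Optional["BackoffTagger"] = None) -> None:
        self.backoff = backoff

    def choose_tag(self, tokens: Sequence[str], index: int,
                   history: List[Optional[str]]) -> Optional[str]:
        raise NotImplementedError

    def tag_one(self, tokens: Sequence[str], index: int,
                history: List[Optional[str]]) -> Optional[str]:
        # Ask this tagger first; if it declines (None), defer to the backoff.
        tag = self.choose_tag(tokens, index, history)
        if tag is None and self.backoff is not None:
            return self.backoff.tag_one(tokens, index, history)
        return tag

    def tag(self, tokens: Sequence[str]) -> List[Tuple[str, Optional[str]]]:
        # Tag left to right, passing the tags assigned so far as history.
        history: List[Optional[str]] = []
        for index in range(len(tokens)):
            history.append(self.tag_one(tokens, index, history))
        return list(zip(tokens, history))


class NamesTagger(BackoffTagger):
    """Tags any word found in the names list as NNP (proper noun)."""

    def __init__(self, names: Sequence[str],
                 backoff: Optional[BackoffTagger] = None) -> None:
        super().__init__(backoff)
        # Lower-case each name once so lookups ignore capitalization.
        self.name_set = {n.lower() for n in names}

    def choose_tag(self, tokens, index, history):
        word = tokens[index]
        if word.lower() in self.name_set:
            return "NNP"
        # Not a known name: return None so the next tagger can try.
        return None


class FixedTagger(BackoffTagger):
    """Hypothetical last-resort tagger that always assigns one fixed tag."""

    def __init__(self, fixed_tag: str,
                 backoff: Optional[BackoffTagger] = None) -> None:
        super().__init__(backoff)
        self.fixed_tag = fixed_tag

    def choose_tag(self, tokens, index, history):
        return self.fixed_tag


tagger = NamesTagger(SAMPLE_NAMES, backoff=FixedTagger("NN"))
print(tagger.tag(["Alice", "met", "jacob"]))
# [('Alice', 'NNP'), ('met', 'NN'), ('jacob', 'NNP')]
```

Because FixedTagger always returns a tag, it belongs at the very end of the chain. NamesTagger sits just before it and claims only the words it recognizes, which matches the placement suggested above.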