A unigram generally refers to a single token. Therefore, a unigram tagger only uses a single word as its context for determining the part-of-speech tag.
UnigramTagger inherits from NgramTagger, which is a subclass of ContextTagger, which inherits from SequentialBackoffTagger. In other words, UnigramTagger is a context-based tagger whose context is a single word, or unigram.
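To make the idea of a single-word context concrete, here is a minimal pure-Python sketch of the technique: each word is mapped to the tag it carries most often in the training data, and tagging is a plain lookup. This is an illustration only, not NLTK's implementation; the function names `train_unigram_model` and `tag_with_unigram_model` and the toy training sentences are hypothetical.

```python
from collections import Counter, defaultdict

def train_unigram_model(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    # The only context is the word itself, so the model is a word -> tag table.
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}

def tag_with_unigram_model(model, words):
    """Look each word up in the table; unseen words get None in this sketch."""
    return [(word, model.get(word)) for word in words]

# Toy training data (hypothetical): "board" is seen twice as NN and once as VB.
toy_train = [
    [("the", "DT"), ("board", "NN"), ("will", "MD"), ("join", "VB")],
    [("will", "MD"), ("board", "VB"), ("the", "DT"), ("board", "NN")],
]

model = train_unigram_model(toy_train)
result = tag_with_unigram_model(model, ["the", "board", "will", "meet"])
print(result)
# [('the', 'DT'), ('board', 'NN'), ('will', 'MD'), ('meet', None)]
```

Because the lookup ignores neighboring words, "board" always receives its most frequent tag, NN, even where it is used as a verb. A word that never appeared in training, such as "meet" here, has no entry to look up, which is the gap a backoff tagger is meant to fill.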
UnigramTagger can be trained by giving it a list of tagged sentences at initialization.
>>> from nltk.tag import UnigramTagger
>>> from nltk.corpus import treebank
>>> train_sents = treebank.tagged_sents()[:3000]
>>> tagger = UnigramTagger(train_sents)
>>> treebank.sents()[0]
['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']
>>> tagger.tag(treebank.sents()[0])
[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD...