In addition to UnigramTagger, there are two more NgramTagger subclasses: BigramTagger and TrigramTagger. The BigramTagger subclass uses the previous tag as part of its context, while the TrigramTagger subclass uses the previous two tags. An ngram is a subsequence of n items, so the BigramTagger subclass looks at two items (the previous tagged word and the current word), and the TrigramTagger subclass looks at three items.
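To make the notion of context concrete, here is a minimal pure-Python sketch of the idea. It is not NLTK's implementation: the function names train_ngram_model and tag_sentence are hypothetical, the two training sentences are made up for illustration, and the sketch assumes that a context is simply the pair (previous n-1 tags, current word) mapped to the tag seen most often with it.

```python
from collections import Counter, defaultdict


def make_context(previous_tags, word, n):
    """Build the lookup key: the last n-1 tags plus the current word."""
    # For n == 1 (unigram) there is no tag history in the context at all.
    history = tuple(previous_tags[-(n - 1):]) if n > 1 else ()
    return (history, word)


def train_ngram_model(tagged_sents, n):
    """Hypothetical helper: map each context to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        previous_tags = []
        for word, tag in sent:
            counts[make_context(previous_tags, word, n)][tag] += 1
            previous_tags.append(tag)
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def tag_sentence(model, words, n):
    """Hypothetical helper: tag left to right, None for unseen contexts."""
    tags = []
    for word in words:
        tags.append(model.get(make_context(tags, word, n)))
    return list(zip(words, tags))


# Made-up training data in which "tag" appears as both a verb and a noun.
train_sents = [
    [("we", "PRP"), ("tag", "VBP"), ("words", "NNS")],
    [("the", "DT"), ("tag", "NN"), ("is", "VBZ"), ("wrong", "JJ")],
]

bigram_model = train_ngram_model(train_sents, n=2)
verb_result = tag_sentence(bigram_model, ["we", "tag", "words"], n=2)
noun_result = tag_sentence(bigram_model, ["the", "tag", "is", "wrong"], n=2)
unseen_result = tag_sentence(bigram_model, ["words", "tag"], n=2)

print(verb_result)
print(noun_result)
print(unseen_result)
```

With n=2, the word tag receives a different tag depending on whether the previous tag was PRP or DT. The last call shows the weakness of this scheme: once a context has never been seen in training, the tagger has nothing to return, and every later context that includes that missing tag is unseen as well.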
These two taggers are good at handling words whose part-of-speech tag is context-dependent. Many words have a different part of speech depending on how they are used. For example, we've been talking about taggers that tag words. In this case, tag is used as a verb. But the result of tagging is a part-of-speech tag, so tag can also be a noun. The idea with the NgramTagger subclasses is that by looking at the previous words and part-of-speech tags, we can better guess the part-of-speech tag for the current word. Internally, each tagger...