Default tagging provides a baseline for part-of-speech tagging: it simply assigns the same part-of-speech tag to every token. We do this using the DefaultTagger class. A default tagger is useful as a last-resort tagger for tokens that no other tagger can handle, and it provides a baseline against which to measure accuracy improvements.
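To make the baseline idea concrete, here is a minimal pure-Python sketch. It is not NLTK's implementation. default_tag mimics what a default tagger does, and accuracy scores a tagger against hand-tagged sentences. Both helper names and the tiny gold sample are hypothetical, made up for illustration.

```python
def default_tag(tokens, tag="NN"):
    """Assign the same tag to every token, as a default tagger does."""
    return [(token, tag) for token in tokens]


def accuracy(gold_sents, tagger):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold_sents:
        words = [word for word, _ in sent]
        predicted = tagger(words)
        # Compare gold and predicted tags position by position.
        for (_, gold_tag), (_, guess) in zip(sent, predicted):
            correct += gold_tag == guess
            total += 1
    return correct / total if total else 0.0


# Hypothetical hand-tagged sentences using Penn Treebank-style tags.
gold = [
    [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("A", "DT"), ("cat", "NN"), ("sleeps", "VBZ"), ("here", "RB")],
]

print(default_tag(["Hello", "World"]))
print(accuracy(gold, default_tag))
```

Only the two nouns in the sample match the NN guess, so the score is 2/7 (about 0.29). Any more sophisticated tagger should at least beat this number.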
We're going to use the treebank corpus for most of this chapter because it's a common standard and is quick to load and test. But everything we do should apply equally well to brown, conll2000, and any other part-of-speech tagged corpus.
The DefaultTagger class takes a single argument: the tag you want to apply. We'll give it NN, which is the tag for a singular noun. DefaultTagger is most useful when you choose the most common part-of-speech tag in your data. Since nouns tend to be the most common type of word, a noun tag is recommended.
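One way to pick that tag is to count tag frequencies over a tagged corpus and take the most frequent one. The sketch below assumes the corpus is a list of sentences, each a list of (word, tag) pairs. The most_common_tag helper and the sample sentences are hypothetical. With NLTK and its data installed, the same function could be given treebank.tagged_sents() instead of the sample.

```python
from collections import Counter


def most_common_tag(tagged_sents):
    """Return the most frequent tag across all (word, tag) pairs."""
    counts = Counter(tag for sent in tagged_sents for _, tag in sent)
    return counts.most_common(1)[0][0]


# Hypothetical tagged sample; a real corpus would be much larger.
sample = [
    [("The", "DT"), ("stock", "NN"), ("price", "NN"), ("rose", "VBD")],
    [("The", "DT"), ("market", "NN"), ("closed", "VBD"), ("higher", "RBR")],
]

print(most_common_tag(sample))
```

On this sample NN occurs three times, more than any other tag, so it would be the natural argument to pass to DefaultTagger.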
>>> from nltk.tag import DefaultTagger
>>> tagger = DefaultTagger('NN')
>>> tagger.tag(['Hello', 'World'])
[('Hello', 'NN'), ('World...