A unigram is a single token. A unigram tagger assigns each token a part-of-speech tag by looking only at the token itself, without considering the surrounding words.
A UnigramTagger is trained by passing it a list of tagged sentences when it is initialized. The following NLTK code trains a UnigramTagger on tagged sentences from the treebank corpus and then uses it to tag the first sentence of that corpus:
>>> import nltk
>>> from nltk.tag import UnigramTagger
>>> from nltk.corpus import treebank
>>> training = treebank.tagged_sents()[:7000]
>>> unitagger = UnigramTagger(training)
>>> treebank.sents()[0]
['Pierre', 'Vinken', ',', '61', 'years', 'old', ',', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'Nov.', '29', '.']
>>> unitagger.tag(treebank.sents()[0])
[('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'),...
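
To make the idea behind unigram tagging concrete, here is a minimal sketch in plain Python of what such a tagger roughly does: it counts how often each word appears with each tag in the training data and then always assigns the most frequent one. This is an illustration only, not NLTK's actual implementation; the names train_unigram, tag_unigram, and toy_training are hypothetical, and the tiny training set is made up for the example.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    # Keep only the single most frequent tag per word.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_unigram(model, tokens):
    """Tag each token on its own; words never seen in training get None."""
    return [(tok, model.get(tok)) for tok in tokens]


# Hypothetical toy data: "will" is a modal twice and a noun once,
# "board" is a noun twice and a verb once.
toy_training = [
    [("the", "DT"), ("board", "NN"), ("will", "MD"), ("join", "VB")],
    [("the", "DT"), ("will", "NN"), ("is", "VBZ"), ("old", "JJ")],
    [("they", "PRP"), ("will", "MD"), ("board", "VB"), ("the", "DT"), ("board", "NN")],
]

model = train_unigram(toy_training)
result = tag_unigram(model, ["the", "board", "will", "vote"])
print(result)
```

Because each token is tagged in isolation, "will" receives the tag MD even in a sentence where it is used as a noun, and the unseen word "vote" receives no tag at all. NLTK's UnigramTagger likewise returns None for words it did not see in training unless a backoff tagger is supplied.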