Following is a table of all the part-of-speech tags that occur in the treebank corpus distributed with NLTK. The tags and counts shown here were acquired using the following code:
>>> from nltk.probability import FreqDist
>>> from nltk.corpus import treebank
>>> fd = FreqDist()
>>> for word, tag in treebank.tagged_words():
...     fd[tag] += 1  # increment this tag's count; fd.inc(tag) exists only in NLTK 2.x
...
>>> fd.items()
The FreqDist fd contains all the counts shown here for every tag in the treebank corpus. You can inspect each tag count individually by doing fd[tag], as in fd['DT']. Punctuation tags are also shown, along with special tags such as -NONE-, which signifies that the part-of-speech tag is unknown. Descriptions of most of the tags can be found at http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html.
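As a rough illustration of the lookup described above, the sketch below counts tags with collections.Counter over a small hand-made list of (word, tag) pairs, so it runs without the corpus installed. The sample pairs and the names tagged_words and tag_counts are assumptions made for this example, not data from the treebank corpus. In NLTK 3, FreqDist is a subclass of Counter, so the same indexing should apply to fd.

```python
from collections import Counter

# Hypothetical hand-tagged sample standing in for treebank.tagged_words();
# these pairs are made up for illustration and are not taken from the corpus.
tagged_words = [
    ("The", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"),
    ("the", "DT"), ("mat", "NN"), (".", "."),
]

# Count how often each tag occurs.
tag_counts = Counter(tag for _word, tag in tagged_words)

print(tag_counts["DT"])  # count for a single tag
print(tag_counts["UH"])  # a tag that never occurs gives 0 rather than a KeyError

# Print tag/count rows, most frequent first, with thousands separators.
for tag, count in tag_counts.most_common():
    print(f"{tag} | {count:,} |")
```

Indexing a tag that never occurs returns zero instead of raising an error, which makes it safe to look up any tag without first checking that it is present.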
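| Part-of-speech tag | Frequency of occurrence |
|---|---|
| # | 16 |
| $ | 724 |
| '' | 694 |
| , | 4,886 |
| -LRB- | 120 |
| -NONE- | 6,592 |
| -RRB- | 126 |
| . | 384 |
| : | 563 |
| `` | 712 |
| CC | 2,265 |
| CD | 3,546 |
| DT | 8,165 |
| EX | 88 |
| FW | 4 |
| IN | 9,857 |
| JJ | 5,834 |
| JJR | 381 |
| JJS | 182 |
| LS | 13 |
| MD | 927 |
| NN | 13,166 |
| NNP | 9,410 |
| NNPS | 244 |
| NNS | 6,047 |
| PDT | 27 |
| POS | 824 |
| PRP | 1,716 |
| PRP$ | 766 |
| RB | 2,822 |
| RBR | 136 |
| RBS | 35 |
| RP | 216 |
| SYM | 1 |
| TO | 2,179 |
| UH | 3 |
| VB | 2,554 |
| VBD | 3,043 |
| VBG | 1,460 |
| VBN | 2,134 |
| VBP | 1,321 |
| VBZ | 2,125 |
| WDT | 445 |
| WP | 241 |
| WP$ | 14 |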