Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins

Appendix A. Penn Treebank Part-of-speech Tags

The following table lists every part-of-speech tag that occurs in the treebank corpus distributed with NLTK, together with its frequency. The tags and counts were obtained with the following code:

>>> from nltk.probability import FreqDist
>>> from nltk.corpus import treebank
>>> fd = FreqDist()
>>> for word, tag in treebank.tagged_words():
...     fd[tag] += 1
...
>>> fd.items()

The FreqDist fd maps every tag in the treebank corpus to the count shown in the table. To look up the count for a single tag, index fd with that tag, for example, fd['DT']. Punctuation tags are included, along with special tags such as -NONE-, which marks empty elements (traces) in the treebank annotation rather than actual words. Descriptions of most of the tags can be found at the following link:

http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
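
As a small offline illustration of the lookups described above, the following sketch counts tags over a handful of hand-written (word, tag) pairs instead of the real corpus. The sample list and variable names are made up for illustration, and collections.Counter stands in for FreqDist, which in NLTK 3 is a Counter subclass and so supports the same indexing.

```python
from collections import Counter

# Hypothetical sample; the real data comes from treebank.tagged_words()
sample_tagged_words = [
    ("The", "DT"), ("board", "NN"), ("approved", "VBD"),
    ("the", "DT"), ("plan", "NN"), (".", "."),
]

# Same counting pattern as the loop over treebank.tagged_words()
tag_counts = Counter()
for word, tag in sample_tagged_words:
    tag_counts[tag] += 1

dt_count = tag_counts["DT"]        # count for a single tag
missing_count = tag_counts["FW"]   # a tag never seen yields 0, not a KeyError
top_two = tag_counts.most_common(2)  # the two most frequent tags with counts
print(dt_count, missing_count, top_two)
```

Looking up a tag that never occurred returns 0 rather than raising an error, which is convenient when checking tags that may be absent from a small sample.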

Part-of-speech tag    Frequency of occurrence
------------------    -----------------------
#                     16
$                     724
''                    694
,                     4886
-LRB-                 120
-NONE-                6592
-RRB-                 126
.                     3874
:                     563
``                    712
CC                    2265
CD                    3546
DT                    8165
EX                    88
FW                    4
IN                    9857
JJ                    5834
JJR                   381
JJS                   182
LS                    13
MD                    927
NN                    13166
NNP                   9410
NNPS                  244
NNS                   6047
PDT                   27
POS                   824
PRP                   1716
PRP$                  766
RB                    2822
RBR                   136
RBS                   35
RP                    216
SYM                   1
TO                    2179
UH                    3
VB                    2554
VBD                   3043
VBG                   1460
VBN                   2134
VBP                   1321
VBZ                   2125
WDT                   445
WP                    241
WP$                   14
WRB                   178
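
The per-tag counts can also be rolled up into coarser classes. The sketch below sums the noun tags (NN, NNP, NNPS, NNS) and the verb tags (VB, VBD, VBG, VBN, VBP, VBZ) using the figures from the table above. The grouping and the variable names are illustrative choices, not something the treebank corpus or NLTK defines.

```python
# Counts copied from the table above for the noun and verb tags
noun_counts = {"NN": 13166, "NNP": 9410, "NNPS": 244, "NNS": 6047}
verb_counts = {
    "VB": 2554, "VBD": 3043, "VBG": 1460,
    "VBN": 2134, "VBP": 1321, "VBZ": 2125,
}

# Total occurrences per coarse class
noun_total = sum(noun_counts.values())
verb_total = sum(verb_counts.values())

# Ratio of noun tokens to verb tokens in the corpus sample
noun_to_verb_ratio = noun_total / verb_total
print(noun_total, verb_total, round(noun_to_verb_ratio, 2))
```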