Natural Language Processing: Python and NLTK

By : Jacob Perkins, Nitin Hardeniya, Deepti Chopra, Iti Mathur, Nisheeth Joshi

Natural Language Processing: Python and NLTK

By: Jacob Perkins, Nitin Hardeniya, Deepti Chopra, Iti Mathur, Nisheeth Joshi

Overview of this book

Natural Language Processing is a field of computational linguistics and artificial intelligence that deals with human-computer interaction. It provides a seamless interaction between computers and human beings and gives computers the ability to understand human speech with the help of machine learning. The number of human-computer interaction instances are increasing so it’s becoming imperative that computers comprehend all major natural languages. The first NLTK Essentials module is an introduction on how to build systems around NLP, with a focus on how to create a customized tokenizer and parser from scratch. You will learn essential concepts of NLP, be given practical insight into open source tool and libraries available in Python, shown how to analyze social media sites, and be given tools to deal with large scale text. This module also provides a workaround using some of the amazing capabilities of Python libraries such as NLTK, scikit-learn, pandas, and NumPy. The second Python 3 Text Processing with NLTK 3 Cookbook module teaches you the essential techniques of text and language processing with simple, straightforward examples. This includes organizing text corpora, creating your own custom corpus, text classification with a focus on sentiment analysis, and distributed text processing methods. The third Mastering Natural Language Processing with Python module will help you become an expert and assist you in creating your own NLP projects using NLTK. You will be guided through model development with machine learning tools, shown how to create training data, and given insight into the best practices for designing and building NLP-based applications using Python. This Learning Path combines some of the best that Packt has to offer in one complete, curated package and is designed to help you quickly learn text processing with Python and NLTK. It includes content from the following Packt products: ? NTLK essentials by Nitin Hardeniya ? Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins ? Mastering Natural Language Processing with Python by Deepti Chopra, Nisheeth Joshi, and Iti Mathur

Preface

What this learning path covers

What you need for this learning path

Who this learning path is for

Reader feedback

Customer support

Free Chapter

1. Module 1

1. Introduction to Natural Language Processing

2. Text Wrangling and Cleansing

3. Part of Speech Tagging

4. Parsing Structure in Text

5. NLP Applications

6. Text Classification

7. Web Crawling

8. Using NLTK with Other Python Libraries

9. Social Media Mining in Python

10. Text Mining at Scale

2. Module 2

1. Tokenizing Text and WordNet Basics

2. Replacing and Correcting Words

3. Creating Custom Corpora

4. Part-of-speech Tagging

5. Extracting Chunks

6. Transforming Chunks and Trees

7. Text Classification

8. Distributed Processing and Handling Large Datasets

9. Parsing Specific Data Types

A. Penn Treebank Part-of-speech Tags

3. Module 3

1. Working with Strings

2. Statistical Language Modeling

3. Morphology – Getting Our Feet Wet

4. Parts-of-Speech Tagging – Identifying Words

5. Parsing – Analyzing Training Data

6. Semantic Analysis – Meaning Matters

7. Sentiment Analysis – I Am Happy

8. Information Retrieval – Accessing Information

9. Discourse Analysis – Knowing Is Believing

10. Evaluation of NLP Systems – Analyzing Performance

B. Bibliography

Index

Customer Reviews

5 star

4 star

3 star

2 star

1 star

Appendix A. Penn Treebank Part-of-speech Tags

The following is a table of all the part-of-speech tags that occur in the treebank corpus distributed with NLTK. The tags and counts shown here were acquired using the following code:

>>> from nltk.probability import FreqDist
>>> from nltk.corpus import treebank
>>> fd = FreqDist()
>>> for word, tag in treebank.tagged_words():
...	   fd[tag] += 1
>>> fd.items()

The FreqDist fd contains all the counts shown here for every tag in the treebank corpus. You can inspect each tag count individually, by doing fd[tag], for example, fd['DT']. Punctuation tags are also shown, along with special tags such as -NONE-, which signifies that the part-of-speech tag is unknown. Descriptions of most of the tags can be found at the following link:

http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

Part-of-speech tag	Frequency of occurrence
`#`	`16`
`$`	`724`
`'&apos...`

Natural Language Processing: Python and NLTK

By : Jacob Perkins, Nitin Hardeniya, Deepti Chopra, Iti Mathur, Nisheeth Joshi

Natural Language Processing: Python and NLTK

By: Jacob Perkins, Nitin Hardeniya, Deepti Chopra, Iti Mathur, Nisheeth Joshi

Overview of this book

Related Content you might be interested in

Current Title:

Natural Language Processing: Python and NLTK

Appendix A. Penn Treebank Part-of-speech Tags