Natural Language Processing: Python and NLTK

By: Jacob Perkins, Nitin Hardeniya, Deepti Chopra, Iti Mathur, Nisheeth Joshi

Overview of this book

Natural Language Processing is a field of computational linguistics and artificial intelligence that deals with human-computer interaction. It aims to make interaction between computers and people seamless and, with the help of machine learning, gives computers the ability to understand human speech. The number of human-computer interactions is increasing, so it is becoming imperative that computers comprehend all major natural languages.

The first module, NLTK Essentials, is an introduction to building systems around NLP, with a focus on creating a customized tokenizer and parser from scratch. You will learn essential concepts of NLP, gain practical insight into the open source tools and libraries available in Python, see how to analyze social media sites, and be given tools to deal with large-scale text. This module also shows how to work with Python libraries such as NLTK, scikit-learn, pandas, and NumPy.

The second module, Python 3 Text Processing with NLTK 3 Cookbook, teaches you the essential techniques of text and language processing with simple, straightforward examples. This includes organizing text corpora, creating your own custom corpus, text classification with a focus on sentiment analysis, and distributed text processing methods.

The third module, Mastering Natural Language Processing with Python, will help you become an expert and assist you in creating your own NLP projects using NLTK. You will be guided through model development with machine learning tools, shown how to create training data, and given insight into the best practices for designing and building NLP-based applications using Python.

This Learning Path combines some of the best that Packt has to offer in one complete, curated package and is designed to help you quickly learn text processing with Python and NLTK. It includes content from the following Packt products:

NLTK Essentials by Nitin Hardeniya
Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins
Mastering Natural Language Processing with Python by Deepti Chopra, Nisheeth Joshi, and Iti Mathur

Preface

What this learning path covers

What you need for this learning path

Who this learning path is for

Reader feedback

Customer support

1. Module 1

1. Introduction to Natural Language Processing

2. Text Wrangling and Cleansing

3. Part of Speech Tagging

4. Parsing Structure in Text

5. NLP Applications

6. Text Classification

7. Web Crawling

8. Using NLTK with Other Python Libraries

9. Social Media Mining in Python

10. Text Mining at Scale

2. Module 2

1. Tokenizing Text and WordNet Basics

2. Replacing and Correcting Words

3. Creating Custom Corpora

4. Part-of-speech Tagging

5. Extracting Chunks

6. Transforming Chunks and Trees

7. Text Classification

8. Distributed Processing and Handling Large Datasets

9. Parsing Specific Data Types

A. Penn Treebank Part-of-speech Tags

3. Module 3

1. Working with Strings

2. Statistical Language Modeling

3. Morphology – Getting Our Feet Wet

4. Parts-of-Speech Tagging – Identifying Words

5. Parsing – Analyzing Training Data

6. Semantic Analysis – Meaning Matters

7. Sentiment Analysis – I Am Happy

8. Information Retrieval – Accessing Information

9. Discourse Analysis – Knowing Is Believing

10. Evaluation of NLP Systems – Analyzing Performance

B. Bibliography

Index

Chapter 9. Parsing Specific Data Types

In this chapter, we will cover the following recipes:

Parsing dates and times with dateutil
Timezone lookup and conversion
Extracting URLs from HTML with lxml
Cleaning and stripping HTML
Converting HTML entities with BeautifulSoup
Detecting and converting character encodings

Introduction

This chapter covers parsing specific kinds of data, focusing primarily on dates, times, and HTML. Luckily, there are a number of useful libraries to accomplish this, so we don't have to delve into tricky and overly complicated regular expressions. These libraries can be great complements to NLTK:

dateutil provides datetime parsing and timezone conversion
lxml and BeautifulSoup can parse, clean, and convert HTML
charade and UnicodeDammit can detect and convert text character encoding
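
As a rough illustration of the first item, here is a minimal sketch of parsing a date string and converting it to another timezone with dateutil. The input string, the assumption that it represents UTC, and the fixed UTC-5 zone named "EXAMPLE" are all hypothetical choices made for this sketch; they are not taken from the recipes in this chapter.

```python
from dateutil import parser, tz

# Parse a free-form date string; no regular expression is needed.
parsed = parser.parse("Thu Sep 25 10:36:28 2003")
print(parsed)  # 2003-09-25 10:36:28

# The parsed value is naive (no timezone). Assume, for this sketch,
# that it represents UTC and attach that timezone.
utc_dt = parsed.replace(tzinfo=tz.tzutc())

# Convert to a hypothetical fixed-offset zone, 5 hours behind UTC.
# A fixed offset avoids depending on the system timezone database.
example_zone = tz.tzoffset("EXAMPLE", -5 * 3600)
local_dt = utc_dt.astimezone(example_zone)
print(local_dt)  # 2003-09-25 05:36:28-05:00
```

In practice, a named zone looked up with tz.gettz() would usually replace the fixed offset, provided the zone data is available on the system.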

These libraries can be useful for preprocessing text before passing it to an NLTK object, or postprocessing text that has been processed and extracted using NLTK. Coming up is an example...
