Book Image

Natural Language Processing: Python and NLTK

By : Jacob Perkins, Nitin Hardeniya, Deepti Chopra, Iti Mathur, Nisheeth Joshi
Book Image

Natural Language Processing: Python and NLTK

By: Jacob Perkins, Nitin Hardeniya, Deepti Chopra, Iti Mathur, Nisheeth Joshi

Overview of this book

Natural Language Processing is a field of computational linguistics and artificial intelligence that deals with human-computer interaction. It provides a seamless interaction between computers and human beings and gives computers the ability to understand human speech with the help of machine learning. The number of human-computer interaction instances are increasing so it’s becoming imperative that computers comprehend all major natural languages. The first NLTK Essentials module is an introduction on how to build systems around NLP, with a focus on how to create a customized tokenizer and parser from scratch. You will learn essential concepts of NLP, be given practical insight into open source tool and libraries available in Python, shown how to analyze social media sites, and be given tools to deal with large scale text. This module also provides a workaround using some of the amazing capabilities of Python libraries such as NLTK, scikit-learn, pandas, and NumPy. The second Python 3 Text Processing with NLTK 3 Cookbook module teaches you the essential techniques of text and language processing with simple, straightforward examples. This includes organizing text corpora, creating your own custom corpus, text classification with a focus on sentiment analysis, and distributed text processing methods. The third Mastering Natural Language Processing with Python module will help you become an expert and assist you in creating your own NLP projects using NLTK. You will be guided through model development with machine learning tools, shown how to create training data, and given insight into the best practices for designing and building NLP-based applications using Python. This Learning Path combines some of the best that Packt has to offer in one complete, curated package and is designed to help you quickly learn text processing with Python and NLTK. It includes content from the following Packt products: ? NTLK essentials by Nitin Hardeniya ? Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins ? Mastering Natural Language Processing with Python by Deepti Chopra, Nisheeth Joshi, and Iti Mathur
Table of Contents (6 chapters)

Chapter 2. Replacing and Correcting Words

In this chapter, we will cover the following recipes:

  • Stemming words
  • Lemmatizing words with WordNet
  • Replacing words matching regular expressions
  • Removing repeating characters
  • Spelling correction with Enchant
  • Replacing synonyms
  • Replacing negations with antonyms

Introduction

In this chapter, we will go over various word replacement and correction techniques. The recipes cover the gamut of linguistic compression, spelling correction, and text normalization. All of these methods can be very useful for preprocessing text before search indexing, document classification, and text analysis.

Stemming words

Stemming is a technique to remove affixes from a word, ending up with the stem. For example, the stem of cooking is cook, and a good stemming algorithm knows that the ing suffix can be removed. Stemming is most commonly used by search engines for indexing words. Instead of storing all forms of a word, a search engine can store only the stems, greatly...