Python 3 Text Processing with NLTK 3 Cookbook

By: Jacob Perkins

Removing repeating characters


In everyday language, people are often not strictly grammatical. They will write things such as "I looooooove it" to emphasize the word "love". However, computers don't know that "looooooove" is a variation of "love" unless they are told. This recipe presents a method to remove these annoying repeated characters so that we end up with a proper English word.

Getting ready

As in the previous recipe, we will use the re module, and more specifically, backreferences. A backreference is a way to refer to a previously matched group in a regular expression. This lets us match a character that is immediately followed by itself, and then remove the repeats.
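A minimal sketch of a backreference, using only the standard re module; the example words are our own. The last substitution shows why blindly collapsing every run of repeated characters is not enough: words such as "goose" legitimately contain doubled letters.

```python
import re

# (\w) captures one word character; \1 is a backreference that must match
# that same character again, so the pattern finds a doubled character.
doubled = re.compile(r'(\w)\1')

match = doubled.search('looooooove')
print(match.group(0))  # oo
print(match.group(1))  # o

# Collapsing every run to a single character fixes "looooooove"...
print(re.sub(r'(\w)\1+', r'\1', 'looooooove'))  # love
# ...but it also damages words that really do contain doubled letters.
print(re.sub(r'(\w)\1+', r'\1', 'goose'))  # gose
```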

How to do it...

We will create a class that has the same form as the RegexpReplacer class from the previous recipe. It will have a replace() method that takes a single word and returns a more correct version of that word, with the dubious repeating characters removed. This code can be found in replacers.py in the book's code bundle and is...
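The book's own code is cut off above, so what follows is only a sketch of such a class under stated assumptions, not the author's implementation. The class name RepeatReplacer and the known_words parameter are hypothetical; known_words is a plain set standing in for a real dictionary check (for instance, a WordNet lookup). Each pass removes one copy of a doubled character and recurses, stopping as soon as the word is recognized or no repeats remain.

```python
import re


class RepeatReplacer:
    """Strip repeated characters one at a time (illustrative sketch).

    known_words is a hypothetical stand-in for a real dictionary lookup.
    """

    def __init__(self, known_words=()):
        # Any prefix, a character, the same character again (backreference
        # \2), then any suffix. The greedy prefix targets the last doubled pair.
        self.repeat_regexp = re.compile(r'(\w*)(\w)\2(\w*)')
        # Rebuilding the word as \1\2\3 drops exactly one copy of the pair.
        self.repl = r'\1\2\3'
        self.known_words = {w.lower() for w in known_words}

    def replace(self, word):
        # Stop early on a recognized word, so "goose" is not shortened.
        if word.lower() in self.known_words:
            return word
        repl_word = self.repeat_regexp.sub(self.repl, word)
        if repl_word != word:
            # One character was removed; try again on the shorter word.
            return self.replace(repl_word)
        return repl_word


replacer = RepeatReplacer(known_words={'love', 'goose'})
print(replacer.replace('looooooove'))  # love
print(replacer.replace('gooooose'))    # goose
```

Without a word list the class still removes every repeat, which turns "goose" into "gose"; the dictionary check is what lets legitimate double letters survive.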