Python Text Processing with NLTK 2.0 Cookbook

By: Jacob Perkins

Overview of this book

Natural Language Processing is used everywhere: in search engines, spell checkers, mobile phones, computer games, and even your washing machine. Python's Natural Language Toolkit (NLTK) suite of libraries has rapidly emerged as one of the most efficient tools for Natural Language Processing. You want to employ nothing less than the best techniques in Natural Language Processing, and this book is your answer.

Python Text Processing with NLTK 2.0 Cookbook is a handy, illustrated guide that walks you through Natural Language Processing techniques step by step. It demystifies the advanced features of text analysis and text mining using the comprehensive NLTK suite.

This book keeps the preamble short so that you can dive right into the science of text processing with a practical, hands-on approach.

Get started by learning how to tokenize text. Get an overview of WordNet and how to use it. Learn the basics as well as the advanced features of stemming and lemmatization. Discover various ways to replace words with simpler and more common (read: more searched) variants. Create your own corpora and learn to create custom corpus readers for JSON files as well as for data stored in MongoDB. Use and manipulate POS taggers. Transform and normalize parsed chunks to produce a canonical form without changing their meaning. Dig into feature extraction and text classification. Learn how to handle huge amounts of data without any loss in efficiency or speed.

This book will teach you all that and more, in a hands-on, learn-by-doing manner. Make yourself an expert in using NLTK for Natural Language Processing with this handy companion.
Table of Contents (16 chapters)
Python Text Processing with NLTK 2.0 Cookbook
Credits
About the Author
About the Reviewers
Preface
Penn Treebank Part-of-Speech Tags
Index

About the Author

Jacob Perkins has been an avid user of open source software since high school, when he first built his own computer and didn't want to pay for Windows. At one point he had five operating systems installed, including Red Hat Linux, OpenBSD, and BeOS.

While at Washington University in St. Louis, Jacob took classes in Spanish and poetry writing, and worked on an independent study project that eventually became his Master's project: WUGLE, a GUI for manipulating logical expressions. In his free time, he wrote the Gnome2 version of Seahorse (a GUI for encryption and key management), which has since been translated into over a dozen languages and is included in the default Gnome distribution.

After receiving his MS in Computer Science, Jacob tried to start a web development studio with some friends, but since no one knew anything about web development, it didn't work out as planned. Once he'd actually learned about web development, he went off and co-founded another company called Weotta, which sparked his interest in Machine Learning and Natural Language Processing.

Jacob is currently the CTO/Chief Hacker for Weotta and blogs about what he's learned along the way at http://streamhacker.com/. He is also applying this knowledge to produce text processing APIs and demos at http://text-processing.com/. This book is a synthesis of his knowledge on processing text using Python, NLTK, and more.