Book Image

Python 2.6 Text Processing: Beginners Guide

By : Jeff McNeil
Book Image

Python 2.6 Text Processing: Beginners Guide

By: Jeff McNeil

Overview of this book

<p>For programmers, working with text is not about reading their newspaper on a break; it's about taking textual data in one form and doing something to it. Extract, decrypt, parse, restructure – these are just some of the text tasks that can occupy much of a programmer's life. If this is your life, this book will make it better – a practical guide on how to do what you want with textual data in Python.</p> <p><em>Python 2.6 Text Processing Beginner's Guide</em> is the easiest way to learn how to manipulate text with Python. Packed with examples, it will teach you text processing techniques and give you the skills to work with the most popular Python libraries for transforming text from one form to another.</p> <p>The book gets you going with a quick look at some data formats, and installing the supporting libraries and components so that you're ready to get started. You move on to extracting text from a collection of sources and handling it using Python's built-in string functions and regular expressions. You look into processing structured text documents such as XML and HTML, JSON, and CSV. Then you progress to generating documents and creating templates. Finally you look at ways to enhance text output via a collection of third-party packages such as Nucular, PyParsing, NLTK, and Mako.</p>
Table of Contents (20 chapters)
Python 2.6 Text Processing Beginner's Guide
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
Index

Supporting third-party modules


Now that we've got our first example out of the way, we're going to take a little bit of a detour and learn how to obtain and install third-party modules. This is important, as we'll install a few throughout the remainder of the book.

The Python community maintains a centralized package repository, termed the Python Package Index (or PyPI). It is available on the web at http://pypi.python.org. From there, it is possible to download packages as compressed source distributions, or in some cases, pre-packaged Python components. PyPI is also a rich source of information. It's a great place to learn about available third-party applications. Links are provided to individual package documentation if it's not included directly into the package's PyPI page.

Packaging in a nutshell

There are at least two different popular methods of packaging and deploying Python packages. The distutils package is part of the standard distribution and provides a mechanism for building and installing Python software. Packages that take advantage of the distutils system are downloaded as a source distribution and built and installed by a local user. They are installed by simply creating an additional directory structure within the system Python directory that matches the package name.

In an effort to make packages more accessible and self-contained, the concept of the Python Egg was introduced. An egg file is simply a ZIP archive of a package. When an egg is installed, the ZIP file itself is placed on the Python path, rather than a subdirectory.