Python 2.6 Text Processing: Beginner's Guide

By: Jeff McNeil

Overview of this book

For programmers, working with text is not about reading their newspaper on a break; it's about taking textual data in one form and doing something to it. Extract, decrypt, parse, restructure – these are just some of the text tasks that can occupy much of a programmer's life. If this is your life, this book will make it better – a practical guide on how to do what you want with textual data in Python.

Python 2.6 Text Processing Beginner's Guide is the easiest way to learn how to manipulate text with Python. Packed with examples, it will teach you text processing techniques and give you the skills to work with the most popular Python libraries for transforming text from one form to another.

The book gets you going with a quick look at some data formats, and installing the supporting libraries and components so that you're ready to get started. You move on to extracting text from a collection of sources and handling it using Python's built-in string functions and regular expressions. You look into processing structured text documents such as XML and HTML, JSON, and CSV. Then you progress to generating documents and creating templates. Finally you look at ways to enhance text output via a collection of third-party packages such as Nucular, PyParsing, NLTK, and Mako.

Time for action – configuring a virtual environment


Here, we'll install the virtualenv package, which also illustrates how to install packages from the PyPI site. We'll then configure our first virtual environment, which we'll use for the examples and code illustrations in the rest of the book.

  1. As a user with administrative privileges, install virtualenv from the system command line by running easy_install virtualenv. If you have the correct permissions, your output should be similar to the following:

    Searching for virtualenv
    Reading http://pypi.python.org/simple/virtualenv/
    Reading http://virtualenv.openplans.org
    Best match: virtualenv 1.4.5
    Downloading http://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.4.5.tar.gz#md5=d3c621dd9797789fef78442e336df63e
    Processing virtualenv-1.4.5.tar.gz
    Running virtualenv-1.4.5/setup.py -q bdist_egg --dist-dir /tmp/easy_install-rJXhVC/virtualenv-1.4.5/egg-dist-tmp-AvWcd1
    warning: no previously-included files matching '*.*' found under directory 'docs/_templates'
    Adding virtualenv 1.4.5 to easy-install.pth file
    Installing virtualenv script to /usr/bin
    
    Installed /usr/lib/python2.6/site-packages/virtualenv-1.4.5-py2.6.egg
    Processing dependencies for virtualenv
    Finished processing dependencies for virtualenv
    
  2. Drop administrative privileges, as we won't need them any longer. Ensure that you're within your home directory and create a new virtual environment by running:

     $ virtualenv --no-site-packages text_processing
    
  3. Step into the newly created text_processing directory and activate the virtual environment. Windows users will do this by simply running the Scripts\activate application, while Linux users must instead source the script using the shell's dot operator.

    $ . bin/activate
    
  4. If you've done this correctly, you should now see your command-line prompt change to include the string (text_processing). This serves as a visual cue to remind you that you're operating within a specific virtual environment.

    (text_processing)$ pwd
    /home/jmcneil/text_processing
    (text_processing)$ which python
    /home/jmcneil/text_processing/bin/python
    (text_processing)$
    
  5. Finally, deactivate the environment by running the deactivate command. This returns your shell environment to its default state. Note that once you've done this, you're once again working with the system's Python install.

    (text_processing)$ deactivate
    $ which python
    /usr/bin/python
    $ 
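
To make the effect of steps 3 to 5 more concrete, here is a simplified model of what activation changes. This is a sketch only, not the real activate script: the function name activated_environ is our own invention, and the real script also adjusts the shell prompt and defines the deactivate command. The key idea it shows is that the environment's bin directory is placed at the front of PATH, which is why which python resolves to a different interpreter while the environment is active.

```python
import os


def activated_environ(env_dir, environ):
    """Return a copy of environ changed roughly the way bin/activate changes a shell.

    Hypothetical helper for illustration; it models a Linux-style layout
    in which the environment's executables live in <env_dir>/bin.
    """
    new_env = dict(environ)
    bin_dir = os.path.join(env_dir, "bin")

    # Activation records which environment is active...
    new_env["VIRTUAL_ENV"] = env_dir

    # ...and puts the environment's bin directory ahead of everything else,
    # so its python and easy_install are found first.
    old_path = environ.get("PATH", "")
    if old_path:
        new_env["PATH"] = bin_dir + os.pathsep + old_path
    else:
        new_env["PATH"] = bin_dir
    return new_env


if __name__ == "__main__":
    before = {"PATH": "/usr/bin"}
    after = activated_environ("/home/jmcneil/text_processing", before)
    print(after["PATH"])
```

Deactivating simply restores the previous PATH, so the system interpreter in /usr/bin is found first again.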
    

Note

If you're running Windows, python.exe and easy_install.exe are not placed on your system %PATH% by default. You'll need to manually configure your %PATH% variable to include C:\Python26\ and C:\Python26\Scripts. Additional scripts added by easy_install are also placed in the Scripts directory, so it's worth setting up your %PATH% variable.
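
If you're unsure whether your %PATH% is set up, a few lines of Python can tell you. The following is a minimal sketch: the helper name missing_from_path is hypothetical, and the directories in the usage example assume the default install location of the Python 2.6 Windows installer, so substitute your own if you installed elsewhere. The comparison ignores case and trailing slashes, which suits Windows but not case-sensitive file systems.

```python
import os


def missing_from_path(required_dirs, path_value=None, sep=os.pathsep):
    """Return the entries of required_dirs that do not appear in path_value.

    Hypothetical helper for illustration. If path_value is omitted, the
    PATH variable of the current process is inspected.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalize(entry):
        # Windows treats C:\Python26 and c:\python26\ as the same directory.
        return entry.strip().rstrip("\\/").lower()

    present = set(normalize(p) for p in path_value.split(sep) if p.strip())
    return [d for d in required_dirs if normalize(d) not in present]


if __name__ == "__main__":
    # Assumed default locations; adjust to match your installation.
    wanted = ["C:\\Python26", "C:\\Python26\\Scripts"]
    for directory in missing_from_path(wanted):
        print("Not on PATH: " + directory)
```

An empty result means both directories were found.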

What just happened?

We installed the virtualenv package using the easy_install command, directly from the Python Package Index. This is the method we'll use for installing any third-party packages going forward, so you should now be familiar with the easy_install process. For the remainder of the book, we'll operate from within this text_processing virtual environment. Additional packages are installed using this same technique from within the confines of our environment.

After the install process was completed, we configured and activated our first virtual environment. You saw how to create a new instance via the virtualenv command and you also learned how to subsequently activate it using the bin/activate script. Finally, we showed you how to deactivate your environment and return to your system's default state.
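
The prompt prefix is a visual cue only. If you'd like a script to check for itself whether it's running inside a virtual environment, something like the following should work. Treat it as a sketch: the function name in_virtual_environment is our own, and it relies on two interpreter attributes. Older virtualenv releases, such as the one installed here, record the system location in sys.real_prefix, while Python 3.3 and later expose sys.base_prefix instead.

```python
import sys


def in_virtual_environment(sys_module=sys):
    """Best-effort check for an active virtual environment.

    Hypothetical helper for illustration; sys_module is a parameter only
    so that the logic can be exercised with stand-in objects.
    """
    # Older virtualenv releases set sys.real_prefix inside an environment.
    if hasattr(sys_module, "real_prefix"):
        return True
    # Python 3.3+ sets sys.base_prefix; it differs from sys.prefix
    # only when an environment is in use.
    base_prefix = getattr(sys_module, "base_prefix", sys_module.prefix)
    return base_prefix != sys_module.prefix


if __name__ == "__main__":
    print(sys.executable)
    print(in_virtual_environment())
```

Run it once with the environment activated and once after deactivate to see the interpreter path and the result change.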

Have a go hero – install your own environment

Now that you know how to set up your own isolated Python environment, you're encouraged to create a second one and install a collection of third-party utilities in order to get the hang of the installation process.

  1. Create a new environment and give it a name of your choice.

  2. Point your browser to http://pypi.python.org and select one or more packages that you find interesting. Install them via the easy_install command within your new virtual environment.

Note that you should not require administrative privileges to do this. If you receive an error about permissions, make certain you've remembered to activate your new environment. Deactivate when complete. Some of the packages available for install may require a correctly configured C-language compiler.
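
Once a package is installed, one quick way to confirm that it landed in your new environment rather than in the system install is to ask Python where it imported the module from. The sketch below assumes nothing about which packages you picked; module_location is a hypothetical helper name, and for dotted names it reports on the top-level package only.

```python
def module_location(name):
    """Return the file a module was imported from, or None if the import fails.

    Hypothetical helper for illustration.
    """
    try:
        module = __import__(name)
    except ImportError:
        return None
    # Modules compiled into the interpreter have no __file__ attribute.
    return getattr(module, "__file__", "(built-in)")


if __name__ == "__main__":
    # Replace "os" with the import name of a package you installed.
    print(module_location("os"))
```

With the environment activated, a freshly installed package should report a path beneath your environment's directory. A result of None means the import failed, which usually points to an install that went elsewhere or an environment that isn't active.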