
Getting Started with Beautiful Soup

By: Vineeth G Nair

Overview of this book

Beautiful Soup is a Python library designed for quick-turnaround projects such as screen scraping. It provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need without writing excess code. It doesn't take much code to write an application using Beautiful Soup. Getting Started with Beautiful Soup is a practical guide to Beautiful Soup using Python. The book starts by walking you through the installation and then through each feature of Beautiful Soup using simple examples, which include sample Python code as well as diagrams and screenshots wherever required for better understanding. The book discusses the problem of how exactly you can get data out of a website and provides an easy solution with the help of a real website and sample code. Getting Started with Beautiful Soup covers the different methods of installing Beautiful Soup on both Linux and Windows systems. You will then learn about searching, navigating, content modification, encoding support, and output formatting, with sample Python code for each example so that you can try them out and get a better understanding. This book is a practical guide to scraping information from any website. If you want to learn how to efficiently scrape pages from websites, then this book is for you.

Using Beautiful Soup without installation


The installation processes that we have discussed so far normally copy the module contents to an installation directory. This directory varies from one operating system to another: it is normally /usr/local/lib/pythonX.Y/site-packages on Linux (Debian-based distributions such as Debian and Ubuntu use /usr/local/lib/pythonX.Y/dist-packages instead) and C:\PythonXY\Lib\site-packages on Windows, where X.Y is the Python version (for example, Python 2.7 gives C:\Python27\Lib\site-packages). When a Python script or the interactive interpreter executes an import statement, the interpreter looks for the module in the directories listed in its module search path, sys.path, which can be extended through the PYTHONPATH environment variable. So, installing a module actually means either copying its contents into one of these predefined directories, or copying it to some other location and adding that location to the Python path. The following method of using Beautiful Soup without going through the installation works on any operating system, such as Windows, Linux, or Mac OS X:

  1. Download the latest version of the Beautiful Soup source package from https://pypi.python.org/packages/source/b/beautifulsoup4/.

  2. Unzip (extract) the package.

  3. Copy the bs4 directory from the extracted package into the directory where we want to keep all our Beautiful Soup scripts.
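Steps 2 and 3 can also be scripted. The following is a minimal sketch, not part of Beautiful Soup itself: vendor_bs4 is a hypothetical helper, and it assumes the downloaded archive (for example, a .tar.gz source release) contains a bs4 directory somewhere inside it. Step 1 is still done by hand.

```python
import os
import shutil
import tempfile


def vendor_bs4(archive_path, project_dir):
    """Extract a Beautiful Soup source archive and copy its bs4 package
    into project_dir. Returns the path of the copied bs4 directory.

    Hypothetical helper: assumes the archive (.tar.gz or .zip) holds a
    top-level folder such as beautifulsoup4-X.Y.Z/ that contains bs4/.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Step 2: extract the archive; the format is guessed from its extension
        shutil.unpack_archive(archive_path, workdir)
        # os.walk is top-down, so the first hit is the outermost bs4 directory
        for root, dirs, _files in os.walk(workdir):
            if "bs4" in dirs:
                source = os.path.join(root, "bs4")
                break
        else:
            raise FileNotFoundError("no bs4 directory found in " + archive_path)
        # Step 3: copy bs4 next to our scripts (before workdir is deleted)
        target = os.path.join(project_dir, "bs4")
        shutil.copytree(source, target)
        return target
```

For example, vendor_bs4("beautifulsoup4-X.Y.Z.tar.gz", "/path/to/my_project") (both paths hypothetical) would leave /path/to/my_project/bs4 ready to import. shutil.copytree refuses to overwrite an existing bs4 directory, so an old copy has to be removed before upgrading.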

After performing the preceding steps, we can use Beautiful Soup. To import it in this case, we either need to start Python from the directory that contains the bs4 directory (scripts stored in that same directory also work, because Python searches a script's own directory first) or add that directory to the Python path; otherwise, we will get an ImportError saying that there is no module named bs4. This extra step is required because this method makes Beautiful Soup available only to the project that contains the bs4 directory. With the installation methods we have seen previously, Beautiful Soup is available globally and can be used in any project, so the additional step is not required.
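To make "adding the directory to the Python path" concrete, the following minimal sketch extends sys.path from inside a script for the current run only. import_from is a hypothetical helper, and /path/to/my_project below stands for wherever the bs4 directory was copied. Python already places the running script's directory (or, in an interactive session, the current directory) at the front of sys.path, which is why scripts kept next to bs4 need none of this.

```python
import importlib
import os
import sys


def import_from(directory, module_name="bs4"):
    """Put `directory` at the front of sys.path and import `module_name`.

    Hypothetical helper: mimics, for this run only, adding the project
    directory to the Python path.
    """
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        # Earlier entries are searched first, so a bs4 copied here is found
        # before a globally installed one (unless bs4 was already imported).
        sys.path.insert(0, directory)
    # Clear finder caches so files created after start-up are noticed.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

After bs4 = import_from("/path/to/my_project"), a later from bs4 import BeautifulSoup uses the copied package. To set the path for every script started from a terminal session instead, run export PYTHONPATH=/path/to/my_project on Linux or Mac OS X, or set PYTHONPATH=C:\path\to\my_project on Windows, before starting Python.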