Modern Python Cookbook

Overview of this book

Python is the preferred choice of developers, engineers, data scientists, and hobbyists everywhere. It is a great scripting language that can power your applications and provide great speed, safety, and scalability. By exposing Python as a series of simple recipes, you can gain insight into specific language features in a particular context. Having a tangible context makes the language or standard library feature easier to understand. This book comes with over 100 recipes on the latest version of Python. The recipes will benefit everyone, from beginners to experts. The book is broken down into 13 chapters that build from simple language concepts to more complex applications of the language. The recipes touch on the essential Python concepts related to data structures, OOP, functional programming, and statistical programming. You will get acquainted with the nuances of Python syntax and learn how to use its advantages effectively. By the end of the book, you will be equipped with knowledge of testing, web services, configuration, and application integration, along with practical tips and tricks. The recipes take a problem-solution approach to resolve issues commonly faced by Python programmers across the globe. You will know how to create applications with flexible logging, powerful configuration and command-line options, automated unit tests, and good documentation.
Reading HTML documents


A great deal of content on the Web is presented using HTML markup, which a browser renders nicely for people to read. How can we parse that markup to extract the meaningful content from a web page?

We could use the standard library html.parser module, but it isn't very helpful on its own. It provides only low-level lexical scanning information (start tags, end tags, and text). It doesn't build a high-level data structure that describes the original web page.
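To see why html.parser alone is awkward, here is a minimal sketch that subclasses html.parser.HTMLParser and records each event it reports. The EventLogger class and scan() function are hypothetical names used only for illustration. The result is a flat stream of start tags, text, and end tags. Reconstructing the page's nesting from that stream is left entirely to us.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: record the low-level events html.parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; ignored here
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text between tags
        if data.strip():
            self.events.append(("data", data.strip()))


def scan(markup: str) -> list[tuple[str, str]]:
    """Feed markup to the parser and return the flat list of events."""
    parser = EventLogger()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return parser.events


if __name__ == "__main__":
    for event in scan("<html><body><p>Hello, <b>world</b></p></body></html>"):
        print(event)
```

Every event arrives at the same level. Nothing in this output tells us that the `b` element sits inside the `p` element unless we track that ourselves.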

We'll use the Beautiful Soup module to parse HTML pages. This is available from the Python Package Index (PyPI). See https://pypi.python.org/pypi/beautifulsoup4.
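For contrast, here is a minimal sketch of what Beautiful Soup provides, assuming the beautifulsoup4 package has been installed with pip. The summarize() function and SAMPLE_HTML are hypothetical. BeautifulSoup(), the .title and .h1 attributes, find_all(), get_text(), and get() are part of the bs4 API. The "html.parser" argument tells Beautiful Soup to use the standard library scanner as its back end, but the result is a navigable tree rather than an event stream.

```python
SAMPLE_HTML = """
<html>
  <head><title>Example page</title></head>
  <body>
    <h1>Recipes</h1>
    <ul>
      <li><a href="/one">First</a></li>
      <li><a href="/two">Second</a></li>
    </ul>
  </body>
</html>
"""


def summarize(markup: str) -> dict:
    """Hypothetical helper: pull a few meaningful pieces out of an HTML page."""
    # Imported here so this module still loads if beautifulsoup4 is absent.
    from bs4 import BeautifulSoup

    # "html.parser" selects the stdlib back end; no extra parser is required.
    soup = BeautifulSoup(markup, "html.parser")
    return {
        "title": str(soup.title.string),
        "heading": soup.h1.get_text(strip=True),
        # Each <a> tag is an object we can query for its text and attributes
        "links": [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_HTML))
```

Because the document is a tree, we can ask directly for "the title" or "every link" instead of tracking tag nesting by hand.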

Because it isn't part of the standard library, it must be downloaded and installed before we can use it. Generally, the pip command does this job nicely.

Often, this is as simple as the following:

pip install beautifulsoup4

On Mac OS X and Linux, installing into the system-wide Python usually requires the sudo command to escalate the user's privileges:

sudo pip install beautifulsoup4

This will prompt for the user's password. The user must have permission to elevate their privileges to root...
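After installation, it can help to confirm that Python actually sees the package. The following is a minimal sketch using only the standard library: importlib.util.find_spec() and importlib.metadata.version(), the latter available since Python 3.8. The beautifulsoup_version() function is a hypothetical helper. Note that the distribution name on PyPI is beautifulsoup4, while the import name is bs4.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def beautifulsoup_version() -> Optional[str]:
    """Hypothetical helper: return the installed beautifulsoup4 version, or None."""
    # Check that the bs4 import name can be found at all
    if importlib.util.find_spec("bs4") is None:
        return None
    try:
        # Metadata is looked up by distribution name, not import name
        return version("beautifulsoup4")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = beautifulsoup_version()
    print(f"beautifulsoup4 {found}" if found else "beautifulsoup4 is not installed")
```

If this reports that the package is missing even though pip succeeded, pip may have installed it for a different Python interpreter than the one running the script.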