Modern Python Standard Library Cookbook

By: Alessandro Molina
Overview of this book

The Python 3 Standard Library is a vast array of modules that you can use for developing various kinds of applications. It contains an exhaustive list of libraries, and this book will help you choose the best one to address specific programming problems in Python. The Modern Python Standard Library Cookbook begins with recipes on containers and data structures and guides you in performing effective text management in Python. You will find Python recipes for command-line operations, networking, filesystems and directories, and concurrent execution. You will learn about Python security essentials and get to grips with various development tools for debugging, benchmarking, inspection, error reporting, and tracing. The book includes recipes to help you create graphical user interfaces for your application. You will learn to work with multimedia components and perform mathematical operations on dates and times. The recipes will also show you how to apply different searching and sorting algorithms to your data. By the end of the book, you will have acquired the skills needed to write clean code in Python and develop applications that meet your needs.

Reading XML/HTML content


Reading HTML or XML files allows us to parse web pages' content and to read documents or configurations described in XML.

Python has a built-in XML parser, the ElementTree module, which is well suited to parsing XML files. When HTML is involved, however, it fails quickly, because HTML allows constructs that are not well-formed XML, such as void tags that are never closed.

Consider trying to parse the following HTML:

<html>
    <body class="main-body">
        <p>hi</p>
        <img><br>
        <input type="text" />
    </body>
</html>

You will quickly face errors:

xml.etree.ElementTree.ParseError: mismatched tag: line 7, column 6
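
The failure can be reproduced with a short script. This is a minimal sketch: the variable names are illustrative, and the HTML is the snippet shown above placed in a triple-quoted string. The unclosed `<img>` and `<br>` tags are still open when `</body>` arrives, which is what expat reports as a mismatched tag.

```python
import xml.etree.ElementTree as ET

# The HTML from the example above; <img> and <br> are never closed.
HTML = """
<html>
    <body class="main-body">
        <p>hi</p>
        <img><br>
        <input type="text" />
    </body>
</html>
"""

error = None
try:
    ET.fromstring(HTML)
except ET.ParseError as err:
    # err.position holds the (line, column) pair reported by expat.
    error = err
    print(type(err).__name__, err)
```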

Luckily, it's not too hard to adapt the parser to handle at least the most common HTML quirks, such as self-closing/void tags.

How to do it...

You need to perform the following steps for this recipe:

  1. ElementTree by default uses expat to parse documents, and then relies on xml.etree.ElementTree.TreeBuilder to build the DOM of the document.

We can replace XMLParser based...
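
As a rough sketch of the split described in step 1, the tree-building half (`TreeBuilder`) can be driven by a more tolerant event source than expat. The example below assumes the standard library's `html.parser.HTMLParser` for that role. The class name `TreeBuilderHTMLParser` and the `VOID_TAGS` set are hypothetical names chosen for illustration, and this is not necessarily the recipe's own implementation.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Illustrative subset of HTML void elements (tags that never get a closing tag).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilderHTMLParser(HTMLParser):
    """Hypothetical helper: forwards HTMLParser events to an ElementTree TreeBuilder."""

    def __init__(self):
        super().__init__()
        self._builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        self._builder.start(tag, {name: value or "" for name, value in attrs})
        if tag in VOID_TAGS:
            # Close void tags immediately so the tree stays balanced.
            self._builder.end(tag)

    def handle_endtag(self, tag):
        # Void tags were already closed in handle_starttag; skip them here.
        if tag not in VOID_TAGS:
            self._builder.end(tag)

    def handle_data(self, data):
        self._builder.data(data)

    def close(self):
        super().close()
        # TreeBuilder.close() returns the root element of the built tree.
        return self._builder.close()


HTML = """
<html>
    <body class="main-body">
        <p>hi</p>
        <img><br>
        <input type="text" />
    </body>
</html>
"""

parser = TreeBuilderHTMLParser()
parser.feed(HTML)
root = parser.close()

body = root.find("body")
print(body.get("class"), [child.tag for child in body])
```

The sketch only covers void tags. Other HTML quirks, such as optional closing tags on `<p>` or `<li>`, would still produce an unbalanced tree and need extra handling.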