
Python Automation Cookbook - Second Edition

By : Jaime Buelta

Overview of this book

In this updated and extended version of Python Automation Cookbook, each chapter now comprises the newest recipes and is revised to align with Python 3.8 and higher. The book includes three new chapters that focus on using Python for test automation, machine learning projects, and for working with messy data. This edition will enable you to develop a sharp understanding of the fundamentals required to automate business processes through real-world tasks, such as developing your first web scraping application, analyzing information to generate spreadsheet reports with graphs, and communicating with automatically generated emails. Once you grasp the basics, you will acquire the practical knowledge to create stunning graphs and charts using Matplotlib, generate rich graphics with relevant information, automate marketing campaigns, build machine learning projects, and execute debugging techniques. By the end of this book, you will be proficient in identifying monotonous tasks and resolving process inefficiencies to produce superior and reliable systems.
Table of Contents (16 chapters)

Parsing HTML

Downloading raw text or a binary file is a good starting point, but the main language of the web is HTML.

HTML is a structured language that defines the different parts of a document, such as headings and paragraphs. It is also hierarchical: elements can contain sub-elements. Parsing the raw text into a structured document is what makes it possible to extract information from a web page automatically. For example, some text may be relevant only if it is enclosed in a particular HTML element, such as a div with a certain class, or if it follows an h3 heading.
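To make the idea of "hierarchical structure" concrete, here is a minimal sketch that uses only the standard library's `html.parser` module, not Beautiful Soup. It collects the text inside every `div` whose class contains a given name. The `ClassTextExtractor` class and the sample HTML are hypothetical, written just for this illustration. Note how nesting has to be tracked by hand, which is the kind of bookkeeping a dedicated parsing library takes care of for you.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Hypothetical helper: collect text inside each <div> whose class contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while inside a matching div (counts nested divs)
        self.results = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside a matching div: only track nested divs
            if tag == 'div':
                self.depth += 1
            return
        # The class attribute may hold several space-separated names (or no value)
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'div' and self.target_class in classes:
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == 'div':
            self.depth -= 1
            if self.depth == 0:
                # The matching div has closed: store its accumulated text
                self.results.append(''.join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


html = """
<html><body>
  <h3>Offers</h3>
  <div class="price highlight">Price: <b>10.50</b> EUR</div>
  <div class="note">Not relevant</div>
</body></html>
"""

parser = ClassTextExtractor('price')
parser.feed(html)
parser.close()
print(parser.results)  # ['Price: 10.50 EUR']
```

The text of the nested `b` element is included because it sits inside the matching `div`, while the `h3` and the second `div` are ignored.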

Getting ready

We'll use the excellent Beautiful Soup module to parse HTML text into an in-memory object that can be analyzed. We need the beautifulsoup4 package, which provides Beautiful Soup 4; the older BeautifulSoup package (version 3) does not support Python 3. Add the package to your requirements.txt file and install the dependencies in the virtual environment:

$ echo "beautifulsoup4==4.8.2" >> requirements.txt
$ pip install -r requirements.txt
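A quick way to confirm the installation works is to parse a small fragment and pull out the two kinds of text described earlier: the content of a div with a given class, and the paragraph that follows an h3 heading. This is a minimal sketch, not the recipe's own steps; the HTML snippet and the `price` class are made up for illustration. It uses `BeautifulSoup`, `find`, `find_next_sibling`, and `get_text` from the beautifulsoup4 package, with Python's built-in `'html.parser'` backend so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only to check that the package works
HTML = """
<html><body>
  <h3>Offers</h3>
  <p>Two new offers this week.</p>
  <div class="price">10.50 EUR</div>
</body></html>
"""

# 'html.parser' is Python's built-in parser, so no extra parser package is needed
soup = BeautifulSoup(HTML, 'html.parser')

# Text enclosed in a div with a given class (class_ avoids the Python keyword)
price = soup.find('div', class_='price').get_text(strip=True)

# The first paragraph that follows the h3 heading at the same level
heading = soup.find('h3')
summary = heading.find_next_sibling('p').get_text(strip=True)

print(price)    # 10.50 EUR
print(summary)  # Two new offers this week.
```

`find_next_sibling` skips the whitespace between the tags and returns the next `p` element at the same level of the hierarchy, which is why the result does not depend on the page's indentation.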

How to do it...

...