Python Automation Cookbook - Second Edition

By: Jaime Buelta

Overview of this book

In this updated and extended version of Python Automation Cookbook, each chapter now comprises the newest recipes and is revised to align with Python 3.8 and higher. The book includes three new chapters that focus on using Python for test automation, machine learning projects, and for working with messy data. This edition will enable you to develop a sharp understanding of the fundamentals required to automate business processes through real-world tasks, such as developing your first web scraping application, analyzing information to generate spreadsheet reports with graphs, and communicating with automatically generated emails. Once you grasp the basics, you will acquire the practical knowledge to create stunning graphs and charts using Matplotlib, generate rich graphics with relevant information, automate marketing campaigns, build machine learning projects, and execute debugging techniques. By the end of this book, you will be proficient in identifying monotonous tasks and resolving process inefficiencies to produce superior and reliable systems.

Reading Word documents

Word documents (.docx) are another common document format that mainly stores text. They are typically generated with Microsoft Office, but other tools also produce compatible files. They are probably the most common format for sharing files that need to remain editable, and they are also widely used for distributing finished documents.

In this recipe, we'll see how to extract text from a Word document.
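
For context, a .docx file is a ZIP archive whose main text lives in an XML part named word/document.xml. The sketch below is not the approach this recipe takes (the recipe uses python-docx); it is a standard-library illustration of where the paragraph text sits inside the file. The helper names extract_paragraphs and make_sample_docx are hypothetical, and the in-memory sample is a stripped-down stand-in that holds only the document part, so it is not a complete file that Word would open:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_paragraphs(docx_file):
    """Return the text of each paragraph in a .docx (path or file-like object)."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is spread over one or more w:t elements
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs


def make_sample_docx(lines):
    """Build an in-memory stand-in archive holding only word/document.xml."""
    # Assumes the lines contain no characters that need XML escaping
    body = "".join(f"<w:p><w:r><w:t>{line}</w:t></w:r></w:p>" for line in lines)
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


sample = make_sample_docx(["Lorem ipsum", "dolor sit amet"])
print(extract_paragraphs(sample))  # ['Lorem ipsum', 'dolor sit amet']
```

A dedicated library remains the more practical choice for real documents, since it also exposes styles, runs, and other structure that this sketch ignores.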

Getting ready

We'll use the python-docx module to read and process Word documents:

$ echo "python-docx==0.8.10" >> requirements.txt
$ pip install -r requirements.txt

We have prepared a test file called document-1.docx, available in the Chapter04/documents directory of the GitHub repository, which we'll use in this recipe. Note that this document follows the same Lorem Ipsum pattern that was described for the test document in the Reading PDF files recipe.

How to do it...

  1. Import python-docx:
    >>> import docx
    
    ...