Python Automation Cookbook

By : Jaime Buelta

Overview of this book

Have you been doing the same old monotonous office work over and over again? Or have you been trying to find an easy way to make your life better by automating some of your repetitive tasks? Through a tried and tested approach, understand how to automate all the boring stuff using Python. The Python Automation Cookbook helps you develop a clear understanding of how to automate your business processes using Python, including detecting opportunities by scraping the web, analyzing information to generate automatic spreadsheet reports with graphs, and communicating with automatically generated emails. You’ll learn how to get notifications via text messages and run tasks while your mind is focused on other important activities, followed by understanding how to scan documents such as résumés. Once you’ve gotten familiar with the fundamentals, you’ll be introduced to the world of graphs, along with studying how to produce organized charts using Matplotlib. In addition to this, you’ll gain in-depth knowledge of how to generate rich graphics showing relevant information. By the end of this book, you’ll have refined your skills by attaining a sound understanding of how to identify and correct problems to produce superior and reliable systems.

Going deeper into regular expressions

In this recipe, we'll see more about how to deal with regular expressions. After introducing the basics, we will dig a little deeper into pattern elements, introduce groups as a better way to retrieve and parse strings, see how to search for multiple occurrences of the same string, and deal with longer texts.

How to do it...

1. Import re:

>>> import re

2. Match a phone pattern as part of a group (in parentheses). Note the use of \d as a special character for any digit:

>>> match = re.search(r'the phone number is ([\d-]+)', '37: the phone number is 1234-567-890')
>>> match.group()
'the phone number is 1234-567-890'
>>> match.group(1)
'1234-567-890'

3. Compile a pattern and capture a case-insensitive match with a yes|no option:

>>> pattern = re.compile(r'The answer to question (\w+) is (yes|no)', re.IGNORECASE)
>>> pattern.search('Naturaly, the answer to question 3b is YES')
<_sre.SRE_Match object; span=(10, 42), match='the answer to question 3b is YES'>
>>> _.groups()
('3b', 'YES')

4. Match all the occurrences of cities and state abbreviations in the text. Note that they are separated by a single character and the name of the city always starts with an uppercase letter. The +? makes the city group non-greedy, so each match stops at the first state abbreviation it can reach. Only four states are matched for simplicity:

>>> PATTERN = re.compile(r'([A-Z][\w\s]+?).(TX|OR|OH|MI)')
>>> TEXT ='the jackalopes are the team of Odessa,TX while the knights are native of Corvallis OR and the mud hens come from Toledo.OH; the whitecaps have their base in Grand Rapids,MI'
>>> list(PATTERN.finditer(TEXT))
[<_sre.SRE_Match object; span=(31, 40), match='Odessa,TX'>, <_sre.SRE_Match object; span=(73, 85), match='Corvallis OR'>, <_sre.SRE_Match object; span=(113, 122), match='Toledo.OH'>, <_sre.SRE_Match object; span=(157, 172), match='Grand Rapids,MI'>]
>>> _[0].groups()
('Odessa', 'TX')

How it works...

The new special characters that were introduced are as follows. Note that the same letter in uppercase and in lowercase means the opposite match; for example, \d matches a digit, while \D matches a non-digit:

\d: Marks any digit (0 to 9).
\s: Marks any character that's a whitespace, including tabs and other whitespace special characters. Note that this is the reverse of \S, introduced in the previous recipe.
\w: Marks any word character: letters, digits, and the underscore, but not characters such as periods.
.: Marks any character except a newline.
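
As a quick check of the list above, the following sketch applies each special character to a made-up sample string. The string and the variable names are illustrative assumptions and are not part of the recipe.

```python
import re

# Hypothetical sample text, chosen only to exercise each special character.
# The string is not raw, so \t is a real tab character.
SAMPLE = 'Room 42, floor_3\tWest wing.'

digits = re.findall(r'\d+', SAMPLE)   # runs of digits
words = re.findall(r'\w+', SAMPLE)    # runs of letters, digits, and underscores
spaces = re.findall(r'\s', SAMPLE)    # single whitespace characters (spaces, tab)
pairs = re.findall(r'4.', SAMPLE)     # a '4' followed by any one character

print(digits)
print(words)
print(spaces)
print(pairs)
```

Note how floor_3 is returned as a single word, while the comma and the final period are never part of a \w match.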

To define groups, enclose them in parentheses. Groups can be retrieved individually, which makes them well suited to matching a bigger pattern that contains a variable part to be processed later, as demonstrated in step 2. Note the difference from the step 6 pattern in the previous recipe. In this case, the pattern is not only the number but also includes the prefix, even though we then extract only the number. Compare the two searches below, where the text contains a number that is not the one we want to capture:

>>> re.search(r'the phone number is ([\d-]+)', '37: the phone number is 1234-567-890')
<_sre.SRE_Match object; span=(4, 36), match='the phone number is 1234-567-890'>
>>> _.group(1)
'1234-567-890'
>>> re.search(r'[0123456789-]+', '37: the phone number is 1234-567-890')
<_sre.SRE_Match object; span=(0, 2), match='37'>
>>> _.group()
'37'

Remember that group 0 (.group() or .group(0)) is always the whole match. The rest of the groups are ordered as they appear.
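
The group numbering can be sketched with a pattern that has two groups. The extra first group, which captures the leading number, is an assumption added here for illustration; the recipe's own pattern has a single group.

```python
import re

LINE = '37: the phone number is 1234-567-890'

# Two groups: the leading number (added for illustration) and the phone number.
match = re.search(r'(\d+): the phone number is ([\d-]+)', LINE)

if match:  # re.search returns None when nothing matches
    whole = match.group(0)    # the whole match, same as match.group()
    line_id = match.group(1)  # first group, in order of appearance
    phone = match.group(2)    # second group
    both = match.groups()     # all capture groups as a tuple, without group 0
    print(whole, line_id, phone, both, sep='\n')
```

Checking the result before calling .group() avoids an AttributeError when the text does not contain the pattern.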

Patterns can be compiled as well. This saves some time if the pattern needs to be matched over and over. To use it that way, compile the pattern and then use that object to perform searches, as shown in steps 3 and 4. Some extra flags can be added, such as making the pattern case insensitive.
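
A minimal sketch of reusing one compiled pattern over several lines follows. The list of lines and the results dictionary are hypothetical and only illustrate the idea.

```python
import re

# Compile once, then reuse the same object for every search.
ANSWER = re.compile(r'The answer to question (\w+) is (yes|no)', re.IGNORECASE)

# Hypothetical input lines; the last one does not match the pattern.
LINES = [
    'The answer to question 1a is no',
    'Naturally, the answer to question 3b is YES',
    'No answer was given to question 4',
]

results = {}
for line in LINES:
    match = ANSWER.search(line)
    if match:
        question, answer = match.groups()
        results[question] = answer.lower()  # normalize YES/yes/Yes

print(results)
```

The re.IGNORECASE flag is what lets the same pattern match both "The answer" and "the answer", as well as "YES".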

Step 4's pattern requires a little explanation. It's composed of two groups, separated by a single character. The special character . matches any character; in our example, it matches a comma, a whitespace, and a period. The second group is a straightforward selection of defined options, in this case US state abbreviations.

The first group starts with an uppercase letter ([A-Z]), followed by any combination of word characters or spaces ([\w\s]), but not punctuation marks such as periods or commas. This matches the cities, including those composed of more than one word.

Note that this pattern starts on any uppercase letter and keeps matching until finding a state, unless separated by a punctuation mark, which may not be what's expected, for example:

>>> re.search(r'([A-Z][\w\s]+).(TX|OR|OH|MI)', 'This is a test, Escanaba MI')
<_sre.SRE_Match object; span=(16, 27), match='Escanaba MI'>
>>> re.search(r'([A-Z][\w\s]+).(TX|OR|OH|MI)', 'This is a test with Escanaba MI')
<_sre.SRE_Match object; span=(0, 31), match='This is a test with Escanaba MI'>
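
One possible way to reduce this effect, shown here only as a sketch, is to require every word of the city name to be capitalized. The STRICT pattern below is an assumption made for illustration and is not the recipe's pattern; it still fails on sentences such as "In Grand Rapids,MI", where a capitalized word precedes the city.

```python
import re

# Hypothetical stricter pattern: one or more capitalized words, then a
# separator (space, comma, or period), then one of the four states.
STRICT = re.compile(r'([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)[\s,.](TX|OR|OH|MI)')

first = STRICT.search('This is a test, Escanaba MI')
second = STRICT.search('This is a test with Escanaba MI')
third = STRICT.search('they have their base in Grand Rapids,MI')

print(first.groups())
print(second.groups())
print(third.groups())
```

Both test sentences now yield Escanaba as the city, because "This" is followed by a lowercase word and can no longer be extended up to the state.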

Step 4 also shows how to find more than one occurrence in a long text. While the .findall() method exists, it returns only strings (or tuples of groups) rather than full match objects, while .finditer() does return them. As is commonplace in Python 3, .finditer() returns an iterator that can be used in a for loop or list comprehension. Note that .search() returns only the first occurrence of the pattern, even if more matches appear:

>>> PATTERN.search(TEXT)
<_sre.SRE_Match object; span=(31, 40), match='Odessa,TX'>
>>> PATTERN.findall(TEXT)
[('Odessa', 'TX'), ('Corvallis', 'OR'), ('Toledo', 'OH'), ('Grand Rapids', 'MI')]
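
Because .finditer() yields full match objects, positions and groups are available together. The sketch below uses the non-greedy +? form of the city group; the found list and its tuple layout are assumptions made for illustration.

```python
import re

PATTERN = re.compile(r'([A-Z][\w\s]+?).(TX|OR|OH|MI)')
TEXT = ('the jackalopes are the team of Odessa,TX while the knights are '
        'native of Corvallis OR and the mud hens come from Toledo.OH; '
        'the whitecaps have their base in Grand Rapids,MI')

# Each item is a match object, so both the position and the groups can be read.
found = [(m.start(), m.group(1), m.group(2)) for m in PATTERN.finditer(TEXT)]

for start, city, state in found:
    print(f'{start:>3}: {city} ({state})')
```

With .findall(), only the tuples of groups would be available and the start positions would be lost.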

There's more...

The special characters can be reversed by swapping their case. For example, the reverses of the ones we used are as follows:

\D: Marks any non-digit
\W: Marks any non-word character (anything that is not a letter, digit, or underscore)
\B: Marks any position that's not at the start or end of a word (the reverse of \b)

The most commonly used special characters are typically \d (digits) and \w (letters and digits), as they mark common patterns to search for, and the plus sign for one or more.
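
The reversed characters can be tried out with a short sketch. The sample strings and variable names below are assumptions chosen for illustration.

```python
import re

# \D: remove every non-digit from a phone number.
digits_only = re.sub(r'\D', '', '1234-567-890')

# \W: list the non-word characters that separate cities and states.
separators = re.findall(r'\W', 'Odessa,TX Toledo.OH')

# \B: match 'cat' only when it does not start at a word boundary,
# so the standalone word 'cat' is skipped and the one in 'concat' is found.
inner = re.search(r'\Bcat', 'cat concat')

print(digits_only)
print(separators)
print(inner.span())
```

Note that \B consumes no characters: it only constrains the position at which the rest of the pattern may match.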

Groups can be assigned names as well. This makes them more explicit, at the expense of a more verbose syntax: (?P<groupname>PATTERN). Named groups can be retrieved by name with .group(groupname) or all at once with .groupdict(), and they keep their numeric position as well.

For example, the step 4 pattern can be described as follows:

>>> PATTERN = re.compile(r'(?P<city>[A-Z][\w\s]+?).(?P<state>TX|OR|OH|MI)')
>>> match = PATTERN.search(TEXT)
>>> match.groupdict()
{'city': 'Odessa', 'state': 'TX'}
>>> match.group('city')
'Odessa'
>>> match.group('state')
'TX'
>>> match.group(1), match.group(2)
('Odessa', 'TX')
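
Named groups combine naturally with .finditer(), since each match can be turned into a dictionary. The sample sentence and the offices and by_state names below are hypothetical.

```python
import re

PATTERN = re.compile(r'(?P<city>[A-Z][\w\s]+?).(?P<state>TX|OR|OH|MI)')

# Hypothetical sample sentence, not taken from the recipe.
SENTENCE = 'offices in Austin,TX and Portland OR and Ann Arbor.MI'

# One dictionary per match, keyed by group name.
offices = [m.groupdict() for m in PATTERN.finditer(SENTENCE)]

# The named fields make later processing self-explanatory.
by_state = {office['state']: office['city'] for office in offices}

print(offices)
print(by_state)
```

Referring to office['city'] instead of a numeric index keeps the code readable if groups are later added to or reordered in the pattern.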

Regular expressions are a very extensive topic. Whole technical books are devoted to them, and they can be notoriously deep. The Python documentation (https://docs.python.org/3/library/re.html) is a good reference and a good place to learn more.

If you feel a little intimidated at the start, it's a perfectly natural feeling. Analyze each of the patterns with care, dividing it into different parts, and they will start to make sense. Don't be afraid to run a regex interactive analyzer!
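
One way to divide a pattern into parts directly in the code is the re.VERBOSE flag, which ignores whitespace in the pattern and allows comments. The recipe does not use this flag; the sketch below is an assumption about how the named-group pattern could be laid out.

```python
import re

# With re.VERBOSE, whitespace outside character classes is ignored and
# everything after a # is treated as a comment.
PATTERN = re.compile(r'''
    (?P<city>[A-Z][\w\s]+?)   # city: an uppercase letter, then letters or spaces
    .                         # any single separator character
    (?P<state>TX|OR|OH|MI)    # one of the four supported state abbreviations
''', re.VERBOSE)

match = PATTERN.search('the whitecaps have their base in Grand Rapids,MI')
print(match.groupdict())
```

The pattern behaves the same as its one-line form, but each part can be read and checked on its own.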

Regexes can be really powerful and generic, but they may not be the proper tool for what you are trying to achieve. We've seen some caveats and patterns that have subtleties. As a rule of thumb, if a pattern starts to feel complicated, it's time to search for a different tool. Remember the previous recipes as well and the options they presented, such as parse.
