Python Automation Cookbook

By: Jaime Buelta

Overview of this book

Have you been doing the same old monotonous office work over and over again? Or have you been trying to find an easy way to make your life better by automating some of your repetitive tasks? Through a tried and tested approach, understand how to automate all the boring stuff using Python. The Python Automation Cookbook helps you develop a clear understanding of how to automate your business processes using Python, including detecting opportunities by scraping the web, analyzing information to generate automatic spreadsheet reports with graphs, and communicating with automatically generated emails. You'll learn how to get notifications via text messages and run tasks while your mind is focused on other important activities, followed by understanding how to scan documents such as résumés. Once you've gotten familiar with the fundamentals, you'll be introduced to the world of graphs, along with studying how to produce organized charts using Matplotlib. In addition to this, you'll gain in-depth knowledge of how to generate rich graphics showing relevant information. By the end of this book, you'll have refined your skills by attaining a sound understanding of how to identify and correct problems to produce superior and reliable systems.

Preface

Who this book is for

What this book covers

To get the most out of this book

Sections

Get in touch


Let Us Begin Our Automation Journey

Introduction

Creating a virtual environment

Installing third-party packages

Creating strings with formatted values

Manipulating strings

Extracting data from structured strings

Using a third-party tool—parse

Introducing regular expressions

Going deeper into regular expressions

Adding command-line arguments

Automating Tasks Made Easy

Introduction

Preparing a task

Setting up a cron job

Capturing errors and problems

Sending email notifications

Building Your First Web Scraping Application

Introduction

Downloading web pages

Interacting with forms

Using Selenium for advanced interaction

Accessing password-protected pages

Speeding up web scraping

Searching and Reading Local Files

Introduction

Crawling and searching directories

Reading text files

Dealing with encodings

Reading CSV files

Reading log files

Reading file metadata

Reading images

Reading PDF files

Reading Word documents

Scanning documents for a keyword

Generating Fantastic Reports

Introduction

Creating a simple report in plain text

Using templates for reports

Formatting text in Markdown

Writing a basic Word document

Styling a Word document

Generating structure in Word documents

Adding pictures to Word documents

Writing a simple PDF document

Structuring a PDF

Aggregating PDF reports

Watermarking and encrypting a PDF

Fun with Spreadsheets

Introduction

Writing a CSV spreadsheet

Updating the CSV files

Reading an Excel spreadsheet

Updating an Excel spreadsheet

Creating new sheets on an Excel spreadsheet

Creating charts in Excel

Working with format in Excel

Creating a macro in LibreOffice

Developing Stunning Graphs

Introduction

Plotting a simple sales graph

Drawing stacked bars

Plotting pie charts

Displaying multiple lines

Drawing a scatter plot

Visualizing maps

Adding legends and annotations

Combining graphs

Saving charts

Dealing with Communication Channels

Introduction

Working with email templates

Sending an individual email

Reading an email

Adding subscribers to an email newsletter

Sending notifications via email

Producing SMS

Receiving SMS

Creating a Telegram bot

Why Not Automate Your Marketing Campaign?

Introduction

Detecting the opportunities

Creating personalized coupon codes

Sending a notification to the customer on their preferred channel

Preparing sales information

Generating a sales report

Debugging Techniques

Introduction

Learning Python interpreter basics

Debugging through logging

Debugging with breakpoints

Improving your debugging skills

Other Books You May Enjoy

Leave a review - let other readers know what you think


Introducing regular expressions

A regular expression, or regex, is a pattern to match text. In other words, it allows us to define an abstract string (typically describing a structured kind of text) and check other strings against it to see whether they match.

It is easier to describe them with an example. Think of a pattern of text defined as a word that starts with an uppercase A and contains only lowercase Ns and As after that. The word Anna matches it, but Bob, Alice, and James do not. The words Aaan, Ana, Annnn, and Aaaan will also be matches, but ANNA won't.
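As a minimal sketch, assuming the word must have at least one letter after the initial A, this pattern can be written as r'A[na]+' and checked against whole words with re.fullmatch(), a function of the standard re module (not otherwise used in this recipe) that only succeeds when the entire string fits the pattern:

```python
import re

# A capital A followed by one or more lowercase n or a characters
PATTERN = r'A[na]+'

words = ['Anna', 'Bob', 'Alice', 'James', 'Aaan', 'Ana', 'Annnn', 'Aaaan', 'ANNA']
for word in words:
    # fullmatch requires the whole word to fit the pattern, not just a part of it
    result = 'matches' if re.fullmatch(PATTERN, word) else 'does not match'
    print(word, result)
```

Using re.search() here instead would also accept words such as 'Annabel', because it is enough for part of the string to match.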

If this sounds complicated, that's because it is. Regexes are notorious for becoming intricate and difficult to follow, but they are very useful, because they allow us to perform incredibly powerful pattern matching.

Some common uses of regexes are as follows:

Validating input data: For example, checking that a phone number contains only numbers, dashes, and brackets.
String parsing: Retrieving data from structured strings, such as logs or URLs. This is similar to what's described in the previous recipe.
Scraping: Finding the occurrences of something in a long text. For example, finding all emails in a web page.
Replacement: Finding a word or words and replacing them with others. For example, replacing the word owner with John Smith.
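The following sketch illustrates three of these uses with functions from the standard re module: re.fullmatch() for validation, re.findall() to collect every occurrence, and re.sub() for replacement. The last two are not covered elsewhere in this recipe; the helper name is_valid_phone, the sample texts, and the addresses are made up for illustration:

```python
import re

# Validation: the whole input must be digits, brackets, or dashes (hypothetical helper)
def is_valid_phone(text):
    return re.fullmatch(r'[0-9()-]+', text) is not None

# Scraping: collect every email-like token with a naive pattern
page = 'Contact sales@example.com or support@example.com for help'
emails = re.findall(r'\S+@\S+', page)

# Replacement: swap every standalone occurrence of a word for another text
document = 'Property of owner. Contact owner for details.'
updated = re.sub(r'\bowner\b', 'John Smith', document)

print(is_valid_phone('(555)123-4567'), is_valid_phone('555 ABC'))
print(emails)
print(updated)
```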

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
– Jamie Zawinski

Regular expressions are at their best when they are kept very simple. In general, if there is a specific tool to do it, prefer it over regexes. A very clear example of this is HTML parsing; check Chapter 3, Building Your First Web Scraping Application, for better tools to achieve this.

Some text editors also allow searching with regexes. Most of them are editors aimed at writing code, such as Vim, BBEdit, or Notepad++, but regex search is also present in more general tools, such as MS Office, OpenOffice, or Google Docs. Be careful, though, as the particular syntax may differ slightly between tools.

Getting ready

The Python module to deal with regexes is called re. The main function we'll cover is re.search(), which returns a match object with information about what matched the pattern.

As regex patterns are also defined as strings, we'll differentiate them by prefixing them with an r, such as r'pattern'. This is the Python way of labeling text as a raw string literal, meaning that backslashes within it are taken literally instead of starting escape sequences. For example, without the r prefix, '\n' is a single newline character; with it, r'\n' is two characters, a backslash and an n, which are passed untouched to the regex engine.
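The difference matters in practice. In this sketch, '\b' written without the r prefix becomes the backspace character, so the pattern silently stops meaning "word boundary":

```python
import re

# Without the r prefix, '\n' is a single newline character
print(len('\n'))   # 1
# With the r prefix, r'\n' is two characters: a backslash and an n
print(len(r'\n'))  # 2

# '\b' in a normal string is the backspace character, so this
# pattern looks for a literal backspace followed by "thing"
print(re.search('\bthing', 'the thing'))           # None
# r'\b' reaches the regex engine as a word boundary
print(re.search(r'\bthing', 'the thing').group())  # thing
```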

Some characters are special, and refer to concepts such as the end of the string, any digit, any character, any whitespace character, and so on.

The simplest form is just a literal string. For example, the regex pattern r'LOG' matches the string 'LOGS', but not the string 'NOT A MATCH'. If there's not a match, search returns None:

>>> import re
>>> re.search(r'LOG', 'LOGS')
<re.Match object; span=(0, 3), match='LOG'>
>>> re.search(r'LOG', 'NOT A MATCH')
>>>

How to do it...

1. Import the re module:

>>> import re

2. Match a pattern that is not at the start of the string:

>>> re.search(r'LOG', 'SOME LOGS')
<re.Match object; span=(5, 8), match='LOG'>

3. Match a pattern that is only at the start of the string. Note the ^ character:

>>> re.search(r'^LOG', 'LOGS')
<re.Match object; span=(0, 3), match='LOG'>
>>> re.search(r'^LOG', 'SOME LOGS')
>>>

4. Match a pattern only at the end of the string. Note the $ character:

>>> re.search(r'LOG$', 'SOME LOG')
<re.Match object; span=(5, 8), match='LOG'>
>>> re.search(r'LOG$', 'SOME LOGS')
>>>

5. Match the word thing (including things), but not something or anything. Note the \b at the start of the second pattern:

>>> STRING = 'something in the things she shows me'
>>> match = re.search(r'thing', STRING)
>>> STRING[:match.start()], STRING[match.start():match.end()], STRING[match.end():]
('some', 'thing', ' in the things she shows me')
>>> match = re.search(r'\bthing', STRING)
>>> STRING[:match.start()], STRING[match.start():match.end()], STRING[match.end():]
('something in the ', 'thing', 's she shows me')

6. Match a pattern that's only numbers and dashes (for example, a phone number), and retrieve the matched string:

>>> re.search(r'[0123456789-]+', 'the phone number is 1234-567-890')
<re.Match object; span=(20, 32), match='1234-567-890'>
>>> re.search(r'[0123456789-]+', 'the phone number is 1234-567-890').group()
'1234-567-890'

7. Match an email address naively:

>>> re.search(r'\S+@\S+', 'my email is email.123@test.com').group()
'email.123@test.com'

How it works...

The re.search function looks for the pattern anywhere in the string, no matter its position. As explained previously, it returns a match object if the pattern is found, or None if it isn't.
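Because re.search() can return None, calling .group() directly on its result raises an AttributeError when nothing matches. A common way to guard against this is to check the result first, as in this sketch; the helper name find_log is hypothetical:

```python
import re

def find_log(text):
    """Return the matched text, or None if the pattern isn't found (hypothetical helper)."""
    match = re.search(r'LOG', text)
    if match is None:
        # No match: avoid calling .group() on None
        return None
    return match.group()

print(find_log('SOME LOGS'))    # LOG
print(find_log('NOT A MATCH'))  # None
```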

The following special characters are used:

^: Marks the start of the string
$: Marks the end of the string
\b: Marks a word boundary (the start or end of a word)
\S: Matches any character that is not whitespace, including special characters

More special characters are shown in the next recipe.

In step 6 of the How to do it... section, the r'[0123456789-]+' pattern is composed of two parts. The first one, between square brackets, matches any single character that is either a digit from 0 to 9 or the dash (-). The + sign after it means that this character can be present one or more times; this is called a quantifier in regexes. Together, they match any combination of digits and dashes, no matter how long it is.
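The effect of the quantifier is easy to see by removing it. In this sketch, the same character class without + stops after the first matching character:

```python
import re

text = 'the phone number is 1234-567-890'
# Without a quantifier, the character class matches exactly one character
print(re.search(r'[0123456789-]', text).group())   # 1
# With +, it matches one or more of those characters in a row
print(re.search(r'[0123456789-]+', text).group())  # 1234-567-890
```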

Step 7 again uses the + sign to match as many characters as necessary before the @ and again after it. In this case, the repeated character is \S, which matches any non-whitespace character.

Please note that the email pattern described here is very naive, as it will match invalid emails such as john@smith@test.com. A better regex for most uses is r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)". You can go to http://emailregex.com/ to find it, along with links to more information.
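This sketch compares both patterns on a valid address and on an invalid one with two @ signs (both addresses are made up):

```python
import re

NAIVE = r'\S+@\S+'
# The "better" pattern quoted above; ^ and $ anchor it to the whole string
BETTER = r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"

for candidate in ['email.123@test.com', 'john@smith@test.com']:
    naive = re.search(NAIVE, candidate) is not None
    better = re.search(BETTER, candidate) is not None
    print(candidate, 'naive:', naive, 'better:', better)
```

Because of the ^ and $ anchors, the better pattern is meant to validate a single address on its own, not to find addresses inside a longer text.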

Note that validating an email address, including all its corner cases, is actually a difficult problem. The previous regex should be fine for most uses covered in this book, but a general framework project such as Django validates emails with very long and hard-to-read regexes.

The resulting match object provides the positions where the matched pattern starts and ends through its start and end methods. Step 5 uses them to split the string around the match, showing the difference between the two patterns.
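For reference, here is a short sketch of those match object methods on the string from step 2, together with span(), which returns both positions as a tuple:

```python
import re

match = re.search(r'LOG', 'SOME LOGS')
print(match.start(), match.end())  # 5 8
print(match.span())                # (5, 8)
print(match.group())               # LOG
```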

The difference displayed in step 5 is a very common one. Trying to capture GP while ignoring case can end up capturing eggplant and bagpipe! Similarly, thing\b won't capture things. Be sure to test and make the proper adjustments, such as capturing \bGP\b for just the word GP.
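To see the problem, this sketch uses re.findall() and the re.IGNORECASE flag (both part of the standard re module, though not used elsewhere in this recipe) on a made-up sentence:

```python
import re

text = 'The GP ordered eggplant and played the bagpipe'

# Without word boundaries, "gp" inside other words is also found
print(re.findall(r'gp', text, re.IGNORECASE))      # ['GP', 'gp', 'gp']
# With \b on both sides, only the standalone word matches
print(re.findall(r'\bgp\b', text, re.IGNORECASE))  # ['GP']

# thing\b requires a boundary right after the g, so it fails on "things"
print(re.search(r'thing\b', 'things'))  # None
```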

The specific matched text can be retrieved by calling group(), as shown in step 6. Note that the result will always be a string. It can be further processed using any of the methods that we've previously seen, for example, by splitting the phone number into groups by dashes:

>>> match = re.search(r'[0123456789-]+', 'the phone number is 1234-567-890')
>>> [int(n) for n in match.group().split('-')]
[1234, 567, 890]

There's more...

Dealing with regexes can be difficult and complex. Please allow time to test your matches and be sure that they work as you expect in order to avoid nasty surprises.
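One lightweight way to do this, sketched here as a suggestion rather than a prescribed workflow, is to keep a list of strings that must match and strings that must not, and rerun the checks whenever the pattern changes. The pattern below is a word-boundary version of the thing example from step 5:

```python
import re

PATTERN = re.compile(r'\bthing\b')

should_match = ['thing', 'a thing here', 'thing.']
should_not_match = ['something', 'anything', 'things']

for text in should_match:
    assert PATTERN.search(text), f'expected a match in {text!r}'
for text in should_not_match:
    assert not PATTERN.search(text), f'unexpected match in {text!r}'
print('All regex checks passed')
```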

You can check your regexes interactively with some tools. A good one that's freely available online is https://regex101.com/, which displays each of the elements and explains the regex. Double-check that you're using the Python flavor.

For example, for the r'\bthing' pattern, its EXPLANATION panel describes that \b matches a word boundary (the start or end of a word), and that thing matches these characters literally.

In some cases, regexes can be very slow, or can even be exploited in what's called a regular expression denial-of-service (ReDoS): a string crafted to confuse a particular regex so that it takes an enormous amount of time to process, in the worst case blocking the computer. While automating tasks probably won't get you into those problems, keep an eye out in case a regex takes too long.
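A classic illustration, sketched here, is a nested quantifier such as r'(a+)+$' applied to a run of a characters followed by a character that makes the match fail. Python's backtracking engine tries every way of splitting the run among the repetitions of the group, so the time roughly doubles with each extra character. The sizes are kept small so the example finishes quickly; the exact timings depend on the machine:

```python
import re
import time

EVIL = re.compile(r'(a+)+$')

for n in range(14, 21, 2):
    text = 'a' * n + 'b'  # the trailing b guarantees the match fails
    start = time.perf_counter()
    result = EVIL.match(text)
    elapsed = time.perf_counter() - start
    print(f'n={n}: result={result}, {elapsed:.4f} seconds')
```

Rewriting the pattern without the nested quantifier, for example as r'a+$', matches the same strings and avoids the explosion.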
