Python Automation Cookbook

By: Jaime Buelta

Overview of this book

Have you been doing the same monotonous office work over and over again? Or have you been looking for an easy way to automate some of your repetitive tasks? Following a tried and tested approach, this book shows how to automate all the boring stuff using Python. The Python Automation Cookbook helps you develop a clear understanding of how to automate your business processes using Python, including detecting opportunities by scraping the web, analyzing information to generate automatic spreadsheet reports with graphs, and communicating with automatically generated emails. You’ll learn how to get notifications via text messages and run tasks while your mind is focused on other important activities, and then how to scan documents such as résumés. Once you’re familiar with the fundamentals, you’ll be introduced to the world of graphs and learn how to produce organized charts using Matplotlib. You’ll also gain in-depth knowledge of how to generate rich graphics showing relevant information. By the end of this book, you’ll have refined your skills and gained a sound understanding of how to identify and correct problems to produce superior and reliable systems.

Preface

Who this book is for

What this book covers

To get the most out of this book

Sections

Get in touch

Free Chapter

Let Us Begin Our Automation Journey

Introduction

Creating a virtual environment

Installing third-party packages

Creating strings with formatted values

Manipulating strings

Extracting data from structured strings

Using a third-party tool—parse

Introducing regular expressions

Going deeper into regular expressions

Adding command-line arguments

Automating Tasks Made Easy

Introduction

Preparing a task

Setting up a cron job

Capturing errors and problems

Sending email notifications

Building Your First Web Scraping Application

Introduction

Downloading web pages

Interacting with forms

Using Selenium for advanced interaction

Accessing password-protected pages

Speeding up web scraping

Searching and Reading Local Files

Introduction

Crawling and searching directories

Reading text files

Dealing with encodings

Reading CSV files

Reading log files

Reading file metadata

Reading images

Reading PDF files

Reading Word documents

Scanning documents for a keyword

Generating Fantastic Reports

Introduction

Creating a simple report in plain text

Using templates for reports

Formatting text in Markdown

Writing a basic Word document

Styling a Word document

Generating structure in Word documents

Adding pictures to Word documents

Writing a simple PDF document

Structuring a PDF

Aggregating PDF reports

Watermarking and encrypting a PDF

Fun with Spreadsheets

Introduction

Writing a CSV spreadsheet

Updating the CSV files

Reading an Excel spreadsheet

Updating an Excel spreadsheet

Creating new sheets on an Excel spreadsheet

Creating charts in Excel

Working with format in Excel

Creating a macro in LibreOffice

Developing Stunning Graphs

Introduction

Plotting a simple sales graph

Drawing stacked bars

Plotting pie charts

Displaying multiple lines

Drawing a scatter plot

Visualizing maps

Adding legends and annotations

Combining graphs

Saving charts

Dealing with Communication Channels

Introduction

Working with email templates

Sending an individual email

Reading an email

Adding subscribers to an email newsletter

Sending notifications via email

Producing SMS

Receiving SMS

Creating a Telegram bot

Why Not Automate Your Marketing Campaign?

Introduction

Detecting the opportunities

Creating personalized coupon codes

Sending a notification to the customer on their preferred channel

Preparing sales information

Generating a sales report

Debugging Techniques

Introduction

Learning Python interpreter basics

Debugging through logging

Debugging with breakpoints

Improving your debugging skills

Other Books You May Enjoy

Leave a review - let other readers know what you think

Manipulating strings

A basic skill when dealing with text is being able to manipulate it properly: joining it, splitting it into regular chunks, or changing it to uppercase or lowercase. We'll discuss more advanced methods for parsing and separating text later, but in many cases it is useful to divide a paragraph into lines, sentences, or even words. At other times, words need to have some characters removed or replaced with a canonical version so that they can be compared with a given value.

Getting ready

We'll define a basic text, transform it into its main components, and then reconstruct it. As an example, a report needs to be transformed into a new format to be sent via email.

The input format we'll use in this example will be this:

    AFTER THE CLOSE OF THE SECOND QUARTER, OUR COMPANY, CASTAÑACORP
    HAS ACHIEVED A GROWTH IN THE REVENUE OF 7.47%. THIS IS IN LINE
    WITH THE OBJECTIVES FOR THE YEAR. THE MAIN DRIVER OF THE SALES HAS BEEN
    THE NEW PACKAGE DESIGNED UNDER THE SUPERVISION OF OUR MARKETING DEPARTMENT.
    OUR EXPENSES HAS BEEN CONTAINED, INCREASING ONLY BY 0.7%, THOUGH THE BOARD
    CONSIDERS IT NEEDS TO BE FURTHER REDUCED. THE EVALUATION IS SATISFACTORY
    AND THE FORECAST FOR THE NEXT QUARTER IS OPTIMISTIC. THE BOARD EXPECTS
    AN INCREASE IN PROFIT OF AT LEAST 2 MILLION DOLLARS.

We need to redact the text to eliminate any references to numbers. It also needs to be properly formatted by adding a new line after each period, wrapped so that no line exceeds 80 characters, and transformed into ASCII for compatibility reasons.

The text will be stored in the INPUT_TEXT variable in the interpreter.

How to do it...

After entering the text, split it into individual words:

>>> INPUT_TEXT = '''
...     AFTER THE CLOSE OF THE SECOND QUARTER, OUR COMPANY, CASTAÑACORP
...     HAS ACHIEVED A GROWTH IN THE REVENUE OF 7.47%. THIS IS IN LINE
...
... '''
>>> words = INPUT_TEXT.split()

Replace any numerical digits with an 'X' character:

>>> redacted = [''.join('X' if w.isdigit() else w for w in word) for word in words]

Transform the text into pure ASCII (note that the name of the company contains a letter, ñ, which is not ASCII):

>>> ascii_text = [word.encode('ascii', errors='replace').decode('ascii')
...               for word in redacted]

Group the words into 80-character lines:

>>> newlines = [word + '\n' if word.endswith('.') else word for word in ascii_text]
>>> LINE_SIZE = 80
>>> lines = []
>>> line = ''
>>> for word in newlines:
...     if line.endswith('\n') or len(line) + len(word) + 1 > LINE_SIZE:
...         lines.append(line)
...         line = ''
...     line = line + ' ' + word
...
>>> # The loop leaves the last line pending, so add it as well
>>> lines.append(line)

Format all lines as titles and join them as a single piece of text:

>>> lines = [line.title() for line in lines]
>>> result = '\n'.join(lines)

Print the result:

>>> print(result)
 After The Close Of The Second Quarter, Our Company, Casta?Acorp Has Achieved A
 Growth In The Revenue Of X.Xx%.

 This Is In Line With The Objectives For The Year.

 The Main Driver Of The Sales Has Been The New Package Designed Under The
 Supervision Of Our Marketing Department.

 Our Expenses Has Been Contained, Increasing Only By X.X%, Though The Board
 Considers It Needs To Be Further Reduced.

 The Evaluation Is Satisfactory And The Forecast For The Next Quarter Is
 Optimistic.

 The Board Expects An Increase In Profit Of At Least X Million Dollars.
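As a point of comparison, the standard library's textwrap module can handle the wrapping step. The following is a minimal sketch rather than the recipe's own method; the wrap_sentences name and the sample sentences are assumptions made for illustration. Unlike the manual loop, it does not put a leading space on each line.

```python
import textwrap

LINE_SIZE = 80


def wrap_sentences(sentences, width=LINE_SIZE):
    """Wrap each sentence to `width` columns, with a blank line between sentences."""
    # textwrap.fill() breaks the text on whitespace and returns a single
    # string whose lines are at most `width` characters long.
    return '\n\n'.join(textwrap.fill(sentence, width=width)
                       for sentence in sentences)


# Hypothetical sample sentences, already redacted and title-cased.
sample = [
    'Our Expenses Has Been Contained, Increasing Only By X.X%, Though The Board '
    'Considers It Needs To Be Further Reduced.',
    'The Evaluation Is Satisfactory.',
]
wrapped = wrap_sentences(sample)
print(wrapped)
```

The manual loop is still worth understanding, because it shows how splitting and joining combine; textwrap is simply a shorter option when only the wrapping matters.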

How it works...

Each of the steps performs a specific transformation of the text:

The first step splits the text on the default separator, which is any run of whitespace, including spaces and newlines. The result is a list of individual words, with no line breaks or repeated spaces left between them.
To replace the digits, we go through every character of each word and return an 'X' in place of any digit. This is done with two nested comprehensions: the outer one runs over the list of words, and the inner one runs over the characters of each word (named w in the code), replacing only the digits, as in 'X' if w.isdigit() else w for w in word. The characters of each word are then joined together again with ''.join().
Each word is encoded into an ASCII byte sequence and decoded back into the Python string type. Note the use of the errors parameter: errors='replace' substitutes a ? for any character that has no ASCII equivalent, such as ñ.

The difference between strings and bytes is not very intuitive at first, especially if you have never had to worry about multiple languages or encoding transformations. In Python 3, there's a strong separation between strings (the internal Python representation of text) and bytes, and the two types cannot be mixed in the same operation. Unless you have a good idea of why you need a bytes object, always work with Python strings. If you need to perform transformations like the one in this task, encode and decode in the same line so that you keep your objects in the comfortable realm of Python strings. If you are interested in learning more about encodings, you can check out this brief article (https://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3) and this other longer and more detailed one (http://www.diveintopython3.net/strings.html).

The fourth step first appends a newline character (\n) to every word ending with a period. This marks the end of each sentence, which becomes its own paragraph. After that, it creates a line and adds the words one by one. If an extra word would make the line go over 80 characters, it finishes the line and starts a new one. If the line already ends with a newline, it also finishes it and starts another one. Note that an extra space is added before each word to separate the words, which is why every line starts with a space.
Finally, each line is converted to title case (the first letter of each word is uppercased and the rest are lowercased) and all the lines are joined with newlines.
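The steps described above can be collected into a single function. This is a minimal sketch under stated assumptions: the names redact_digits, to_ascii, and reformat_report, as well as the short sample input, are hypothetical and do not come from the recipe. It also appends the pending last line once the loop finishes, so that the final sentence is not lost.

```python
LINE_SIZE = 80


def redact_digits(word):
    """Replace every digit in `word` with an 'X'."""
    return ''.join('X' if char.isdigit() else char for char in word)


def to_ascii(word):
    """Replace each character that cannot be encoded as ASCII with '?'."""
    return word.encode('ascii', errors='replace').decode('ascii')


def reformat_report(text, line_size=LINE_SIZE):
    """Redact digits, convert to ASCII, wrap, and title-case `text`."""
    words = [to_ascii(redact_digits(word)) for word in text.split()]
    # Mark the end of each sentence with a newline.
    words = [word + '\n' if word.endswith('.') else word for word in words]

    lines = []
    line = ''
    for word in words:
        if line.endswith('\n') or len(line) + len(word) + 1 > line_size:
            lines.append(line)
            line = ''
        line = line + ' ' + word
    if line:
        # Keep the last line, which the loop leaves pending.
        lines.append(line)

    return '\n'.join(line.title() for line in lines)


# Hypothetical short input, used only to exercise the function.
sample = 'REVENUE GREW 7.47% AT CASTAÑACORP. THE BOARD IS OPTIMISTIC.'
print(reformat_report(sample))
```

Splitting the work into small helpers keeps each transformation easy to check in the interpreter on its own.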

There's more...

Some other useful operations that can be performed on strings are as follows:

Strings can be sliced like any other sequence. This means that 'word'[0:2] will return 'wo'.
Use .splitlines() to split a string into a list of lines at the newline characters.
There are .upper() and .lower() methods, which return a copy with all the characters set to uppercase or lowercase. Their use is very similar to .title():

>>> 'UPPERCASE'.lower()
'uppercase'

For easy replacements (for example, change all A to B or change mine to ours), use .replace(). This method is useful for very simple cases, but replacements can get tricky easily. Be careful with the order of replacements to avoid collisions and case sensitivity issues. Note the wrong replacement in the following example:

>>> 'One ring to rule them all, one ring to find them, One ring to bring them all and in the darkness bind them.'.replace('ring', 'necklace')
'One necklace to rule them all, one necklace to find them, One necklace to bnecklace them all and in the darkness bind them.'

This is similar to the issues we'll see with regular expressions matching unexpected parts of your text.

There are more examples to follow later. Refer to the regular expressions recipes for more information.
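One possible way to avoid the partial-word match shown in the .replace() example is a regular expression with word boundaries. The following is a minimal sketch using the standard re module; the replace_word helper and the shortened verse are assumptions made for illustration.

```python
import re


def replace_word(text, old, new):
    """Replace `old` with `new` only where `old` appears as a whole word."""
    # \b matches a word boundary, so 'ring' inside 'bring' is left alone.
    # re.escape() protects any special regex characters in `old`.
    pattern = r'\b' + re.escape(old) + r'\b'
    return re.sub(pattern, new, text)


verse = 'One ring to rule them all, One ring to bring them all.'
print(replace_word(verse, 'ring', 'necklace'))
# One necklace to rule them all, One necklace to bring them all.
```

The match is still case-sensitive; passing flags=re.IGNORECASE to re.sub() is one way to relax that, at the cost of losing the original capitalization in the replacement.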

If you work with multiple languages, or with any kind of non-English input, it is very useful to learn the basics of Unicode and encodings. In a nutshell, given the vast number of characters in all the different languages in the world, including writing systems not related to the Latin alphabet, such as Chinese or Arabic, there's a standard, Unicode, that tries to cover all of them so that computers can properly understand them. Python 3 greatly improved this situation by making strings Unicode objects that can deal with all of those characters. The default encoding in Python 3 for source code and for str.encode() and bytes.decode(), and the most common and compatible one, is currently UTF-8.

A good article to learn about the basics of Unicode and UTF-8 is this blog post: (https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/).
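To make the ASCII conversion in this recipe more concrete, here is a short sketch comparing the error handlers of str.encode() on the company name. The last variant, which uses Unicode normalization to keep the base letter, is an alternative technique and not what the recipe does; the variable names are assumptions made for illustration.

```python
import unicodedata

name = 'CASTAÑACORP'

# errors='replace' substitutes a '?' for each unencodable character.
replaced = name.encode('ascii', errors='replace').decode('ascii')

# errors='ignore' silently drops the unencodable characters.
ignored = name.encode('ascii', errors='ignore').decode('ascii')

# NFKD normalization splits 'Ñ' into 'N' plus a combining tilde;
# ignoring the non-ASCII tilde then leaves the base letter.
decomposed = unicodedata.normalize('NFKD', name)
stripped = decomposed.encode('ascii', errors='ignore').decode('ascii')

print(replaced)  # CASTA?ACORP
print(ignored)   # CASTAACORP
print(stripped)  # CASTANACORP
```

Which variant is appropriate depends on the reader of the text: a visible ? signals that something was lost, while the normalized form reads more naturally but changes the spelling.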

Dealing with encodings is still relevant when reading external files, which can be stored in different encodings (for example, CP-1252, also called windows-1252, which is a common encoding produced by legacy Microsoft systems, or ISO 8859-15, which is an ISO standard for Western European languages).
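As an illustration of why the encoding matters when reading files, the following sketch writes a small file in CP-1252 and reads it back with the right and the wrong encoding. The temporary file stands in for an external file, and all the names are assumptions made for illustration.

```python
import os
import tempfile

text = 'CASTAÑACORP'

# Write a file in a legacy encoding (a stand-in for an external file).
with tempfile.NamedTemporaryFile('w', encoding='cp1252', suffix='.txt',
                                 delete=False) as f:
    f.write(text)
    path = f.name

# Reading with the right encoding recovers the original text.
with open(path, encoding='cp1252') as f:
    good = f.read()

# Reading with the wrong encoding raises UnicodeDecodeError by default;
# errors='replace' avoids the exception but garbles the character.
with open(path, encoding='utf-8', errors='replace') as f:
    garbled = f.read()

os.remove(path)

print(good)
print(garbled)
```

Passing the encoding argument to open() explicitly avoids depending on the platform default, which can differ between systems.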
