Python Automation Cookbook

By : Jaime Buelta

Overview of this book

Have you been doing the same old monotonous office work over and over again? Or have you been trying to find an easy way to make your life better by automating some of your repetitive tasks? Through a tried and tested approach, understand how to automate all the boring stuff using Python. The Python Automation Cookbook helps you develop a clear understanding of how to automate your business processes using Python, including detecting opportunities by scraping the web, analyzing information to generate automatic spreadsheet reports with graphs, and communicating with automatically generated emails. You’ll learn how to get notifications via text messages and run tasks while your mind is focused on other important activities, followed by understanding how to scan documents such as résumés. Once you’ve gotten familiar with the fundamentals, you’ll be introduced to the world of graphs, along with studying how to produce organized charts using Matplotlib. In addition to this, you’ll gain in-depth knowledge of how to generate rich graphics showing relevant information. By the end of this book, you’ll have refined your skills by attaining a sound understanding of how to identify and correct problems to produce superior and reliable systems.

Extracting data from structured strings

In a lot of automated tasks, we'll need to treat input text that's in a particular format and extract the relevant information. For example, a spreadsheet may define a percentage in text (such as 37.4%) that we want to retrieve in numerical format to apply it later (0.374, as a float).
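As a minimal sketch of the percentage example, the helper below strips the percent sign and scales the value. The name parse_percentage is hypothetical and not part of the recipe. It divides in Decimal first, so the scaling by 100 is exact before the result is converted to a float.

```python
from decimal import Decimal


def parse_percentage(text):
    """Convert a string such as '37.4%' into a fraction such as 0.374."""
    # Remove surrounding whitespace and the trailing percent sign
    number = text.strip().rstrip('%')
    # Divide exactly in Decimal, then convert to float for later use
    return float(Decimal(number) / 100)


print(parse_percentage('37.4%'))  # 0.374
```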

In this recipe, we'll see how to process sales logs that contain inline information about a product, such as the item sold, its price, profit, and some other information.

Getting ready

Imagine that we need to parse information stored in sales logs. We'll use a sales log with the following structure:

[<Timestamp in iso format>] - SALE - PRODUCT: <product id> - PRICE: $<price of the sale>

For example, a specific log may look like this:

[2018-05-05T10:58:41.504054] - SALE - PRODUCT: 1345 - PRICE: $09.99

Note that the price has a leading zero. All prices will have two digits for the dollars, and two for the cents.

We need to activate our virtual environment before we start:

$ source .venv/bin/activate

How to do it...

In the Python interpreter, make the following imports. Remember to activate your virtualenv, as described in the Creating a virtual environment recipe:

>>> import delorean
>>> from decimal import Decimal

Enter the log to parse:

>>> log = '[2018-05-05T11:07:12.267897] - SALE - PRODUCT: 1345 - PRICE: $09.99'

Split the log into its parts, which are divided by - (note the space before and after the dash). We ignore the SALE part as it doesn't add any relevant information:

>>> divide_it = log.split(' - ')
>>> timestamp_string, _, product_string, price_string = divide_it

Parse the timestamp into a datetime object:

>>> timestamp = delorean.parse(timestamp_string.strip('[]'))

Parse the product_id into an integer:

>>> product_id = int(product_string.split(':')[-1])

Parse the price into a Decimal type:

>>> price = Decimal(price_string.split('$')[-1])

Now, you have all the values in native Python formats:

>>> timestamp, product_id, price
(Delorean(datetime=datetime.datetime(2018, 5, 5, 11, 7, 12, 267897), timezone='UTC'), 1345, Decimal('9.99'))

How it works...

The basic approach is to isolate each element and then parse it into the proper type. The first step is to split the full log into smaller parts. The ' - ' string (a dash surrounded by spaces) is a good divider, as it splits the log into four parts: the timestamp, the word SALE, the product, and the price. The surrounding spaces matter, because the timestamp itself contains plain dashes.

In the case of the timestamp, we need to isolate the ISO format, which is in brackets in the log. That's why the brackets are stripped off. We use the delorean module (introduced earlier) to parse it into a datetime object, which delorean wraps in a Delorean instance, as the output in the previous section shows.
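If delorean is not available, the same step can be sketched with the standard library alone. This assumes Python 3.7 or later, where datetime.fromisoformat exists. Unlike the Delorean result shown in the recipe, it returns a naive datetime with no time zone attached.

```python
from datetime import datetime

log = '[2018-05-05T11:07:12.267897] - SALE - PRODUCT: 1345 - PRICE: $09.99'

# The timestamp is the first part of the log, wrapped in brackets
timestamp_string = log.split(' - ')[0]

# Strip the brackets and parse the ISO 8601 string (Python 3.7+)
timestamp = datetime.fromisoformat(timestamp_string.strip('[]'))

print(timestamp.isoformat())  # 2018-05-05T11:07:12.267897
```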

The word SALE is ignored. There's no relevant information there.

To isolate the product ID, we split the product part at the colon. Then, we parse the last element as an integer:

>>> product_string.split(':')
['PRODUCT', ' 1345']
>>> int(' 1345')
1345

To isolate the price, we use the dollar sign as a separator, and parse the last element as a Decimal value:

>>> price_string.split('$')
['PRICE: ', '09.99']
>>> Decimal('09.99')
Decimal('9.99')

As described in the next section, do not parse this value into a float type.
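The split only works when a line really has four parts. One possible way to guard against malformed lines is sketched below. The helper split_sale_log is hypothetical, and raising ValueError is an assumption about how errors should be reported, not something the recipe prescribes.

```python
def split_sale_log(log):
    """Split a sale log line into its timestamp, product, and price parts."""
    parts = log.split(' - ')
    # A well-formed line has exactly four parts, and the second one is SALE
    if len(parts) != 4 or parts[1] != 'SALE':
        raise ValueError('Unexpected log format: {!r}'.format(log))
    timestamp_string, _, product_string, price_string = parts
    return timestamp_string, product_string, price_string


good = '[2018-05-05T11:07:12.267897] - SALE - PRODUCT: 1345 - PRICE: $09.99'
print(split_sale_log(good))

try:
    split_sale_log('not a sale log')
except ValueError as err:
    print(err)
```

Failing early like this keeps a truncated or unrelated line from surfacing later as a confusing unpacking error.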

There's more...

These log elements can be combined into a single object, which helps with parsing and aggregating them. For example, we could define a class in Python code in the following way:

import delorean
from decimal import Decimal


class PriceLog(object):
    '''A single sale entry: when it happened, which product, and the price.'''

    def __init__(self, timestamp, product_id, price):
        self.timestamp = timestamp
        self.product_id = product_id
        self.price = price

    def __repr__(self):
        return '<PriceLog ({}, {}, {})>'.format(self.timestamp,
                                                self.product_id,
                                                self.price)

    @classmethod
    def parse(cls, text_log):
        '''
        Parse from a text log with the format
        [<Timestamp>] - SALE - PRODUCT: <product id> - PRICE: $<price>
        to a PriceLog object
        '''
        # Split on ' - ' and discard the SALE marker
        divide_it = text_log.split(' - ')
        tmp_string, _, product_string, price_string = divide_it
        # Strip the brackets around the ISO timestamp before parsing it
        timestamp = delorean.parse(tmp_string.strip('[]'))
        # Keep only the text after the colon and after the dollar sign
        product_id = int(product_string.split(':')[-1])
        price = Decimal(price_string.split('$')[-1])
        return cls(timestamp=timestamp, product_id=product_id, price=price)

So, the parsing can be done as follows:

>>> log = '[2018-05-05T12:58:59.998903] - SALE - PRODUCT: 897 - PRICE: $17.99'
>>> PriceLog.parse(log)
<PriceLog (Delorean(datetime=datetime.datetime(2018, 5, 5, 12, 58, 59, 998903), timezone='UTC'), 897, 17.99)>
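To illustrate the kind of aggregation that such an object makes easier, here is a minimal sketch that totals sales per product. To stay self-contained, it uses a simplified stand-in (SaleRecord, a hypothetical NamedTuple) and datetime.fromisoformat (Python 3.7+) instead of PriceLog and delorean. The third log line is made-up sample data.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal
from typing import NamedTuple


class SaleRecord(NamedTuple):
    # Simplified, hypothetical stand-in for the PriceLog class
    timestamp: datetime
    product_id: int
    price: Decimal


def parse_sale(text_log):
    """Parse one log line into a SaleRecord."""
    timestamp_string, _, product_string, price_string = text_log.split(' - ')
    return SaleRecord(
        timestamp=datetime.fromisoformat(timestamp_string.strip('[]')),
        product_id=int(product_string.split(':')[-1]),
        price=Decimal(price_string.split('$')[-1]),
    )


# Sample lines for illustration only
logs = [
    '[2018-05-05T11:07:12.267897] - SALE - PRODUCT: 1345 - PRICE: $09.99',
    '[2018-05-05T12:58:59.998903] - SALE - PRODUCT: 897 - PRICE: $17.99',
    '[2018-05-05T13:10:00.000000] - SALE - PRODUCT: 1345 - PRICE: $09.99',
]

# Sum the prices per product ID, keeping exact Decimal arithmetic
totals = defaultdict(Decimal)
for record in map(parse_sale, logs):
    totals[record.product_id] += record.price

print(dict(totals))
```

Because the prices are Decimal values, the per-product totals stay exact no matter how many lines are added up.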

Avoid using float types for prices. Floating-point numbers have precision problems that may produce strange errors when aggregating multiple prices, for example:

>>> 0.1 + 0.1 + 0.1 
0.30000000000000004

Try these two options to avoid problems:

Use integer cents as the base unit: This means multiplying currency inputs by 100 (or whatever factor matches the fractional unit of the currency used) and transforming them into integers. You may still want to change the base when displaying them.
Parse into the Decimal type: The Decimal type keeps the fixed precision and works as you'd expect. You can find further information about the Decimal type in the Python docs at https://docs.python.org/3.6/library/decimal.html.
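Both options can be sketched in a few lines. The helpers to_cents and format_cents are hypothetical names, and the sketch assumes non-negative prices in a currency with 100 fractional units.

```python
from decimal import Decimal


def to_cents(price_string):
    """Convert a price string such as '09.99' into integer cents (999)."""
    # Go through Decimal so that no float rounding sneaks in
    return int(Decimal(price_string) * 100)


def format_cents(cents):
    """Change the base back to dollars for display."""
    dollars, remainder = divmod(cents, 100)
    return '${}.{:02d}'.format(dollars, remainder)


# Option 1: integer cents
total_cents = to_cents('09.99') + to_cents('17.99')
print(total_cents, format_cents(total_cents))  # 2798 $27.98

# Option 2: Decimal values
print(Decimal('0.1') + Decimal('0.1') + Decimal('0.1'))  # 0.3
```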

If you use the Decimal type, parse the results directly into Decimal from the string. If you transform the value into a float first, its precision errors carry over to the new type.
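The difference is easy to see in a short sketch that builds the same value both ways:

```python
from decimal import Decimal

# Built from the string: exactly one tenth
from_string = Decimal('0.1')

# Built from a float: inherits the binary approximation of 0.1
from_float = Decimal(0.1)

print(from_string)
print(from_float)
print(from_string == from_float)  # False
```

The value built from the float prints with a long tail of extra digits, which is the float's rounding error made visible.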
