Modern Python Cookbook

Overview of this book

Python is the preferred choice of developers, engineers, data scientists, and hobbyists everywhere. It is a great scripting language that can power your applications and provide speed, safety, and scalability. By presenting Python as a series of simple recipes, this book lets you gain insight into specific language features in a particular context. Having a tangible context makes a language or standard library feature easier to understand. This book comes with over 100 recipes on the latest version of Python. The recipes will benefit everyone from beginners to experts. The book is broken down into 13 chapters that build from simple language concepts to more complex applications of the language. The recipes touch upon all the necessary Python concepts related to data structures, OOP, functional programming, and statistical programming. You will get acquainted with the nuances of Python syntax and learn how to use its advantages effectively. You will end the book equipped with knowledge of testing, web services, configuration, and application integration tips and tricks. The recipes take a problem-solution approach to resolve issues commonly faced by Python programmers across the globe. You will be armed with the knowledge to create applications with flexible logging, powerful configuration and command-line options, automated unit tests, and good documentation.

Title Page

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Numbers, Strings, and Tuples

Introduction

Creating meaningful names and using variables

Working with large and small integers

Choosing between float, decimal, and fraction

Choosing between true division and floor division

Rewriting an immutable string

String parsing with regular expressions

Building complex strings with "template".format()

Building complex strings from lists of characters

Using the Unicode characters that aren't on our keyboards

Encoding strings – creating ASCII and UTF-8 bytes

Decoding bytes – how to get proper characters from some bytes

Using tuples of items

Statements and Syntax

Introduction

Writing Python script and module files – syntax basics

Writing long lines of code

Including descriptions and documentation

Writing better RST markup in docstrings

Designing complex if...elif chains

Designing a while statement which terminates properly

Avoiding a potential problem with break statements

Leveraging the exception matching rules

Avoiding a potential problem with an except: clause

Chaining exceptions with the raise from statement

Managing a context using the with statement

Function Definitions

Introduction

Designing functions with optional parameters

Using super flexible keyword parameters

Forcing keyword-only arguments with the * separator

Writing explicit types on function parameters

Picking an order for parameters based on partial functions

Writing clear documentation strings with RST markup

Designing recursive functions around Python's stack limits

Writing reusable scripts with the script library switch

Built-in Data Structures – list, set, dict

Introduction

Choosing a data structure

Building lists – literals, appending, and comprehensions

Slicing and dicing a list

Deleting from a list – deleting, removing, popping, and filtering

Reversing a copy of a list

Using set methods and operators

Removing items from a set – remove(), pop(), and difference

Creating dictionaries – inserting and updating

Removing from dictionaries – the pop() method and the del statement

Controlling the order of dict keys

Handling dictionaries and sets in doctest examples

Understanding variables, references, and assignment

Making shallow and deep copies of objects

Avoiding mutable default values for function parameters

User Inputs and Outputs

Introduction

Using features of the print() function

Using input() and getpass() for user input

Debugging with "format".format_map(vars())

Using argparse to get command-line input

Using cmd for creating command-line applications

Using the OS environment settings

Basics of Classes and Objects

Introduction

Using a class to encapsulate data and processing

Designing classes with lots of processing

Designing classes with little unique processing

Optimizing small objects with __slots__

Using more sophisticated collections

Extending a collection – a list that does statistics

Using properties for lazy attributes

Using settable properties to update eager attributes

More Advanced Class Design

Introduction

Choosing between inheritance and extension – the is-a question

Separating concerns via multiple inheritance

Leveraging Python's duck typing

Managing global and singleton objects

Using more complex structures – maps of lists

Creating a class that has orderable objects

Defining an ordered collection

Deleting from a list of mappings

Input/Output, Physical Format, and Logical Layout

Introduction

Using pathlib to work with filenames

Reading and writing files with context managers

Replacing a file while preserving the previous version

Reading delimited files with the CSV module

Reading complex formats using regular expressions

Reading JSON documents

Reading XML documents

Reading HTML documents

Upgrading CSV from DictReader to namedtuple reader

Upgrading CSV from a DictReader to a namespace reader

Using multiple contexts for reading and writing files

Testing

Introduction

Using docstrings for testing

Testing functions that raise exceptions

Handling common doctest issues

Creating separate test modules and packages

Combining unittest and doctest tests

Testing things that involve dates or times

Testing things that involve randomness

Mocking external resources

Web Services

Introduction

Implementing web services with WSGI

Using the Flask framework for RESTful APIs

Parsing the query string in a request

Making REST requests with urllib

Parsing the URL path

Parsing a JSON request

Implementing authentication for web services

Application Integration

Introduction

Finding configuration files

Using YAML for configuration files

Using Python for configuration files

Using class-as-namespace for configuration

Designing scripts for composition

Using logging for control and audit output

Combining two applications into one

Combining many applications using the Command design pattern

Managing arguments and configuration in composite applications

Wrapping and combining CLI applications

Wrapping a program and checking the output

Controlling complex sequences of steps

Index

Decoding bytes – how to get proper characters from some bytes

How can we work with files that aren't properly encoded? What do we do with files written in the ASCII encoding?

A download from the Internet is almost always in bytes—not characters. How do we decode the characters from that stream of bytes?

Also, when we use the subprocess module, the results of an OS command are in bytes. How can we recover proper characters?

Much of this is also relevant to the material in Chapter 8, Input/Output, Physical Format, and Logical Layout. We've included the recipe here because it's the inverse of the previous recipe, Encoding strings – creating ASCII and UTF-8 bytes.
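As a small illustration of the subprocess case mentioned above, here is a minimal sketch. The child command (a second Python interpreter printing one word) is an assumption chosen only so the example runs anywhere; a real OS command may need a different encoding than ASCII.

```python
import subprocess
import sys

# Run a child Python process that prints one line; any OS command would do.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    check=True,
)

raw = result.stdout          # bytes captured from the child process
text = raw.decode("ascii")   # str; ASCII is enough for this particular output

print(type(raw).__name__, type(text).__name__)
print(text.strip())
```

The captured output is bytes until we decode it; only then can we treat it as ordinary text.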

Getting ready

Let's say we're interested in offshore marine weather forecasts, perhaps because we own a large sailboat, or because good friends of ours have a large sailboat and are departing the Chesapeake Bay for the Caribbean.

Are there any special warnings coming from the National Weather Service office in Wakefield, Virginia?

Here's where we can get the warnings: http://www.nws.noaa.gov/view/national.php?prod=SMW&sid=AKQ.

We can download this with Python's urllib module:

>>> import urllib.request
>>> warnings_uri = 'http://www.nws.noaa.gov/view/national.php?prod=SMW&sid=AKQ'
>>> with urllib.request.urlopen(warnings_uri) as source:
...     warnings_text = source.read()

Or, we can use programs like curl or wget to get this. We might do:

curl -O http://www.nws.noaa.gov/view/national.php?prod=SMW&sid=AKQ
mv national.php\?prod\=SMW AKQ.html

Since curl left us with an awkward file name, we needed to rename the file.

The warnings_text value is a stream of bytes. It's not a proper string. We can tell because it starts like this:

>>> warnings_text[:80]
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.or'

The value goes on for a while, providing details. Because it starts with b', it's bytes, not proper Unicode characters. It was probably encoded with UTF-8, which means some characters could appear as weird-looking \xnn escape sequences instead of proper characters. We want to have the proper characters.

Note

Bytes vs Strings

Bytes are often displayed using printable characters. We'll see b'hello' as shorthand for a five-byte value. The letters are chosen using the old ASCII encoding scheme: byte values from 0x20 to 0x7E are shown as characters, and most other values are shown as \xnn escapes. This can be confusing. The prefix of b' is our hint that we're looking at bytes, not proper Unicode characters.

Generally, bytes behave somewhat like strings. Sometimes we can work with bytes directly. Most of the time, we'll want to decode the bytes and create proper Unicode characters.
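To make the difference concrete, here is a short sketch using a made-up four-character string (an assumption, not data from the weather service). It shows where bytes act like strings and where they do not.

```python
data = "café".encode("utf-8")   # five bytes: the é needs two of them

print(data)        # b'caf\xc3\xa9' (ASCII letters plus \xnn escapes)
print(len(data))   # 5 (counts bytes, not characters)
print(data[0])     # 99 (indexing bytes yields an int)
print(data[:3])    # b'caf' (slicing yields bytes)

text = data.decode("utf-8")
print(text, len(text))   # café 4
```

Slicing and length work on both types, but the units differ: bytes count encoded bytes, while the decoded string counts characters.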

How to do it...

Determine the coding scheme if possible. In order to decode bytes to create proper Unicode characters, we need to know what encoding scheme was used. When we read XML documents, there's a big hint provided within the document:

<?xml version="1.0" encoding="UTF-8"?>

When browsing web pages, there's often a header with this information:

Content-Type: text/html; charset=ISO-8859-4

Sometimes an HTML page may include this as part of the header:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

In other cases, we're left to guess. In the case of US weather data, a good first guess is UTF-8. Another good guess is ISO-8859-1. In some cases, the guess will depend on the language.
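The hints described above can be read programmatically. The following is a minimal sketch, not a complete detector: the function names are hypothetical, the regular expression is a rough heuristic of our own, and the UTF-8 default is simply the first guess suggested in the text.

```python
import re
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or default."""
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset() or default


# Hypothetical heuristic: look for encoding="..." or charset=... near the top.
_DECLARATION = re.compile(rb'''(?:encoding|charset)\s*=\s*["']?([A-Za-z0-9._-]+)''')


def charset_from_declaration(data, default="utf-8"):
    """Return the encoding declared in the first 1024 bytes, or default."""
    match = _DECLARATION.search(data[:1024])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


print(charset_from_content_type("text/html; charset=ISO-8859-4"))
print(charset_from_declaration(b'<?xml version="1.0" encoding="UTF-8"?>'))
```

When the bytes come from urllib.request.urlopen(), the response's headers object offers the same get_content_charset() method, so the header lookup can be done directly on the response.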

Section 7.2.3 of the Python Standard Library documentation lists the standard encodings available.

Decode the data:

>>> document = warnings_text.decode("UTF-8")
>>> document[:80]
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.or'

The b' prefix is gone. We've created a proper string of Unicode characters from the stream of bytes.

If this step fails with an exception, we guessed wrong about the encoding. We need to try another encoding.

Parse the resulting document.

Since this is an HTML document, we should use Beautiful Soup. See http://www.crummy.com/software/BeautifulSoup/.

We can, however, extract one nugget of information from this document without completely parsing the HTML:

>>> import re
>>> title_pattern = re.compile(r"\<h3\>(.*?)\</h3\>")
>>> title_pattern.search( document )
<_sre.SRE_Match object; span=(3438, 3489), match='<h3>There are no products active at this time.</h>

This tells us what we need to know: there are no warnings at this time. That doesn't mean smooth sailing, but it does mean that there aren't any major weather systems that can cause catastrophes.
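For a more structured extraction without installing anything, the standard library's html.parser module can stand in for Beautiful Soup. This is a swapped-in technique, not the approach the recipe recommends; the class name and the sample fragment below are hypothetical, and the fragment only imitates the heading found in the real page.

```python
from html.parser import HTMLParser


class H3Collector(HTMLParser):
    """Collect the text found inside every <h3> element."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False

    def handle_data(self, data):
        if self.in_h3:
            self.titles[-1] += data


# A made-up fragment standing in for the decoded document.
sample = "<html><body><h3>There are no products active at this time.</h3></body></html>"
collector = H3Collector()
collector.feed(sample)
collector.close()
print(collector.titles)   # ['There are no products active at this time.']
```

The parser works on decoded text, which is why the decode step has to come first.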

How it works...

See the Encoding strings – creating ASCII and UTF-8 bytes recipe for more information on Unicode and the different ways that Unicode characters can be encoded into streams of bytes.

At the foundation of the operating system, files and network connections are built up from bytes. It's our software that decodes the bytes to discover the content. It might be characters, or images, or sounds. In some cases, the default assumptions are wrong and we need to do our own decoding.
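When a first guess at the encoding turns out to be wrong, one option is to try a short list of candidates in order. The sketch below assumes the two guesses named in this recipe; the function name is hypothetical. Note that ISO-8859-1 accepts every byte value, so it never raises and should come last.

```python
def decode_with_fallbacks(data, encodings=("utf-8", "iso-8859-1")):
    """Return (text, encoding_name) for the first encoding that decodes data."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # wrong guess; try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")


print(decode_with_fallbacks(b"caf\xc3\xa9"))   # valid UTF-8
print(decode_with_fallbacks(b"caf\xe9"))       # not UTF-8; falls back
```

A successful decode only proves the bytes are legal in that encoding, not that the guess was right, so the result still deserves a quick visual check.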
