Modern Python Cookbook


Overview of this book

Python is the preferred choice of developers, engineers, data scientists, and hobbyists everywhere. It is a great scripting language that can power your applications and provide great speed, safety, and scalability. By exposing Python as a series of simple recipes, you can gain insight into specific language features in a particular context. Having a tangible context helps make the language or standard library feature easier to understand. This book comes with over 100 recipes on the latest version of Python. The recipes will benefit everyone from beginners to experts. The book is broken down into 13 chapters that build from simple language concepts to more complex applications of the language. The recipes touch upon all the necessary Python concepts related to data structures, OOP, functional programming, and statistical programming. You will get acquainted with the nuances of Python syntax and how to use its advantages effectively. You will end the book equipped with knowledge of testing, web services, configuration, and application integration tips and tricks. The recipes take a problem-solution approach to resolve issues commonly faced by Python programmers across the globe. You will be able to create applications with flexible logging, powerful configuration and command-line options, automated unit tests, and good documentation.

Title Page

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Free Chapter

Numbers, Strings, and Tuples

Introduction

Creating meaningful names and using variables

Working with large and small integers

Choosing between float, decimal, and fraction

Choosing between true division and floor division

Rewriting an immutable string

String parsing with regular expressions

Building complex strings with "template".format()

Building complex strings from lists of characters

Using the Unicode characters that aren't on our keyboards

Encoding strings – creating ASCII and UTF-8 bytes

Decoding bytes – how to get proper characters from some bytes

Using tuples of items

Statements and Syntax

Introduction

Writing Python script and module files – syntax basics

Writing long lines of code

Including descriptions and documentation

Writing better RST markup in docstrings

Designing complex if...elif chains

Designing a while statement which terminates properly

Avoiding a potential problem with break statements

Leveraging the exception matching rules

Avoiding a potential problem with an except: clause

Chaining exceptions with the raise from statement

Managing a context using the with statement

Function Definitions

Introduction

Designing functions with optional parameters

Using super flexible keyword parameters

Forcing keyword-only arguments with the * separator

Writing explicit types on function parameters

Picking an order for parameters based on partial functions

Writing clear documentation strings with RST markup

Designing recursive functions around Python's stack limits

Writing reusable scripts with the script library switch

Built-in Data Structures – list, set, dict

Introduction

Choosing a data structure

Building lists – literals, appending, and comprehensions

Slicing and dicing a list

Deleting from a list – deleting, removing, popping, and filtering

Reversing a copy of a list

Using set methods and operators

Removing items from a set – remove(), pop(), and difference

Creating dictionaries – inserting and updating

Removing from dictionaries – the pop() method and the del statement

Controlling the order of dict keys

Handling dictionaries and sets in doctest examples

Understanding variables, references, and assignment

Making shallow and deep copies of objects

Avoiding mutable default values for function parameters

User Inputs and Outputs

Introduction

Using features of the print() function

Using input() and getpass() for user input

Debugging with "format".format_map(vars())

Using argparse to get command-line input

Using cmd for creating command-line applications

Using the OS environment settings

Basics of Classes and Objects

Introduction

Using a class to encapsulate data and processing

Designing classes with lots of processing

Designing classes with little unique processing

Optimizing small objects with __slots__

Using more sophisticated collections

Extending a collection – a list that does statistics

Using properties for lazy attributes

Using settable properties to update eager attributes

More Advanced Class Design

Introduction

Choosing between inheritance and extension – the is-a question

Separating concerns via multiple inheritance

Leveraging Python's duck typing

Managing global and singleton objects

Using more complex structures – maps of lists

Creating a class that has orderable objects

Defining an ordered collection

Deleting from a list of mappings

Input/Output, Physical Format, and Logical Layout

Introduction

Using pathlib to work with filenames

Reading and writing files with context managers

Replacing a file while preserving the previous version

Reading delimited files with the CSV module

Reading complex formats using regular expressions

Reading JSON documents

Reading XML documents

Reading HTML documents

Upgrading CSV from DictReader to namedtuple reader

Upgrading CSV from a DictReader to a namespace reader

Using multiple contexts for reading and writing files

Testing

Introduction

Using docstrings for testing

Testing functions that raise exceptions

Handling common doctest issues

Creating separate test modules and packages

Combining unittest and doctest tests

Testing things that involve dates or times

Testing things that involve randomness

Mocking external resources

Web Services

Introduction

Implementing web services with WSGI

Using the Flask framework for RESTful APIs

Parsing the query string in a request

Making REST requests with urllib

Parsing the URL path

Parsing a JSON request

Implementing authentication for web services

Application Integration

Introduction

Finding configuration files

Using YAML for configuration files

Using Python for configuration files

Using class-as-namespace for configuration

Designing scripts for composition

Using logging for control and audit output

Combining two applications into one

Combining many applications using the Command design pattern

Managing arguments and configuration in composite applications

Wrapping and combining CLI applications

Wrapping a program and checking the output

Controlling complex sequences of steps

Index


Encoding strings – creating ASCII and UTF-8 bytes

Our computer files are bytes. When we upload or download from the Internet, the communication works in bytes. A byte only has 256 distinct values. Our Python characters are Unicode. There are a lot more than 256 Unicode characters.

How do we map Unicode characters to bytes for writing to a file or transmitting?

Getting ready

Historically, a character occupied 1 byte. Python uses the old ASCII encoding scheme to display bytes; this sometimes leads to confusion between bytes and proper strings of Unicode characters.

Unicode characters are encoded into sequences of bytes. We have a number of standardized encodings and a number of non-standard encodings.

We also have some encodings that only work for a small subset of Unicode characters. We try to avoid these, but there are some situations where we'll need to use a subset encoding scheme.

Unless we have a really good reason, we almost always use the UTF-8 encoding for Unicode characters. Its main advantage is that it's a compact representation for the Latin alphabet used for English and a number of European languages.

Sometimes, an Internet protocol requires ASCII characters. This is a special case that requires some care because the ASCII encoding can only handle a small subset of Unicode characters.

How to do it...

Python will generally use our OS's default encoding for files and Internet traffic. The details are unique to each OS.

We can make a general setting using the PYTHONIOENCODING environment variable, which controls the encoding of the standard input, output, and error streams. We set this outside of Python to ensure that a particular encoding is used for those streams. Set the environment variable as:

export PYTHONIOENCODING=UTF-8

Run Python:

python3.5
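One way to confirm which encodings are actually in effect is to ask the running interpreter. This is a minimal sketch; the helper name report_encodings is hypothetical, and the values it returns depend on the OS and the environment settings:

```python
import locale
import sys


def report_encodings():
    """Return the encodings this interpreter is currently using."""
    return {
        # Encoding of the standard output stream; PYTHONIOENCODING overrides it.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding open() falls back to when no encoding= argument is given.
        "open_default": locale.getpreferredencoding(False),
        # Encoding used for file names, not for file contents.
        "filesystem": sys.getfilesystemencoding(),
    }


for name, encoding in report_encodings().items():
    print(name, encoding)
```

Running this before and after setting the environment variable shows which of the three values the setting changes.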

We sometimes need to make specific settings when we open a file inside our script. We'll return to this in Chapter 8, Input/Output, Physical Format, and Logical Layout. Open the file with a given encoding. Read or write Unicode characters to the file:

>>> with open('some_file.txt', 'w', encoding='utf-8') as output:
...     print('You drew \U0001F000', file=output)
>>> with open('some_file.txt', 'r', encoding='utf-8') as input:
...     text = input.read()
>>> text
'You drew 🀀\n'

We can also manually encode characters, in the rare case that we need to open a file in bytes mode; if we use a mode of wb, we'll need to use manual encoding:

>>> string_bytes = 'You drew \U0001F000'.encode('utf-8')
>>> string_bytes
b'You drew \xf0\x9f\x80\x80'

We can see that a sequence of four bytes (\xf0\x9f\x80\x80) was used to encode a single Unicode character, U+1F000 (🀀).
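The wb case can be sketched as a round trip through a file opened in bytes mode. The file name and the temporary directory are assumptions made for illustration only:

```python
import tempfile
from pathlib import Path

text = "You drew \U0001F000"

with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "some_file.bin"  # hypothetical file name
    # Bytes mode accepts only bytes, so we encode explicitly before writing.
    with path.open("wb") as output:
        output.write(text.encode("utf-8"))
    # Reading in bytes mode returns raw bytes; decode to recover the text.
    with path.open("rb") as source:
        raw = source.read()

recovered = raw.decode("utf-8")
print(len(text), len(raw))  # 10 characters, 13 bytes
```

The character count and the byte count differ because the final character needs four bytes in UTF-8.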

How it works...

Unicode defines a number of encoding schemes. While UTF-8 is the most popular, there are also UTF-16 and UTF-32. The number is the size in bits of each code unit; a character occupies one or more units. A file with 1000 characters encoded in UTF-32 would be 4000 8-bit bytes. A file with 1000 characters encoded in UTF-8 could be as few as 1000 bytes, depending on the exact mix of characters. In the UTF-8 encoding, characters with Unicode numbers above U+007F require multiple bytes.
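The size differences are easy to measure. In this sketch, the helper name encoded_sizes and the sample strings are hypothetical; the -le codec variants are used because they omit the byte-order mark, so only the character data is counted:

```python
def encoded_sizes(text):
    """Return the number of bytes text occupies under each UTF encoding."""
    return {
        codec: len(text.encode(codec))
        for codec in ("utf-8", "utf-16-le", "utf-32-le")
    }


samples = {
    "ASCII letters": "hello",
    "Latin accents": "héllo",
    "Mahjong tile": "\U0001F000",
}

for label, sample in samples.items():
    print(label, len(sample), encoded_sizes(sample))
```

For plain ASCII text, UTF-8 needs one byte per character, while the single Mahjong tile needs four bytes in all three encodings.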

Various operating systems have their own coding schemes. Mac OS X files are often encoded in Mac Roman or Latin-1. Windows files might use CP1252 encoding.
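To see why the choice of scheme matters, we can encode the same text with several of these codecs. This is an illustrative sketch; the sample word is an assumption, and the codec names are the ones Python's standard library uses:

```python
text = "café"

# The same four characters produce different bytes under different schemes.
for codec in ("utf-8", "latin-1", "cp1252", "mac_roman"):
    print(codec, text.encode(codec))

# Decoding with the wrong codec does not raise an error here;
# it silently yields the wrong characters.
mangled = text.encode("utf-8").decode("cp1252")
print(mangled)
```

Bytes written under one scheme and read under another produce damaged text, which is the usual symptom of a mismatched encoding.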

The point of all of these schemes is to have a sequence of bytes that can be mapped to a Unicode character. Going the other way, we also need a way to map each Unicode character to one or more bytes. Ideally, all of the Unicode characters are accounted for. Pragmatically, some of these coding schemes are incomplete. The tricky part is to avoid writing any more bytes than is necessary.

The historical ASCII encoding can only represent 128 of the Unicode characters as bytes. It's easy to create a string which cannot be encoded using the ASCII scheme.

Here's what the error looks like:

>>> 'You drew \U0001F000'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f000' in position 9: ordinal not in range(128)

We may see this kind of error when we accidentally open a file with a poorly chosen encoding. When we see this, we'll need to change our processing to select a more useful encoding; ideally, UTF-8.
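When a protocol really does require ASCII and we can't switch encodings, the errors parameter of encode() offers some fallbacks. This goes slightly beyond the recipe itself; it's a sketch of the standard error handlers, not a recommendation, and each handler loses or rewrites the original character:

```python
text = "You drew \U0001F000"

# Each handler decides what to emit for a character ASCII cannot represent.
results = {
    handler: text.encode("ascii", errors=handler)
    for handler in ("replace", "backslashreplace", "xmlcharrefreplace", "ignore")
}

for handler, encoded in results.items():
    print(handler, encoded)
```

Of these, backslashreplace and xmlcharrefreplace keep enough information to reconstruct the character later; replace and ignore discard it.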

Note

Bytes vs Strings

Bytes are often displayed using printable characters. We'll see b'hello' as a shorthand for a five-byte value. The letters are chosen using the old ASCII encoding scheme. Byte values from 0x20 to 0x7E will be shown as characters; other values are shown as escapes such as \xfe. This can be confusing. The prefix of b' is our hint that we're looking at bytes, not proper Unicode characters.
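A short experiment makes the distinction concrete. The values below are chosen only for illustration:

```python
data = b"hello"

# A bytes object is a sequence of small integers, not of characters.
print(len(data))   # 5
print(data[0])     # 104, the ASCII code for 'h'
print(list(data))  # [104, 101, 108, 108, 111]

# Building the same value from numbers gives an equal object.
same = bytes([0x68, 0x65, 0x6C, 0x6C, 0x6F])
print(same == data)  # True

# Printable ASCII values display as characters; the rest display as escapes.
mixed = bytes([0x20, 0x7E, 0x7F, 0xFE])
print(mixed)  # b' ~\x7f\xfe'
```

Indexing a bytes object returns an integer, which is a quick way to tell it apart from a string.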
