Book Image

Modern Python Standard Library Cookbook

By : Alessandro Molina
Book Image

Modern Python Standard Library Cookbook

By: Alessandro Molina

Overview of this book

The Python 3 Standard Library is a vast array of modules that you can use for developing various kinds of applications. It contains an exhaustive list of libraries, and this book will help you choose the best one to address specific programming problems in Python. The Modern Python Standard Library Cookbook begins with recipes on containers and data structures and guides you in performing effective text management in Python. You will find Python recipes for command-line operations, networking, filesystems and directories, and concurrent execution. You will learn about Python security essentials in Python and get to grips with various development tools for debugging, benchmarking, inspection, error reporting, and tracing. The book includes recipes to help you create graphical user interfaces for your application. You will learn to work with multimedia components and perform mathematical operations on date and time. The recipes will also show you how to deploy different searching and sorting algorithms on your data. By the end of the book, you will have acquired the skills needed to write clean code in Python and develop applications that meet your needs.
Table of Contents (21 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
Index

Prioritizing entries


Picking the first/top entry of a set of values is a pretty frequent need; this usually means defining one value that has priority over the other and involves sorting.

But sorting can be expensive and re-sorting every time you add an entry to your values is certainly not a very convenient way to pick the first entry out of a set of values with some kind of priority.

How to do it...

Heaps are a perfect match for everything that has priorities, such as a priority queue:

import time
import heapq

class PriorityQueue:
    def __init__(self):
        self._q = []

    def add(self, value, priority=0):
        heapq.heappush(self._q, (priority, time.time(), value))

    def pop(self):
        return heapq.heappop(self._q)[-1]

 

 

Then, our PriorityQueue can be used to retrieve entries given a priority:

>>> def f1(): print('hello')
>>> def f2(): print('world')
>>>
>>> pq = PriorityQueue()
>>> pq.add(f2, priority=1)
>>> pq.add(f1, priority=0)
>>> pq.pop()()
hello
>>> pq.pop()()
world

 

How it works...

PriorityQueue works by storing everything in an heap. Heaps are particularly efficient at retrieving the top/first element of a sorted set without having to actually sort the whole set.

Our priority queue stores all the values in a three-element tuple: priority, time.time(), and value.

The first entry of our tuple is priority (lower is better). In the example, we recorded f1 with a better priority than f2, which ensures than when we use heap.heappop to fetch tasks to process, we get f1 and then f2, so that we end up with the hello world message and not world hello.

The second entry, timestamp, is used to ensure that tasks that have the same priority are processed in their insertion order. The oldest task will be served first as it will have the smallest timestamp.

Then, we have the value itself, which is the function we want call for our task.

There's more...

A very common approach to sorting is to keep a list of entries in a tuple, where the first element is key for which we are sorting and the second element is the value itself.

For a scoreboard, we can keep each player's name and how many points they got:

scores = [(123, 'Alessandro'),
          (143, 'Chris'),
          (192, 'Mark']

 

Storing those values in tuples works because comparing two tuples is performed by comparing each element of the first tuple with the element in the same index position in the other tuple:

>>> (10, 'B') > (10, 'A')
True
>>> (11, 'A') > (10, 'B')
True

It's very easy to understand what's going on if you think about strings. 'BB' > 'BB' is the same as ('B', 'B') > ('B', 'A'); in the end, a string is just a list of characters.

We can use this property to sort our scores and retrieve the winner of a competition:

>>> scores = sorted(scores)
>>> scores[-1]
(192, 'Mark')

The major problem with this approach is that every time we add an entry to our list, we have to sort it again, or our scoreboard would became meaningless:

>>> scores.append((137, 'Rick'))
>>> scores[-1]
(137, 'Rick')
>>> scores = sorted(scores)
>>> scores[-1]
(192, 'Mark')

This is very inconvenient because it's easy to miss re-sorting somewhere if we have multiple places appending to the list, and sorting the whole list every time can be expensive.

The Python standard library offers a data structure that is a perfect match when we're interested in finding out the winner of a competition.

In the heapq module, we have a fully working implementation of a heap data structure, a particular kind of tree where each parent is smaller than its children. This provides us with a tree that has a very interesting property: the root element is always the smallest one.

And being implemented on top of a list, it means that l[0] is always the smallest element in a heap:

>>> import heapq
>>> l = []
>>> heapq.heappush(l, (192, 'Mark'))
>>> heapq.heappush(l, (123, 'Alessandro'))
>>> heapq.heappush(l, (137, 'Rick'))
>>> heapq.heappush(l, (143, 'Chris'))
>>> l[0]
(123, 'Alessandro')

You might have noticed, by the way, that the heap finds the loser of our tournament, not the winner, and we were interested in finding the best player, with the highest value.

This is a minor problem we can easily solve by storing all scores as negative numbers. If we store each score as * -1, the head of the heap will always be the winner:

>>> l = []
>>> heapq.heappush(l, (-143, 'Chris'))
>>> heapq.heappush(l, (-137, 'Rick'))
>>> heapq.heappush(l, (-123, 'Alessandro'))
>>> heapq.heappush(l, (-192, 'Mark'))
>>> l[0]
(-192, 'Mark')