Book Image

Modern Python Standard Library Cookbook

By : Alessandro Molina
Book Image

Modern Python Standard Library Cookbook

By: Alessandro Molina

Overview of this book

The Python 3 Standard Library is a vast array of modules that you can use for developing various kinds of applications. It contains an exhaustive list of libraries, and this book will help you choose the best one to address specific programming problems in Python. The Modern Python Standard Library Cookbook begins with recipes on containers and data structures and guides you in performing effective text management in Python. You will find Python recipes for command-line operations, networking, filesystems and directories, and concurrent execution. You will learn about Python security essentials in Python and get to grips with various development tools for debugging, benchmarking, inspection, error reporting, and tracing. The book includes recipes to help you create graphical user interfaces for your application. You will learn to work with multimedia components and perform mathematical operations on date and time. The recipes will also show you how to deploy different searching and sorting algorithms on your data. By the end of the book, you will have acquired the skills needed to write clean code in Python and develop applications that meet your needs.
Table of Contents (21 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
Index

Bunch


Python is very good at shapeshifting objects. Each instance can have its own attributes and it's absolutely legal to add/remove the attributes of an object at runtime.

Once in a while, our code needs to deal with data of unknown shapes. For example, in the case of a user-submitted data, we might not know which fields the user is providing; maybe some of our users have a first name, some have a surname, and some have one or more middle name fields.

If we are not processing this data ourselves, but are just providing it to some other function, we really don't care about the shape of the data; as long as our objects have those attributes, we are fine.

A very common case is when working with protocols, if you are an HTTP server, you might want to provide to the application running behind you a request object. This object has a few known attributes, such as host and path, and it might have some optional attributes, such as a query string or a content type. But, it can also have any attribute the client provided, as HTTP is pretty flexible regarding headers, and our clients could have provided an x-totally-custom-header that we might have to expose to our code.

When representing this kind of data, Python developers often tend to look at dictionaries. In the end, Python objects themselves are built on top of dictionaries and they fit the need to map arbitrary values to names.

So, we will probably end up with something like the following:

>>> request = dict(host='www.example.org', path='/index.html')

A side effect of this approach is pretty clear once we have to pass this object around, especially to third-party code. Functions usually work with objects, and while they don't require a specific kind of object as duck-typing is the standard in Python, they will expect certain attributes to be there.

Another very common example is when writing tests, Python being a duck-typed language, it's absolutely reasonable to want to provide a fake object instead of providing a real instance of the object, especially when we need to simulate the values of some properties (as declared with @property), so we don't want or can't afford to create real instances of the object.

In such cases, using a dictionary is not viable as it will only provide access to its values through the request['path'] syntax and not through request.path, as probably expected by the functions we are providing our object to.

Also, the more we end up accessing this value, the more it's clear that the syntax using dot notation conveys the feeling of an entity that collaborates to the intent of the code, while a dictionary conveys the feeling of plain data.

As soon as we remember that Python objects can change shape at any time, we might be tempted to try creating an object instead of a dictionary. Unfortunately, we won't be able to provide the attributes at initialization time:

>>> request = object(host='www.example.org', path='/index.html')
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
TypeError: object() takes no parameters

Things don't improve much if we try to assign those attributes after the object is built:

>>> request = object()
>>> request.host = 'www.example.org'
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'host'

 

 

How to do it...

With a little effort, we can create a class that leverages dictionaries to contain any attribute we want and allow access both as a dictionary and through properties:

>>> class Bunch(dict):
...    def __getattribute__(self, key):
...        try: 
...            return self[key]
...        except KeyError:
...            raise AttributeError(key)
...    
...    def __setattr__(self, key, value): 
...        self[key] = value
...
>>> b = Bunch(a=5)
>>> b.a
5
>>> b['a']
5

How it works...

The Bunch class inherits dict, mostly as a way to provide a context where values can be stored, then most of the work is done by __getattribute__ and __setattr__. So, for any attribute that is retrieved or set on the object, they will just retrieve or set a key in self (remember we inherited from dict, so self is in fact a dictionary).

This allows the Bunch class to store and retrieve any value as an attribute of the object. The convenient feature is that it can behave both as an object and as a dict in most contexts.

For example, it is possible to find out all the values that it contains, like any other dictionary:

>>> b.items()
dict_items([('a', 5)])

It is also able to access those as attributes:

>>> b.c = 7
>>> b.c
7
>>> b.items()
dict_items([('a', 5), ('c', 7)])

 

 

There's more...

Our bunch implementation is not yet complete, as it will fail any test for class name (it's always named Bunch) and any test for inheritance, thus failing at faking other objects.

The first step is to make Bunch able to shapeshift not only its properties, but also its name. This can be achieved by creating a new class dynamically every time we create Bunch. The class will inherit from Bunch and will do nothing apart from providing a new name:

>>> class BunchBase(dict):
...    def __getattribute__(self, key):
...        try: 
...            return self[key]
...        except KeyError:
...            raise AttributeError(key)
...    
...    def __setattr__(self, key, value): 
...        self[key] = value
...
>>> def Bunch(_classname="Bunch", **attrs):
...     return type(_classname, (BunchBase, ), {})(**attrs)
>>>

The Bunch function moved from being the class itself to being a factory that will create objects that all act as Bunch, but can have different classes. Each Bunch will be a subclass of BunchBase, where the _classname name can be provided when Bunch is created:

>>> b = Bunch("Request", path="/index.html", host="www.example.org")
>>> print(b)
{'path': '/index.html', 'host': 'www.example.org'}
>>> print(b.path)
/index.html
>>> print(b.host)
www.example.org

This will allow us to create as many kinds of Bunch objects as we want, and each will have its own custom type:

>>> print(b.__class__)
<class '__main__.Request'>

The next step is to make our Bunch actually look like any other type that it has to impersonate. That is needed for the case where we want to use Bunch in place of another object. As Bunch can have any kind of attribute, it can take the place of any kind of object, but to be able to, it has to pass type checks for custom types.

 

 

We need to go back to our Bunch factory and make the Bunch objects not only have a custom class name, but also appear to be inherited from a custom parent.

To better understand what's going on, we will declare an example Person type; this type will be the one our Bunch objects will try to fake:

class Person(object):
    def __init__(name, surname):
        self.name = name
        self.surname = surname

    @property
    def fullname(self):
        return '{} {}'.format(self.name, self.surname)

Specifically, we are going to print Hello Your Name through a custom print function that only works for Person:

def hello(p):
    if not isinstance(p, Person):
        raise ValueError("Sorry, can only greet people")
    print("Hello {}".format(p.fullname))

We want to change our Bunch factory to accept the class and create a new type out of it:

def Bunch(_classname="Bunch", _parent=None, **attrs):
    parents = (_parent, ) if parent else tuple()
    return type(_classname, (BunchBase, ) + parents, {})(**attrs)

Now, our Bunch objects will appear as instances of a class named what we wanted, and will always appear as a subclass of _parent:

>>> p = Bunch("Person", Person, fullname='Alessandro Molina')
>>> hello(p)
Hello Alessandro Molina

Bunch can be a very convenient pattern; in both its complete and simplified versions, it is widely used in many frameworks with various implementations that all achieve pretty much the same result.

The showcased implementation is interesting because it gives us a clear idea of what's going on. There are ways to implement Bunch that are very smart, but might make it hard to guess what's going on and customize it.

Another possible way to implement the Bunch pattern is by patching the __dict__ class, which contains all the attributes of the class:

class Bunch(dict):
    def __init__(self, **kwds):
        super().__init__(**kwds)
        self.__dict__ = self

In this form, whenever Bunch is created, it will populate its values as a dict (by calling super().__init__, which is the dict initialization) and then, once all the attributes provided are stored in dict, it swaps the __dict__ object, which is the dictionary that contains all object attributes, with self. This makes the dict that was just populated with all the values also the dict that contains all the attributes of the object.

Our previous implementation worked by replacing the way we looked for attributes, while this implementation replaces the place where we look for attributes.