Chapter 7. Python Object-oriented Shortcuts

There are many aspects of Python that appear more reminiscent of structural or functional programming than object-oriented programming. Although object-oriented programming has been the most visible paradigm of the past two decades, the old models have seen a recent resurgence. As with Python's data structures, most of these tools are syntactic sugar over an underlying object-oriented implementation; we can think of them as a further abstraction layer built on top of the (already abstracted) object-oriented paradigm. In this chapter, we'll be covering a grab bag of Python features that are not strictly object-oriented:

  • Built-in functions that take care of common tasks in one call
  • File I/O and context managers
  • An alternative to method overloading
  • Functions as objects

Python built-in functions

There are numerous functions in Python that perform a task or calculate a result on certain types of objects without being methods on the underlying class. They usually abstract common calculations that apply to multiple types of classes. This is duck typing at its best; these functions accept objects that have certain attributes or methods, and are able to perform generic operations using those methods. Many, but not all, of these built-in functions delegate to special double underscore methods on the objects they are given. We've used many of the built-in functions already, but let's quickly go through the important ones and pick up a few neat tricks along the way.

The len() function

The simplest example is the len() function, which counts the number of items in some kind of container object, such as a dictionary or list. You've seen it before:

>>> len([1,2,3,4])
4

Why don't these objects have a length property instead of having to call a function on them? Technically, they do. Most objects that len() will apply to have a method called __len__() that returns the same value. So len(myobj) seems to call myobj.__len__().
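
To see this from the implementer's side, here is a minimal sketch of our own (the class and attribute names are invented for illustration) of a container that supports len() simply by defining __len__:

class Inventory:
    def __init__(self, items):
        # keep the items in a plain list internally
        self.items = list(items)

    def __len__(self):
        # len(inventory) delegates to this method
        return len(self.items)

>>> len(Inventory(["sword", "shield"]))
2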

Why should we use the len() function instead of the __len__ method? Obviously __len__ is a special double-underscore method, suggesting that we shouldn't call it directly. There must be an explanation for this. The Python developers don't make such design decisions lightly.

The main reason is efficiency. When we call __len__ on an object, the object has to look the method up in its namespace, and, if the special __getattribute__ method (which is called every time an attribute or method on an object is accessed) is defined on that object, it has to be called as well. Further, __getattribute__ for that particular method may have been written to do something nasty, like refusing to give us access to special methods such as __len__! The len() function doesn't encounter any of this. It actually calls the __len__ function on the underlying class, so len(myobj) maps to MyObj.__len__(myobj).

Another reason is maintainability. In the future, the Python developers may want to change len() so that it can calculate the length of objects that don't have __len__, for example, by counting the number of items returned in an iterator. They'll only have to change one function instead of countless __len__ methods across the board.

There is one other extremely important and often overlooked reason for len() being an external function: backwards compatibility. This is often cited in articles as "for historical reasons", which is a mildly dismissive phrase that an author will use to say something is the way it is because a mistake was made long ago and we're stuck with it. Strictly speaking, len() isn't a mistake, it's a design decision, but that decision was made in a less object-oriented time. It has stood the test of time and has some benefits, so do get used to it.

Reversed

The reversed() function takes any sequence as input, and returns a copy of that sequence in reverse order. It is normally used in for loops when we want to loop over items from back to front.

Similar to len, reversed() calls the __reversed__() method on the class of its argument. If that method does not exist, reversed builds the reversed sequence itself using calls to __len__ and __getitem__, which are used to define a sequence. We only need to override __reversed__ if we want to somehow customize or optimize the process:

normal_list = [1, 2, 3, 4, 5]

class CustomSequence:
    def __len__(self):
        return 5

    def __getitem__(self, index):
        return "x{0}".format(index)

class FunkyBackwards:
    def __reversed__(self):
        return "BACKWARDS!"

for seq in normal_list, CustomSequence(), FunkyBackwards():
    print("\n{}: ".format(seq.__class__.__name__), end="")
    for item in reversed(seq):
        print(item, end=", ")

The for loops at the end print the reversed versions of a normal list, and instances of the two custom sequences. The output shows that reversed works on all three of them, but has very different results when we define __reversed__ ourselves:

list: 5, 4, 3, 2, 1,
CustomSequence: x4, x3, x2, x1, x0,
FunkyBackwards: B, A, C, K, W, A, R, D, S, !,

When we reverse CustomSequence, the __getitem__ method is called for each item, which just inserts an x before the index. For FunkyBackwards, the __reversed__ method returns a string, each character of which is output individually in the for loop.

Note

The preceding two classes aren't very good sequences as they don't define a proper version of __iter__, so a forward for loop over them will never end.
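
For illustration, one way to turn CustomSequence into a well-behaved sequence (this variation is ours, not part of the original example) is to raise IndexError when the index runs past the end; that is how the sequence protocol tells a forward for loop to stop:

class BoundedSequence:
    def __len__(self):
        return 5

    def __getitem__(self, index):
        if index >= 5:
            # signals the end of forward iteration
            raise IndexError(index)
        return "x{0}".format(index)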

Enumerate

Sometimes, when we're looping over a container in a for loop, we want access to the index (the current position in the list) of the current item being processed. The for loop doesn't provide us with indexes, but the enumerate function gives us something better: it creates a sequence of tuples, where the first object in each tuple is the index and the second is the original item.

This is useful if we need to use index numbers directly. Consider some simple code that outputs each of the lines in a file with line numbers:

import sys
filename = sys.argv[1]

with open(filename) as file:
    for index, line in enumerate(file):
        print("{0}: {1}".format(index+1, line), end='')

Running this code using its own filename as the input file shows how it works:

1: import sys
2: filename = sys.argv[1]
3:
4: with open(filename) as file:
5:     for index, line in enumerate(file):
6:         print("{0}: {1}".format(index+1, line), end='')

The enumerate function returns a sequence of tuples, our for loop splits each tuple into two values, and the print call formats them together. It adds one to the index for each line number, since enumerate, like all sequences, is zero-based.

We've only touched on a few of the more important Python built-in functions. As you can see, many of them build on object-oriented concepts, while others subscribe to purely functional or procedural paradigms. There are numerous others in the standard library; some of the more interesting ones are listed here, with a short demonstration after the list:

  • all and any, which accept an iterable object and return True if all, or any, of the items evaluate to true (such as a nonempty string or list, a nonzero number, an object that is not None, or the literal True).
  • eval, exec, and compile, which execute strings as code inside the interpreter. Be careful with these ones; they are not safe, so don't execute code an unknown user has supplied to you (in general, assume all unknown users are malicious, foolish, or both).
  • hasattr, getattr, setattr, and delattr, which allow attributes on an object to be manipulated by their string names.
  • zip, which takes two or more sequences and returns a new sequence of tuples, where each tuple contains a single value from each sequence.
  • And many more! See the interpreter help documentation for each of the functions listed in dir(__builtins__).
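
Here is a quick interpreter session of our own exercising a few of these:

>>> all([1, "x", True])
True
>>> any([0, "", None])
False
>>> list(zip(["a", "b"], [1, 2]))
[('a', 1), ('b', 2)]
>>> getattr("hello", "upper")()
'HELLO'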

File I/O

Our examples so far that touch the filesystem have operated entirely on text files without much thought to what is going on under the hood. Operating systems, however, actually represent files as a sequence of bytes, not text. We'll do a deep dive into the relationship between bytes and text in Chapter 8, Strings and Serialization. For now, be aware that reading textual data from a file is a fairly involved process. Python, especially Python 3, takes care of most of this work for us behind the scenes. Aren't we lucky?

The concept of files has been around since long before anyone coined the term object-oriented programming. However, Python has wrapped the interface that operating systems provide in a sweet abstraction that allows us to work with file (or file-like, vis-à-vis duck typing) objects.

The open() built-in function is used to open a file and return a file object. For reading text from a file, we only need to pass the name of the file into the function. The file will be opened for reading, and the bytes will be converted to text using the platform default encoding.

Of course, we don't always want to read files; often we want to write data to them! To open a file for writing, we need to pass a mode argument as the second positional argument, with a value of "w":

contents = "Some file contents"
file = open("filename", "w")
file.write(contents)
file.close()

We could also supply the value "a" as a mode argument, to append to the end of the file, rather than completely overwriting existing file contents.

These files with built-in wrappers for converting bytes to text are great, but it'd be awfully inconvenient if the file we wanted to open was an image, executable, or other binary file, wouldn't it?

To open a binary file, we modify the mode string to append 'b'. So, 'wb' would open a file for writing bytes, while 'rb' allows us to read them. They will behave like text files, but without the automatic encoding of text to bytes. When we read such a file, it will return bytes objects instead of str, and when we write to it, it will fail if we try to pass a text object.
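
As a minimal sketch (the filename is arbitrary), writing and then reading raw bytes looks like this:

data = bytes([0, 1, 2, 255])
file = open("some_bytes", "wb")
file.write(data)
file.close()

file = open("some_bytes", "rb")
print(file.read())  # b'\x00\x01\x02\xff'
file.close()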

Note

These mode strings for controlling how files are opened are rather cryptic and are neither pythonic nor object-oriented. However, they are consistent with virtually every other programming language out there. File I/O is one of the fundamental jobs an operating system has to handle, and all programming languages have to talk to the OS using the same system calls. Just be glad that Python returns a file object with useful methods instead of the integer that most major operating systems use to identify a file handle!

Once a file is opened for reading, we can call the read, readline, or readlines methods to get the contents of the file. The read method returns the entire contents of the file as a str or bytes object, depending on whether there is 'b' in the mode. Be careful not to use this method without arguments on huge files. You don't want to find out what happens if you try to load that much data into memory!

It is also possible to read a fixed number of bytes from a file; we pass an integer argument to the read method describing how many bytes we want to read. The next call to read will load the next sequence of bytes, and so on. We can do this inside a while loop to read the entire file in manageable chunks.
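
A sketch of that pattern follows; the chunk size and the handle_chunk helper are placeholders of our own:

def handle_chunk(chunk):
    # stand-in for real per-chunk processing
    print(len(chunk), "bytes read")

file = open("some_bytes", "rb")
while True:
    chunk = file.read(4096)
    if not chunk:
        # read() returns an empty bytes object at the end of the file
        break
    handle_chunk(chunk)
file.close()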

The readline method returns a single line from the file (where each line ends in a newline, a carriage return, or both, depending on the operating system on which the file was created). We can call it repeatedly to get additional lines. The plural readlines method returns a list of all the lines in the file. Like the read method, it's not safe to use on very large files. These two methods even work when the file is open in bytes mode, but it only makes sense if we are parsing text-like data that has newlines at reasonable positions. An image or audio file, for example, will not have newline characters in it (unless the newline byte happened to represent a certain pixel or sound), so applying readline wouldn't make sense.

For readability, and to avoid reading a large file into memory at once, it is often better to use a for loop directly on a file object. For text files, it will read each line, one at a time, and we can process it inside the loop body. For binary files, it's better to read fixed-sized chunks of data using the read() method, passing a parameter for the maximum number of bytes to read.

Writing to a file is just as easy; the write method on file objects writes a string (or bytes, for binary data) object to the file. It can be called repeatedly to write multiple strings, one after the other. The writelines method accepts a sequence of strings and writes each of the iterated values to the file. The writelines method does not append a new line after each item in the sequence. It is basically a poorly named convenience function to write the contents of a sequence of strings without having to explicitly iterate over it using a for loop.
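
For example (a small sketch of our own), any newlines must be included in the strings themselves:

lines = ["first line\n", "second line\n"]
file = open("lines.txt", "w")
file.writelines(lines)  # writes both strings; adds no separators itself
file.close()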

Lastly, and I do mean lastly, we come to the close method. This method should be called when we are finished reading or writing the file, to ensure any buffered writes are written to the disk, that the file has been properly cleaned up, and that all resources associated with the file are released back to the operating system. Technically, this will happen automatically when the script exits, but it's better to be explicit and clean up after ourselves, especially in long-running processes.

Placing it in context

The need to close files when we are finished with them can make our code quite ugly. Because an exception may occur at any time during file I/O, we ought to wrap all calls to a file in a try...finally clause. The file should be closed in the finally clause, regardless of whether I/O was successful. This isn't very Pythonic. Of course, there is a more elegant way to do it.

If we run dir on a file-like object, we see that it has two special methods named __enter__ and __exit__. These methods turn the file object into what is known as a context manager. Basically, if we use a special syntax called the with statement, these methods will be called before and after nested code is executed. On file objects, the __exit__ method ensures the file is closed, even if an exception is raised. We no longer have to explicitly manage the closing of the file. Here is what the with statement looks like in practice:

with open('filename') as file:
    for line in file:
        print(line, end='')

The open call returns a file object, which has __enter__ and __exit__ methods. The returned object is assigned to the variable named file by the as clause. We know the file will be closed when the code returns to the outer indentation level, and that this will happen even if an exception is raised.

The with statement is used in several places in the standard library where startup or cleanup code needs to be executed. For example, the urlopen call returns an object that can be used in a with statement to clean up the socket when we're done. Locks in the threading module can automatically release the lock when the statement has been executed.
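
A minimal sketch of the lock case:

from threading import Lock

lock = Lock()
with lock:
    # the lock is acquired on entry and released on exit,
    # even if an exception is raised in this block
    print("inside the critical section")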

Most interestingly, because the with statement can apply to any object that has the appropriate special methods, we can use it in our own frameworks. For example, remember that strings are immutable, but sometimes you need to build a string from multiple parts. For efficiency, this is usually done by storing the component strings in a list and joining them at the end. Let's create a simple context manager that allows us to construct a sequence of characters and automatically convert it to a string upon exit:

class StringJoiner(list):
    def __enter__(self):
        return self

    def __exit__(self, type, value, tb):
        self.result = "".join(self)

This code adds the two special methods required of a context manager to the list class it inherits from. The __enter__ method performs any required setup code (in this case, there isn't any) and then returns the object that will be assigned to the variable after as in the with statement. Often, as we've done here, this is just the context manager object itself. The __exit__ method accepts three arguments. In a normal situation, these are all given a value of None. However, if an exception occurs inside the with block, they will be set to values related to the type, value, and traceback for the exception. This allows the __exit__ method to do any cleanup code that may be required, even if an exception occurred. In our example, we take the irresponsible path and create a result string by joining the characters in the list, regardless of whether an exception was thrown.

While this is one of the simplest context managers we could write, and its usefulness is dubious, it does work with a with statement. Have a look at it in action:

import random, string
with StringJoiner() as joiner:
    for i in range(15):
        joiner.append(random.choice(string.ascii_letters))

print(joiner.result)

This code constructs a string of 15 random characters. It appends these to a StringJoiner using the append method it inherited from list. When the with statement goes out of scope (back to the outer indentation level), the __exit__ method is called, and the result attribute becomes available on the joiner object. We print this value to see a random string.
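
Because __exit__ runs even when the block fails, the result attribute is set whether or not an exception occurred; here is a small demonstration of our own:

joiner = StringJoiner("abc")
try:
    with joiner:
        joiner.append("d")
        raise ValueError("something went wrong")
except ValueError:
    pass

print(joiner.result)  # abcd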

An alternative to method overloading

One prominent feature of many object-oriented programming languages is a tool called method overloading. Method overloading simply refers to having multiple methods with the same name that accept different sets of arguments. In statically typed languages, this is useful if we want to have a method that accepts either an integer or a string, for example. In non-object-oriented languages, we might need two functions, called add_s and add_i, to accommodate such situations. In statically typed object-oriented languages, we'd need two methods, both called add, one that accepts strings, and one that accepts integers.

In Python, we only need one method, which accepts any type of object. It may have to do some testing on the object type (for example, if it is a string, convert it to an integer), but only one method is required.
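
A sketch of our own showing what such a single method might look like:

def add(a, b):
    # coerce strings to integers so one function handles both types
    if isinstance(a, str):
        a = int(a)
    if isinstance(b, str):
        b = int(b)
    return a + b

print(add(1, 2))      # 3
print(add("1", "2"))  # 3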

However, method overloading is also useful when we want a method with the same name to accept different numbers or sets of arguments. For example, an e-mail message method might come in two versions, one of which accepts an argument for the "from" e-mail address. The other method might look up a default "from" e-mail address instead. Python doesn't permit multiple methods with the same name, but it does provide a different, equally flexible, interface.

We've seen some of the possible ways to send arguments to methods and functions in previous examples, but now we'll cover all the details. The simplest function accepts no arguments. We probably don't need an example, but here's one for completeness:

def no_args():
    pass

Here's how it's called:

no_args()

A function that does accept arguments will provide the names of those arguments in a comma-separated list. Only the name of each argument needs to be supplied.

When calling the function, these positional arguments must be specified in order, and none can be missed or skipped. This is the most common way we've specified arguments in our previous examples:

def mandatory_args(x, y, z):
    pass

To call it:

mandatory_args("a string", a_variable, 5)

Any type of object can be passed as an argument: an object, a container, a primitive, even functions and classes. The preceding call shows a hardcoded string, an unknown variable, and an integer passed into the function.

Default arguments

If we want to make an argument optional, rather than creating a second method with a different set of arguments, we can specify a default value in a single method, using an equals sign. If the calling code does not supply this argument, it will be assigned a default value. However, the calling code can still choose to override the default by passing in a different value. Often, a default value of None, or an empty string or list is suitable.

Here's a function definition with default arguments:

def default_arguments(x, y, z, a="Some String", b=False):
    pass

The first three arguments are still mandatory and must be passed by the calling code. The last two parameters have default arguments supplied.

There are several ways we can call this function. We can supply all arguments in order as though all the arguments were positional arguments:

default_arguments("a string", variable, 8, "", True)

Alternatively, we can supply just the mandatory arguments in order, leaving the keyword arguments to be assigned their default values:

default_arguments("a longer string", some_variable, 14)

We can also use the equals sign syntax when calling a function to provide values in a different order, or to skip default values that we aren't interested in. For example, we can skip the first keyword arguments and supply the second one:

default_arguments("a string", variable, 14, b=True)

Surprisingly, we can even use the equals sign syntax to mix up the order of positional arguments, so long as all of them are supplied (the output shown here assumes the function body is print(x, y, z, a, b) rather than pass):

>>> default_arguments(y=1,z=2,x=3,a="hi")
3 1 2 hi False

With so many options, it may seem hard to pick one, but if you think of the positional arguments as an ordered list, and keyword arguments as sort of like a dictionary, you'll find that the correct layout tends to fall into place. If you need to require the caller to specify an argument, make it mandatory; if you have a sensible default, then make it a keyword argument. Choosing how to call the method normally takes care of itself, depending on which values need to be supplied, and which can be left at their defaults.

One thing to take note of with keyword arguments is that anything we provide as a default argument is evaluated when the function is first interpreted, not when it is called. This means we can't have dynamically generated default values. For example, the following code won't behave quite as expected:

number = 5
def funky_function(number=number):
    print(number)

number = 6
funky_function(8)
funky_function()
print(number)

If we run this code, it outputs the number 8 first, but then it outputs the number 5 for the call with no arguments. We had set the variable to the number 6, as evidenced by the last line of output, but when the function is called, the number 5 is printed; the default value was calculated when the function was defined, not when it was called.

This is tricky with empty containers such as lists, sets, and dictionaries. For example, it is common to ask calling code to supply a list that our function is going to manipulate, but the list is optional. We'd like to make an empty list as a default argument. We can't do this; it will create only one list, when the code is first constructed:

>>> def hello(b=[]):
...     b.append('a')
...     print(b)
...
>>> hello()
['a']
>>> hello()
['a', 'a']

Whoops, that's not quite what we expected! The usual way to get around this is to make the default value None, and then use the idiom argument = argument if argument else [] inside the method. Pay close attention!
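
Applied to the earlier example, the repaired function looks like this (we test for None explicitly, which is slightly stricter than the idiom above, since it leaves a caller's empty list untouched):

def hello(b=None):
    if b is None:
        # a fresh list is created on every call
        b = []
    b.append('a')
    print(b)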

Variable argument lists

Default values alone do not allow us all the flexible benefits of method overloading. The thing that makes Python really slick is the ability to write methods that accept an arbitrary number of positional or keyword arguments without explicitly naming them. We can also pass arbitrary lists and dictionaries into such functions.

For example, a function to accept a link or list of links and download the web pages could use such variadic arguments, or varargs. Instead of accepting a single value that is expected to be a list of links, we can accept an arbitrary number of arguments, where each argument is a different link. We do this by specifying the * operator in the function definition:

def get_pages(*links):
    for link in links:
        #download the link with urllib
        print(link)

The *links parameter says "I'll accept any number of arguments and put them all in a tuple named links". If we supply only one argument, it'll be a tuple with one element; if we supply no arguments, it'll be an empty tuple. Thus, all these function calls are valid:

get_pages()
get_pages('http://www.archlinux.org')
get_pages('http://www.archlinux.org',
        'http://ccphillips.net/')

We can also accept arbitrary keyword arguments. These arrive into the function as a dictionary. They are specified with two asterisks (as in **kwargs) in the function declaration. This tool is commonly used in configuration setups. The following class allows us to specify a set of options with default values:

class Options:
    default_options = {
            'port': 21,
            'host': 'localhost',
            'username': None,
            'password': None,
            'debug': False,
            }
    def __init__(self, **kwargs):
        self.options = dict(Options.default_options)
        self.options.update(kwargs)

    def __getitem__(self, key):
        return self.options[key]

All the interesting stuff in this class happens in the __init__ method. We have a dictionary of default options and values at the class level. The first thing the __init__ method does is make a copy of this dictionary. We do that instead of modifying the dictionary directly in case we instantiate two separate sets of options. (Remember, class-level variables are shared between instances of the class.) Then, __init__ uses the update method on the new dictionary to change any non-default values to those supplied as keyword arguments. The __getitem__ method simply allows us to use the new class using indexing syntax. Here's a session demonstrating the class in action:

>>> options = Options(username="dusty", password="drowssap",
        debug=True)
>>> options['debug']
True
>>> options['port']
21
>>> options['username']
'dusty'

We're able to access our options instance using dictionary indexing syntax, and the dictionary includes both default values and the ones we set using keyword arguments.

The keyword argument syntax can be dangerous, as it may break the "explicit is better than implicit" rule. In the preceding example, it's possible to pass arbitrary keyword arguments to the Options initializer to represent options that don't exist in the default dictionary. This may not be a bad thing, depending on the purpose of the class, but it makes it hard for someone using the class to discover what valid options are available. It also makes it easy to enter a confusing typo ("Debug" instead of "debug", for example) that adds two options where only one should have existed.
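
If that risk matters, one defensive variation (our own addition, not part of the original example) is to reject unknown keys in __init__:

class StrictOptions(Options):
    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(Options.default_options)
        if unknown:
            # fail fast on typos such as "Debug" instead of "debug"
            raise TypeError("unknown options: {0}".format(unknown))
        super().__init__(**kwargs)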

Keyword arguments are also very useful when we need to accept arbitrary arguments to pass to a second function, but we don't know what those arguments will be. We saw this in action in Chapter 3, When Objects Are Alike, when we were building support for multiple inheritance. We can, of course, combine the variable argument and variable keyword argument syntax in one function call, and we can use normal positional and default arguments as well. The following example is somewhat contrived, but demonstrates the four types in action:

import shutil
import os.path
def augmented_move(target_folder, *filenames,
        verbose=False, **specific):
    '''Move all filenames into the target_folder, allowing
    specific treatment of certain files.'''

    def print_verbose(message, filename):
        '''print the message only if verbose is enabled'''
        if verbose:
            print(message.format(filename))

    for filename in filenames:
        target_path = os.path.join(target_folder, filename)
        if filename in specific:
            if specific[filename] == 'ignore':
                print_verbose("Ignoring {0}", filename)
            elif specific[filename] == 'copy':
                print_verbose("Copying {0}", filename)
                shutil.copyfile(filename, target_path)
        else:
            print_verbose("Moving {0}", filename)
            shutil.move(filename, target_path)

This example will process an arbitrary list of files. The first argument is a target folder, and the default behavior is to move all remaining non-keyword argument files into that folder. Then there is a keyword-only argument, verbose, which tells us whether to print information on each file processed. Finally, we can supply a dictionary containing actions to perform on specific filenames; the default behavior is to move the file, but if a valid string action has been specified in the keyword arguments, it can be ignored or copied instead. Notice the ordering of the parameters in the function; first the positional argument is specified, then the *filenames list, then any specific keyword-only arguments, and finally, a **specific dictionary to hold remaining keyword arguments.

We create an inner helper function, print_verbose, which will print messages only if the verbose argument is True. This function keeps code readable by encapsulating this functionality into a single location.

In common cases, assuming the files in question exist, this function could be called as:

>>> augmented_move("move_here", "one", "two")

This command would move the files one and two into the move_here directory, assuming they exist (there's no error checking or exception handling in the function, so it would fail spectacularly if the files or target directory didn't exist). The move would occur without any output, since verbose is False by default.

If we want to see the output, we can call it with:

>>> augmented_move("move_here", "three", verbose=True)
Moving three

This moves one file named three, and tells us what it's doing. Notice that it is impossible to specify verbose as a positional argument in this example; we must pass a keyword argument. Otherwise, Python would think it was another filename in the *filenames list.

If we want to copy or ignore some of the files in the list, instead of moving them, we can pass additional keyword arguments:

>>> augmented_move("move_here", "four", "five", "six",
        four="copy", five="ignore")

This will move the sixth file and copy the fourth, but won't display any output, since we didn't specify verbose. Of course, we can do that too, and keyword arguments can be supplied in any order:

>>> augmented_move("move_here", "seven", "eight", "nine",
        seven="copy", verbose=True, eight="ignore")
Copying seven
Ignoring eight
Moving nine

Unpacking arguments

There's one more nifty trick involving variable arguments and keyword arguments. We've used it in some of our previous examples, but it's never too late for an explanation. Given a list or dictionary of values, we can pass those values into a function as if they were normal positional or keyword arguments. Have a look at this code:

def show_args(arg1, arg2, arg3="THREE"):
    print(arg1, arg2, arg3)

some_args = range(3)
more_args = {
        "arg1": "ONE",
        "arg2": "TWO"}

print("Unpacking a sequence:", end=" ")

show_args(*some_args)
print("Unpacking a dict:", end=" ")

show_args(**more_args)

Here's what it looks like when we run it:

Unpacking a sequence: 0 1 2
Unpacking a dict: ONE TWO THREE

The function accepts three arguments, one of which has a default value. But when we have a list of three arguments, we can use the * operator inside a function call to unpack it into the three arguments. If we have a dictionary of arguments, we can use the ** syntax to unpack it as a collection of keyword arguments.

This is most often useful when mapping information that has been collected from user input or from an outside source (for example, an Internet page or a text file) to a function or method call.

Remember our earlier example that used headers and lines in a text file to create a list of dictionaries with contact information? Instead of just adding the dictionaries to a list, we could use keyword unpacking to pass the arguments to the __init__ method on a specially built Contact object that accepts the same set of arguments. See if you can adapt the example to make this work.
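
As a starting point, here is a hedged sketch; the field names are assumptions, since they depend on the headers in that earlier file:

class Contact:
    def __init__(self, name="", email="", phone=""):
        self.name = name
        self.email = email
        self.phone = phone

rows = [
    {"name": "Dusty", "email": "dusty@example.com"},
    {"name": "Steve", "phone": "555-1212"},
]
contacts = [Contact(**row) for row in rows]
print(contacts[0].name, contacts[1].phone)  # Dusty 555-1212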

Functions are objects too

Programming languages that overemphasize object-oriented principles tend to frown on functions that are not methods. In such languages, you're expected to create an object to sort of wrap the single method involved. There are numerous situations where we'd like to pass around a small object that is simply called to perform an action. This is most frequently done in event-driven programming, such as graphical toolkits or asynchronous servers; we'll see some design patterns that use it in Chapter 10, Python Design Patterns I and Chapter 11, Python Design Patterns II.

In Python, we don't need to wrap such methods in an object, because functions already are objects! We can set attributes on functions (though this isn't a common activity), and we can pass them around to be called at a later date. They even have a few special properties that can be accessed directly. Here's yet another contrived example:

def my_function():
    print("The Function Was Called")
my_function.description = "A silly function"

def second_function():
    print("The second was called")
second_function.description = "A sillier function."

def another_function(function):
    print("The description:", end=" ")
    print(function.description)
    print("The name:", end=" ")
    print(function.__name__)
    print("The class:", end=" ")
    print(function.__class__)
    print("Now I'll call the function passed in")
    function()

another_function(my_function)
another_function(second_function)

If we run this code, we can see that we were able to pass two different functions into our third function, and get different output for each one:

The description: A silly function
The name: my_function
The class: <class 'function'>
Now I'll call the function passed in
The Function Was Called
The description: A sillier function.
The name: second_function
The class: <class 'function'>
Now I'll call the function passed in
The second was called

We set an attribute on the function, named description (not very good descriptions, admittedly). We were also able to see the function's __name__ attribute, and to access its class, demonstrating that the function really is an object with attributes. Then we called the function by using the callable syntax (the parentheses).

The fact that functions are top-level objects is most often used to pass them around to be executed at a later date, for example, when a certain condition has been satisfied. Let's build an event-driven timer that does just this:

import datetime
import time

class TimedEvent:
    def __init__(self, endtime, callback):
        self.endtime = endtime
        self.callback = callback

    def ready(self):
        return self.endtime <= datetime.datetime.now()

class Timer:
    def __init__(self):
        self.events = []

    def call_after(self, delay, callback):
        end_time = datetime.datetime.now() + \
                datetime.timedelta(seconds=delay)
                
        self.events.append(TimedEvent(end_time, callback))

    def run(self):
        while True:
            # Collect the ready events into a list up front; removing
            # items from self.events while a lazy generator was still
            # iterating over it would skip entries
            ready_events = [e for e in self.events if e.ready()]
            for event in ready_events:
                event.callback(self)
                self.events.remove(event)
            time.sleep(0.5)

In production, this code should definitely have extra documentation using docstrings! The call_after method should at least mention that the delay parameter is in seconds, and that the callback function should accept one argument: the timer doing the calling.
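
A docstring along these lines would capture both facts (a sketch, not the author's wording):

    def call_after(self, delay, callback):
        '''Schedule a callback to run in the future.

        delay -- the number of seconds to wait before calling
        callback -- a function accepting a single argument: the
                    Timer instance that fired the event
        '''
        end_time = datetime.datetime.now() + \
                datetime.timedelta(seconds=delay)
        self.events.append(TimedEvent(end_time, callback))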

We have two classes here. The TimedEvent class is not really meant to be accessed by other classes; all it does is store endtime and callback. We could even use a tuple or namedtuple here, but as it is convenient to give the object a behavior that tells us whether or not the event is ready to run, we use a class instead.

The Timer class simply stores a list of upcoming events. It has a call_after method to add a new event. This method accepts a delay parameter representing the number of seconds to wait before executing the callback, and the callback function itself: a function to be executed at the correct time. This callback function should accept one argument.

The run method is very simple; it uses a list comprehension to collect any events whose time has come, and executes them in order. The timer loop then continues indefinitely, so it has to be interrupted with a keyboard interrupt (Ctrl + C or Ctrl + Break). We sleep for half a second after each iteration so as not to grind the system to a halt.

The important things to note here are the lines that touch callback functions. The function is passed around like any other object and the timer never knows or cares what the original name of the function is or where it was defined. When it's time to call the function, the timer simply applies the parenthesis syntax to the stored variable.

Here's a set of callbacks that test the timer:

from timer import Timer
import datetime

def format_time(message, *args):
    now = datetime.datetime.now().strftime("%I:%M:%S")
    print(message.format(*args, now=now))

def one(timer):
    format_time("{now}: Called One")

def two(timer):
    format_time("{now}: Called Two")

def three(timer):
    format_time("{now}: Called Three")

class Repeater:
    def __init__(self):
        self.count = 0
    def repeater(self, timer):
        format_time("{now}: repeat {0}", self.count)
        self.count += 1
        timer.call_after(5, self.repeater)

timer = Timer()
timer.call_after(1, one)
timer.call_after(2, one)
timer.call_after(2, two)
timer.call_after(4, two)
timer.call_after(3, three)
timer.call_after(6, three)
repeater = Repeater()
timer.call_after(5, repeater.repeater)
format_time("{now}: Starting")
timer.run()

This example allows us to see how multiple callbacks interact with the timer. The first function is the format_time function. It uses the string format method to add the current time to the message, and illustrates variable arguments in action. The format_time function will accept any number of positional arguments, using variable argument syntax, which are then forwarded as positional arguments to the string's format method. After this, we create three simple callback functions that output the current time and a short message telling us which callback has been fired.

The Repeater class demonstrates that methods can be used as callbacks too, since they are really just functions. It also shows why the timer argument to the callback functions is useful: we can add a new timed event to the timer from inside a presently running callback. We then create a timer and add several events to it that are called after different amounts of time. Finally, we start the timer running; the output shows that events are run in the expected order:

02:53:35: Starting
02:53:36: Called One
02:53:37: Called One
02:53:37: Called Two
02:53:38: Called Three
02:53:39: Called Two
02:53:40: repeat 0
02:53:41: Called Three
02:53:45: repeat 1
02:53:50: repeat 2
02:53:55: repeat 3
02:54:00: repeat 4

Python 3.4 introduces a generic event-loop architecture similar to this. We'll be discussing it later in Chapter 13, Concurrency.

Using functions as attributes

One of the interesting effects of functions being objects is that they can be set as callable attributes on other objects. It is possible to add or change a function to an instantiated object:

class A:
    def print(self):
        print("my class is A")

def fake_print():
    print("my class is not A")

a = A()
a.print()
a.print = fake_print
a.print()

This code creates a very simple class with a print method that doesn't tell us anything we didn't know. Then we create a new function that tells us something we don't believe.

When we call print on an instance of the A class, it behaves as expected. If we then set the print method to point at a new function, it tells us something different:

my class is A
my class is not A

It is also possible to replace methods on classes instead of objects, although in that case we have to add the self argument to the parameter list. This will change the method for all instances of that class, even ones that have already been instantiated. Obviously, replacing methods like this can be both dangerous and confusing to maintain. Somebody reading the code will see that a method has been called and look up that method on the original class. But the method on the original class is not the one that was called. Figuring out what really happened can become a tricky, frustrating debugging session.

It does have its uses though. Often, replacing or adding methods at run time (called monkey-patching) is used in automated testing. If testing a client-server application, we may not want to actually connect to the server while testing the client; this may result in accidental transfers of funds or embarrassing test e-mails being sent to real people. Instead, we can set up our test code to replace some of the key methods on the object that sends requests to the server, so it only records that the methods have been called.

Monkey-patching can also be used to fix bugs or add features in third-party code that we are interacting with, and does not behave quite the way we need it to. It should, however, be applied sparingly; it's almost always a "messy hack". Sometimes, though, it is the only way to adapt an existing library to suit our needs.
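
To make the testing use case concrete, here is a minimal, hypothetical sketch; the BankClient class and its methods are invented for illustration:

class BankClient:
    def send_request(self, data):
        raise NotImplementedError("would contact the real server")

    def transfer(self, amount):
        self.send_request({"action": "transfer", "amount": amount})

def test_transfer():
    client = BankClient()
    recorded = []
    # Monkey-patch the instance: the new attribute shadows the
    # class method, so the replacement takes no self parameter
    client.send_request = lambda data: recorded.append(data)
    client.transfer(100)
    assert recorded == [{"action": "transfer", "amount": 100}]

test_transfer()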

Callable objects

Just as functions are objects that can have attributes set on them, it is possible to create an object that can be called as though it were a function.

Any object can be made callable by simply giving it a __call__ method that accepts the required arguments. Let's make our Repeater class, from the timer example, a little easier to use by making it a callable:

class Repeater:
    def __init__(self):
        self.count = 0


    def __call__(self, timer):
        format_time("{now}: repeat {0}", self.count)
        self.count += 1

        timer.call_after(5, self)

timer = Timer()

timer.call_after(5, Repeater())
format_time("{now}: Starting")
timer.run()

This example isn't much different from the earlier class; all we did was change the name of the repeater method to __call__ and pass the object itself as a callable. Note that when we make the call_after call, we pass the argument Repeater(). Those two parentheses create a new instance of the class; they do not invoke the new object's __call__ method. That happens later, inside the timer. If we want to execute the __call__ method on a newly instantiated object, we'd use a rather odd syntax: Repeater()(). The first set of parentheses constructs the object; the second set executes the __call__ method. If we find ourselves doing this, we may not be using the correct abstraction. Only implement the __call__ method on an object if the object is meant to be treated like a function.

Case study

To tie together some of the principles presented in this chapter, let's build a mailing list manager. The manager will keep track of e-mail addresses categorized into named groups. When it's time to send a message, we can pick a group and send the message to all e-mail addresses assigned to that group.

Now, before we start working on this project, we ought to have a safe way to test it, without sending e-mails to a bunch of real people. Luckily, Python has our back here; like the test HTTP server, it has a built-in Simple Mail Transfer Protocol (SMTP) server that we can instruct to capture any messages we send without actually sending them. We can run the server with the following command:

python -m smtpd -n -c DebuggingServer localhost:1025

Running this command at a command prompt will start an SMTP server running on port 1025 on the local machine. But we've instructed it to use the DebuggingServer class (it comes with the built-in smtpd module), which, instead of sending e-mails to the intended recipients, simply prints them on the terminal screen as it receives them. Neat, eh?

Now, before writing our mailing list, let's write some code that actually sends mail. Of course, Python supports this in the standard library, too, but it's a bit of an odd interface, so we'll write a new function to wrap it all cleanly:

import smtplib
from email.mime.text import MIMEText

def send_email(subject, message, from_addr, *to_addrs,
        host="localhost", port=1025, **headers):

    email = MIMEText(message)
    email['Subject'] = subject
    email['From'] = from_addr
    for header, value in headers.items():
        email[header] = value

    sender = smtplib.SMTP(host, port)
    for addr in to_addrs:
        del email['To']
        email['To'] = addr
        sender.sendmail(from_addr, addr, email.as_string())
    sender.quit()

We won't cover the code inside this method too thoroughly; the documentation in the standard library can give you all the information you need to use the smtplib and email modules effectively.

We've used both variable argument and keyword argument syntax in the function definition. The variable argument list allows us to supply a single string in the default case of having a single to address, as well as permitting multiple addresses to be supplied if required. Any extra keyword arguments are mapped to e-mail headers. This is an exciting use of variable arguments and keyword arguments, but it's not really a great interface for the person calling the function. In fact, it makes many things the programmer will want to do impossible.

The headers passed into the function represent auxiliary headers that can be attached to a message. Such headers might include Reply-To, Return-Path, or X-pretty-much-anything. But in order to be a valid identifier in Python, a name cannot include the - character; in general, that character represents subtraction. So, it's not possible to call the function with a keyword argument such as Reply-To=[email protected]. It appears we were too eager to use keyword arguments because they are a new tool we just learned about in this chapter.

We'll have to change the argument to a normal dictionary; this will work because any string can be used as a key in a dictionary. By default, we'd want this dictionary to be empty, but we can't make the default parameter an empty dictionary. So, we'll have to make the default argument None, and then set up the dictionary at the beginning of the method:

def send_email(subject, message, from_addr, *to_addrs,
        host="localhost", port=1025, headers=None):

    headers = {} if headers is None else headers
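
For reference, here is the complete function after this change; apart from the signature and the headers default, the body is the same as before:

def send_email(subject, message, from_addr, *to_addrs,
        host="localhost", port=1025, headers=None):

    headers = {} if headers is None else headers

    email = MIMEText(message)
    email['Subject'] = subject
    email['From'] = from_addr
    for header, value in headers.items():
        email[header] = value

    sender = smtplib.SMTP(host, port)
    for addr in to_addrs:
        del email['To']
        email['To'] = addr
        sender.sendmail(from_addr, addr, email.as_string())
    sender.quit()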

If we have our debugging SMTP server running in one terminal, we can test this code in a Python interpreter:

>>> send_email("A model subject", "The message contents",
 "[email protected]", "[email protected]", "[email protected]")

Then, if we check the output from the debugging SMTP server, we get the following:

---------- MESSAGE FOLLOWS ----------
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: A model subject
From: [email protected]
To: [email protected]
X-Peer: 127.0.0.1

The message contents
------------ END MESSAGE ------------
---------- MESSAGE FOLLOWS ----------
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: A model subject
From: [email protected]
To: [email protected]
X-Peer: 127.0.0.1

The message contents
------------ END MESSAGE ------------

Excellent, it has "sent" our e-mail to the two expected addresses with subject and message contents included. Now that we can send messages, let's work on the e-mail group management system. We'll need an object that somehow matches e-mail addresses with the groups they are in. Since this is a many-to-many relationship (any one e-mail address can be in multiple groups; any one group can be associated with multiple e-mail addresses), none of the data structures we've studied seems quite ideal. We could try a dictionary of group-names matched to a list of associated e-mail addresses, but that would duplicate e-mail addresses. We could also try a dictionary of e-mail addresses matched to groups, resulting in a duplication of groups. Neither seems optimal. Let's try this latter version, even though intuition tells me the groups to e-mail address solution would be more straightforward.

Since the values in our dictionary will always be collections of unique e-mail addresses, we should probably store them in a set container. We can use defaultdict to ensure that there is always a set container available for each key:

from collections import defaultdict
class MailingList:
    '''Manage groups of e-mail addresses for sending e-mails.'''
    def __init__(self):
        self.email_map = defaultdict(set)

    def add_to_group(self, email, group):
        self.email_map[email].add(group)

Now, let's add a method that allows us to collect all the e-mail addresses in one or more groups. This can be done by converting the list of groups to a set:

    def emails_in_groups(self, *groups):
        groups = set(groups)
        emails = set()
        for e, g in self.email_map.items():
            if g & groups:
                emails.add(e)
        return emails

First, look at what we're iterating over: self.email_map.items(). This method, of course, returns a tuple of key-value pairs for each item in the dictionary. The values are sets of strings representing the groups. We split these into two variables named e and g, short for e-mail and groups. We add the e-mail address to the set of return values only if the passed in groups intersect with the e-mail address groups. The g & groups syntax is a shortcut for g.intersection(groups); the set class does this by implementing the special __and__ method to call intersection.

Tip

This code could be made a wee bit more concise using a set comprehension, which we'll discuss in Chapter 9, The Iterator Pattern.
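
As a preview, a sketch of that comprehension-based version might read:

    def emails_in_groups(self, *groups):
        groups = set(groups)
        return {e for e, g in self.email_map.items() if g & groups}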

Now, with these building blocks, we can trivially add a method to our MailingList class that sends messages to specific groups:

def send_mailing(self, subject, message, from_addr,
        *groups, headers=None):
    emails = self.emails_in_groups(*groups)
    send_email(subject, message, from_addr,
            *emails, headers=headers)

This function relies on variable argument lists. As input, it takes a list of groups as variable arguments. It gets the list of e-mails for the specified groups and passes those as variable arguments into send_email, along with other arguments that were passed into this method.

The program can be tested by ensuring the SMTP debugging server is running in one command prompt, and, in a second prompt, loading the code using:

python -i mailing_list.py

Create a MailingList object with:

>>> m = MailingList()

Then create a few fake e-mail addresses and groups, along the lines of:

>>> m.add_to_group("[email protected]", "friends")
>>> m.add_to_group("[email protected]", "friends")
>>> m.add_to_group("[email protected]", "family")
>>> m.add_to_group("[email protected]", "professional")

Finally, use a command like this to send e-mails to specific groups:

>>> m.send_mailing("A Party",
"Friends and family only: a party", "[email protected]", "friends",
"family", headers={"Reply-To": "[email protected]"})

E-mails to each of the addresses in the specified groups should show up in the console on the SMTP server.

The mailing list works fine as it is, but it's kind of useless; as soon as we exit the program, our database of information is lost. Let's modify it to add a couple of methods to load and save the list of e-mail groups from and to a file.

In general, when storing structured data on disk, it is a good idea to put a lot of thought into how it is stored. One of the reasons myriad database systems exist is that if someone else has put this thought into how data is stored, you don't have to. We'll be looking at some data serialization mechanisms in the next chapter, but for this example, let's keep it simple and go with the first solution that could possibly work.

The data format I have in mind is to store each e-mail address followed by a space, followed by a comma-separated list of groups. This format seems reasonable, and we're going to go with it because data formatting isn't the topic of this chapter. However, to illustrate just why you need to think hard about how you format data on disk, let's highlight a few problems with the format.

First, the space character is technically legal in e-mail addresses. Most e-mail providers prohibit it (with good reason), but the specification defining e-mail addresses says an e-mail can contain a space if it is in quotation marks. If we are to use a space as a sentinel in our data format, we should technically be able to differentiate between that space and a space that is part of an e-mail address. We're going to pretend this isn't true, for simplicity's sake, but real-life data encoding is full of stupid issues like this. Second, consider the comma-separated list of groups. What happens if someone decides to put a comma in a group name? If we decide to make commas illegal in group names, we should add validation to our add_to_group method to enforce this. For pedagogical clarity, we'll ignore this problem too. Finally, there are many security implications we need to consider: can someone get themselves into the wrong group by putting a fake comma in their e-mail address? What does the parser do if it encounters an invalid file?

The takeaway from this discussion is to try to use a data-storage method that has been field tested, rather than designing your own data serialization protocol. There are a ton of bizarre edge cases you might overlook, and it's better to use code that has already encountered and fixed those edge cases.
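
To illustrate, a sketch using the standard library's json module (assuming import json at the top of the module) would get all the quoting and escaping right for free; the only wrinkle is converting sets to and from lists, since JSON has no set type:

    def save(self):
        with open(self.data_file, 'w') as file:
            json.dump(
                    {email: sorted(groups)
                        for email, groups in self.email_map.items()},
                    file)

    def load(self):
        self.email_map = defaultdict(set)
        try:
            with open(self.data_file) as file:
                for email, groups in json.load(file).items():
                    self.email_map[email] = set(groups)
        except IOError:
            pass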

But forget that, let's just write some basic code that uses an unhealthy dose of wishful thinking to pretend this simple data format is safe:

[email protected] group1,group2
[email protected] group2,group3

The code to do this is as follows:

    def save(self):
        with open(self.data_file, 'w') as file:
            for email, groups in self.email_map.items():
                file.write(
                    '{} {}\n'.format(email, ','.join(groups))
                )

    def load(self):
        self.email_map = defaultdict(set)
        try:
            with open(self.data_file) as file:
                for line in file:
                    email, groups = line.strip().split(' ')
                    groups = set(groups.split(','))
                    self.email_map[email] = groups
        except IOError:
            pass

In the save method, we open the file in a context manager and write the file as a formatted string. Remember the newline character; Python doesn't add that for us. The load method first resets the dictionary (in case it contains data from a previous call to load), and then uses the for...in syntax, which loops over each line in the file. Again, the newline character is included in the line variable, so we have to call .strip() to take it off. We'll learn more about such string manipulation in the next chapter.

Before using these methods, we need to make sure the object has a self.data_file attribute, which can be done by modifying __init__:

    def __init__(self, data_file):
        self.data_file = data_file
        self.email_map = defaultdict(set)

We can test these two methods in the interpreter as follows:

>>> m = MailingList('addresses.db')
>>> m.add_to_group('[email protected]', 'friends')
>>> m.add_to_group('[email protected]', 'friends')
>>> m.add_to_group('[email protected]', 'family')
>>> m.save()

The resulting addresses.db file contains the following lines, as expected:

[email protected] friends
[email protected] friends,family

We can also load this data back into a MailingList object successfully:

>>> m = MailingList('addresses.db')
>>> m.email_map
defaultdict(<class 'set'>, {})
>>> m.load()
>>> m.email_map
defaultdict(<class 'set'>, {'[email protected]': {'friends'}, '[email protected]': {'family'}, '[email protected]': {'friends'}})

As you can see, I forgot to do the load command, and it might be easy to forget the save command as well. To make this a little easier for anyone who wants to use our MailingList API in their own code, let's provide the methods to support a context manager:

    def __enter__(self):
        self.load()
        return self

    def __exit__(self, type, value, tb):
        self.save()

These simple methods just delegate their work to load and save, but we can now write code like this in the interactive interpreter and know that all the previously stored addresses were loaded on our behalf, and that the whole list will be saved to the file when we are done:

>>> with MailingList('addresses.db') as ml:
...    ml.add_to_group('[email protected]', 'friends')
...    ml.send_mailing("What's up", "hey friends, how's it going", '[email protected]', 'friends')

Exercises

If you haven't encountered the with statement and context managers before, I encourage you, as usual, to go through your old code, find all the places you were opening files, and make sure they are safely closed using the with statement. Look for places where you could write your own context managers as well. Ugly or repetitive try...finally clauses are a good place to start, but context managers are useful any time you need to perform tasks before and/or after a block of code.
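
For instance, a block you find yourself timing with try...finally could become a small context manager like this hypothetical sketch:

import time

class Timed:
    '''A hypothetical context manager that times a block of code.'''
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Runs even if the block raised, just like finally
        print("elapsed seconds:", time.time() - self.start)

with Timed():
    sum(range(10000000))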

You've probably used many of the basic built-in functions before now. We covered several of them, but didn't go into a great deal of detail. Play with enumerate, zip, reversed, any, and all until you know you'll remember to use them when they are the right tool for the job. The enumerate function is especially important, because not using it results in some pretty ugly code.

Also explore some applications that pass functions around as callable objects, as well as using the __call__ method to make your own objects callable. You can get the same effect by attaching attributes to functions or by creating a __call__ method on an object. In which cases would you use one syntax, and when would it be more suitable to use the other?

Our mailing list object could overwhelm an e-mail server if there is a massive number of e-mails to be sent out. Try refactoring it so that you can use different send_email functions for different purposes. One such function could be the version we used here. A different version might put the e-mails in a queue to be sent by a server in a different thread or process. A third version could just output the data to the terminal, obviating the need for a dummy SMTP server. Can you construct the mailing list with a callback such that the send_mailing function uses whatever is passed in? It would default to the current version if no callback is supplied.
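
One hypothetical shape for that refactoring is to accept the callback in __init__ and fall back to the send_email function we wrote earlier:

    def __init__(self, data_file, send_func=send_email):
        self.data_file = data_file
        self.send_func = send_func
        self.email_map = defaultdict(set)

    def send_mailing(self, subject, message, from_addr,
            *groups, headers=None):
        emails = self.emails_in_groups(*groups)
        self.send_func(subject, message, from_addr,
                *emails, headers=headers)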

The relationship between arguments, keyword arguments, variable arguments, and variable keyword arguments can be a bit confusing. We saw how painfully they can interact when we covered multiple inheritance. Devise some other examples to see how they can work well together, as well as to understand when they don't.

Summary

We covered a grab bag of topics in this chapter. Each represented an important non-object-oriented feature that is popular in Python. Just because we can use object-oriented principles does not always mean we should!

However, we also saw that Python typically implements such features by providing a syntax shortcut to traditional object-oriented syntax. Knowing the object-oriented principles underlying these tools allows us to use them more effectively in our own classes.

We discussed a series of built-in functions and file I/O operations. There are a whole bunch of different syntaxes available to us when calling functions with arguments, keyword arguments, and variable argument lists. Context managers are useful for the common pattern of sandwiching a piece of code between two method calls. Even functions are objects, and, conversely, any normal object can be made callable.

In the next chapter, we'll learn more about string and file manipulation, and even spend some time with one of the least object-oriented topics in the standard library: regular expressions.