Book Image

Modular Programming with Python

By : Erik Westra
Book Image

Modular Programming with Python

By: Erik Westra

Overview of this book

Python has evolved over the years and has become the primary choice of developers in various fields. The purpose of this book is to help readers develop readable, reliable, and maintainable programs in Python. Starting with an introduction to the concept of modules and packages, this book shows how you can use these building blocks to organize a complex program into logical parts and make sure those parts are working correctly together. Using clearly written, real-world examples, this book demonstrates how you can use modular techniques to build better programs. A number of common modular programming patterns are covered, including divide-and-conquer, abstraction, encapsulation, wrappers and extensibility. You will also learn how to test your modules and packages, how to prepare your code for sharing with other people, and how to publish your modules and packages on GitHub and the Python Package Index so that other people can use them. Finally, you will learn how to use modular design techniques to be a more effective programmer.
Table of Contents (16 chapters)
Modular Programming with Python
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
Index

Creating your first module


Now that we've seen what modules are and how they can be used, let's implement our first real Python module. While this module is simple, you may find it a useful addition to the programs you write.

Caching

In computer programming, a cache is a way of storing previously calculated results so that they can be retrieved more quickly. For example, imagine that your program had to calculate shipping costs based on three parameters:

  • The weight of the ordered item

  • The dimensions of the ordered item

  • The customer's location

Calculating the shipping cost based on the customer's location might be quite involved. For example, you may have a fixed charge for deliveries within your city but charge a premium for out-of-town orders based on how far away the customer is. You may even need to send a query to a freight company's API to see how much it will charge to ship the given item.

Since the process of calculating the shipping cost can be quite complex and time consuming, it makes sense to use a cache to store the previously calculated results. This allows you to use the previously calculated results rather than having to recalculate the shipping cost each time. To do this, you would need to structure your calc_shipping_cost() function to look something like the following:

def calc_shipping_cost(params):
    if params in cache:
        shipping_cost = cache[params]
    else:
        ...calculate the shipping cost.
        cache[params] = shipping_cost
    return shipping_cost

As you can see, we take the supplied parameters (in this case, the weight, dimensions, and the customer's location) and check whether there is already an entry in the cache for those parameters. If so, we retrieve the previously-calculated shipping cost from the cache. Otherwise, we go through the possibly time-consuming process of calculating the shipping cost, storing this in the cache using the supplied parameters, and then returning the shipping cost back to the caller.

Notice how the cache variable in the preceding pseudo code looks very much like a Python dictionary—you can store entries in the dictionary based on a given key and then retrieve the entry using this key. There is, however, a crucial difference between a dictionary and a cache: a cache typically has a limit on the number of entries that it can contain, while the dictionary has no such limit. This means that a dictionary will continue to grow forever, possibly taking up all the computer's memory if the program runs for a long time, while a cache will never take too much memory, as the number of entries is limited.

Once the cache reaches its maximum size, an existing entry has to be removed each time a new entry is added so that the cache doesn't continue to grow:

While there are various ways of choosing the entry to remove, the most common way is to remove the least recently used entry, that is, the entry that hasn't been used for the longest period of time.

Caches are very commonly used in computer programs. In fact, even if you haven't yet used a cache in the programs you write, you've almost certainly encountered them before. Has someone ever suggested that you clear your browser's cache to solve a problem with your web browser? Yes, web browsers use a cache to hold previously downloaded images and web pages so that they don't have to be retrieved again, and clearing the contents of the browser cache is a common way of fixing a misbehaving web browser.

Writing a cache module

Let's now write our own Python module to implement a cache. Before we write it, let's think about the functionality that our cache module will require:

  • We're going to limit the size of our cache to 100 entries.

  • We will need an init() function to initialize the cache.

  • We will have a set(key, value) function to store an entry in the cache.

  • A get(key) function will retrieve an entry from the cache. If there is no entry for that key, this function should return None.

  • We'll also need a contains(key) function to check whether a given entry is in the cache.

  • Finally, we'll implement a size() function which returns the number of entries in the cache.

Note

We are deliberately keeping the implementation of this module quite simple. A real cache would make use of a Cache class to allow you to use multiple caches at once. It would also allow the size of the cache to be configured as necessary. To keep things simple, however, we will implement these functions directly within a module, as we want to concentrate on modular programming rather than combining it with object-oriented programming and other techniques.

Go ahead and create a new Python source file named cache.py. This file will hold the Python source code for our new module. At the top of this module, enter the following Python code:

import datetime

MAX_CACHE_SIZE = 100

We will be using the datetime Standard Library module to calculate the least recently used entry in the cache. The second statement, defining MAX_CACHE_SIZE, sets the maximum size for our cache.

Tip

Note that we are following the standard Python convention of defining constants using uppercase letters. This makes them easier to see in your source code.

We now want to implement the init() function for our cache. To do this, add the following to the end of your module:

def init():
    global _cache
    _cache = {} # Maps key to (datetime, value) tuple.

As you can see, we have created a new function named init(). The first statement in this function, global _cache, defines a new variable named _cache. The global statement makes this variable available as a module-level global variable, that is, this variable can be shared by all parts of the cache.py module.

Notice the underscore character at the start of the variable name. In Python, a leading underscore is a convention indicating that a name is private. In other words, the _cache global is intended to be used as an internal part of the cache.py module—the underscore tells you that you shouldn't need to use this variable outside of the cache.py module itself.

The second statement in the init() function sets the _cache global to an empty dictionary. Notice that we've added a comment explaining how the dictionary will be used; it's good practice to add notes like this to your code so others (and you, when you look at this code after a long time working on something else) can easily see what this variable is used for.

In summary, calling the init() function has the effect of creating a private _cache variable within the module and setting it to an empty dictionary. Let's now write the set() function, which will use this variable to store an entry in the cache.

Add the following to the end of your module:

def set(key, value):
    global _cache
    if key not in _cache and len(_cache) >= MAX_CACHE_SIZE:
        _remove_oldest_entry()
    _cache[key] = [datetime.datetime.now(), value]

Once again, the set() function starts with a global _cache statement. This makes the _cache module-level global variable available for the function to use.

The if statement checks to see whether the cache is going to exceed the maximum allowed size. If so, we call a new function, named _remove_oldest_entry(), to remove the oldest entry from the cache. Notice how this function name also starts with an underscore—once again, this indicates that this function is private and should only be used by code within the module itself.

Finally, we store the entry in the _cache dictionary. Notice that we store the current date and time as well as the value in the cache; this will let us know when the cache entry was last used, which is important when we have to remove the oldest entry.

Let's now implement the get() function. Add the following to the end of your module:

def get(key):
    global _cache
    if key in _cache:
        _cache[key][0] = datetime.datetime.now()
        return _cache[key][1]
    else:
        return None

You should be able to figure out what this code does. The only interesting part to note is that we update the date and time for the cache entry before returning the associated value. This lets us know when the cache entry was last used.

With these functions implemented, the remaining two functions should also be easy to understand. Add the following to the end of your module:

def contains(key):
    global _cache
    return key in _cache

def size():
    global _cache
    return len(_cache)

There shouldn't be any surprises here.

There's only one more function left to implement: our private _remove_oldest_entry() function. Add the following to the end of your module:

def _remove_oldest_entry():
    global _cache
    oldest = None
    for key in _cache.keys():
        if oldest == None:
            oldest = key
        elif _cache[key][0] < _cache[oldest][0]:
            oldest = key
    if oldest != None:
        del _cache[oldest]

This completes the implementation of our cache.py module itself, with the five main functions we described earlier, as well as one private function and one private global variable which are used internally to help implement our public functions.

Using the cache

Let's now write a simple test program to use this cache module and verify that it's working properly. Create a new Python source file, which we'll call test_cache.py, and add the following to this file:

import random
import string
import cache

def random_string(length):
    s = ''
    for i in range(length):
        s = s + random.choice(string.ascii_letters)
    return s

cache.init()

for n in range(1000):
    while True:
        key = random_string(20)
        if cache.contains(key):
            continue
        else:
            break
    value = random_string(20)
    cache.set(key, value)
    print("After {} iterations, cache has {} entries".format(n+1, cache.size()))

This program starts by importing three modules: two from the Python Standard Library, and the cache module we have just written. We then define a utility function named random_string(), which generates a string of random letters of a given length. After this, we initialize the cache by calling cache.init() and then generate 1,000 random entries to add to the cache. After adding each cache entry, we print out the number of entries we have added as well as the current cache size.

If you run this program, you can see that it's working as expected:

$ python test_cache.py
After 1 iterations, cache has 1 entries
After 2 iterations, cache has 2 entries
After 3 iterations, cache has 3 entries
...
After 98 iterations, cache has 98 entries
After 99 iterations, cache has 99 entries
After 100 iterations, cache has
 100 entries
After 101 iterations, cache has 100 entries
After 102 iterations, cache has 100 entries
...
After 998 iterations, cache has 100 entries
After 999 iterations, cache has 100 entries
After 1000 iterations, cache has 100 entries

The cache continues to grow until it reaches 100 entries, at which point the oldest entry is removed to make room for a new one. This ensures that the cache stays the same size, no matter how many new entries are added.

While there is a lot more we could do with our cache.py module, this is enough to demonstrate how to create a useful Python module and then use it within another program. Of course, you aren't just limited to importing modules within a main program—modules can import other modules as well.