Expert Python Programming – Fourth Edition

By: Michał Jaworski, Tarek Ziadé

Overview of this book

This new edition of Expert Python Programming provides you with a thorough understanding of the process of building and maintaining Python apps. Complete with best practices, useful tools, and standards implemented by professional Python developers, this fourth edition has been extensively updated. Throughout this book, you’ll get acquainted with the latest Python improvements, syntax elements, and interesting tools to boost your development efficiency. The initial few chapters will allow experienced programmers coming from different languages to transition to the Python ecosystem. You will explore common software design patterns and various programming methodologies, such as event-driven programming, concurrency, and metaprogramming. You will also go through complex code examples and try to solve meaningful problems by bridging Python with C and C++, writing extensions that benefit from the strengths of multiple languages. Finally, you will understand the complete lifetime of any application after it goes live, including packaging and testing automation. By the end of this book, you will have gained actionable Python programming insights that will help you effectively solve challenging problems.

Inversion of control and dependency injection

Inversion of Control (IoC) is a simple property of some software designs. According to Wiktionary, if a design exhibits IoC, it means that:

(…) the flow of control in a system is inverted in comparison to the traditional architecture.

But what is the traditional architecture? IoC isn't a new idea, and we can trace it back to at least David D. Clark's 1985 paper titled The Structuring of Systems Using Upcalls. This suggests that "traditional design" refers to the kind of software design that was common, or thought to be traditional, in the 1980s.

Clark describes the traditional architecture of a program as a layered structure of procedures where control always goes from top to bottom. Higher-level layers invoke procedures from lower layers.

Those invoked procedures gain control and can invoke even deeper-layered procedures before returning control upward. In practice, control is traditionally passed from application to library functions. Library functions may pass it deeper to even lower-level libraries but, eventually, return it back to the application.

IoC happens when a library passes control up to the application so that the application can take part in the library behavior. To better understand this concept, consider the following trivial example of sorting a list of integer numbers:

sorted([1,2,3,4,5,6])

The built-in sorted() function takes an iterable of items and returns a list of sorted items. Control goes from the caller (your application) directly to the sorted() function. When the sorted() function is done with sorting, it simply returns the sorted result and gives control back to the caller. Nothing special.

Now let's say we want to sort our numbers in a quite unusual way. That could be, for instance, sorting them by their absolute distance from the number 3. Integers closest to 3 should be at the beginning of the result list and those farthest from it should be at the end. We can do that by defining a simple key function that will specify the order key of our elements:

def distance_from_3(item):
    return abs(item - 3)

Now we can pass that function as the callback key argument to the sorted() function:

sorted([1,2,3,4,5,6], key=distance_from_3)

What happens now is that the sorted() function will invoke the key function on every element of the iterable argument. Instead of comparing item values, it will now compare the return values of the key function. Here is where IoC happens. The sorted() function "upcalls" back to the distance_from_3() function provided by the application as an argument. Now it is the library that calls a function from the application, and thus the flow of control is reversed.
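To make the upcalls visible, the key function can be instrumented so that it records every invocation. The following is a minimal sketch; the calls list is an addition made purely for illustration:

```python
calls = []  # illustrative: records every upcall made by sorted()


def distance_from_3(item):
    # sorted() calls this function; the application never calls it directly.
    calls.append(item)
    return abs(item - 3)


result = sorted([1, 2, 3, 4, 5, 6], key=distance_from_3)

print(result)  # [3, 2, 4, 1, 5, 6]
print(calls)   # every element was handed to the key function
```

Elements with equal keys (for example, 2 and 4) keep their original relative order because sorted() is stable.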

Callback-based IoC is also humorously referred to as the Hollywood principle in reference to the "don't call us, we'll call you" phrase.

Note that IoC is just a property of a design and not a design pattern by itself. An example with the sorted() function is the simplest example of callback-based IoC but it can take many different forms. For instance:

  • Polymorphism: When a custom class inherits from a base class and base methods are supposed to call custom methods
  • Argument passing: When the receiving function is supposed to call methods of the supplied object
  • Decorators: When a decorator function calls a decorated function
  • Closures: When a nested function calls a function outside of its scope
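Two of the forms listed above can be sketched in a few lines. The class and function names below (Report, ViewsReport, logged, greet) are hypothetical and serve only as an illustration:

```python
from functools import wraps


class Report:
    """Base class that owns the overall flow of rendering."""

    def render(self) -> str:
        # Polymorphism: the base method calls up into the subclass's body().
        return f"== {self.title()} ==\n{self.body()}"

    def title(self) -> str:
        return "Report"

    def body(self) -> str:
        raise NotImplementedError


class ViewsReport(Report):
    def body(self) -> str:
        return "6 page views"


log = []  # names of decorated functions that were invoked


def logged(func):
    # Decorators: the wrapper decides when the decorated function runs.
    @wraps(func)
    def wrapper(*args, **kwargs):
        log.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


@logged
def greet(name: str) -> str:
    return f"Hello, {name}"


rendered = ViewsReport().render()
greeting = greet("reader")
```

In both cases the generic code (Report.render() and wrapper()) is in control and calls the code supplied by the application.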

As you see, IoC is a rather common aspect of object-oriented or functional programming paradigms. And it also happens quite often without you even realizing it. While it isn't a design pattern by itself, it is a key ingredient of many actual design patterns, paradigms, and methodologies. The most notable one is dependency injection, which we will discuss later in this chapter.

Clark's traditional flow of control in procedural programming also happens in object-oriented programming. In object-oriented programs, objects themselves are receivers of control. We can say that control is passed to the object whenever a method of that object is invoked. So the traditional flow of control would require objects to hold full ownership of all dependent objects that are required to fulfill the object's behavior.

Inversion of control in applications

To better illustrate the differences between various flows of control, we will build a small but practical application. It will initially start with a traditional flow of control and later on, we will see if it can benefit from IoC in selected places.

Our use case will be pretty simple and common. We will build a service that can track web page views using so-called tracking pixels and serve page view statistics over an HTTP endpoint. This technique is commonly used in tracking advertisement views or email openings. It can also be useful in situations when you make extensive use of HTTP caching and want to make sure that caching does not affect page view statistics.

Our application will have to track counts of page views in some persistent storage. That will also give us the opportunity to explore application modularity—a characteristic that cannot be implemented without IoC.

What we need to build is a small web backend application that will have two endpoints:

  • /track: This endpoint will return an HTTP response with a 1x1 pixel GIF image. Upon request, it will store the Referer header and increase the number of requests associated with that value.
  • /stats: This endpoint will read the top 10 most common Referer values received on the /track endpoint and return an HTTP response containing a summary of the results in JSON format.

The Referer header is an optional HTTP header that web browsers use to tell the web server the URL of the page from which the resource is being requested. Take note of the misspelling of the word referrer. The header was first standardized in RFC 1945, Hypertext Transfer Protocol—HTTP/1.0 (see https://tools.ietf.org/html/rfc1945). When the misspelling was discovered, it was already too late to fix it.

We've already introduced Flask as a simple web microframework in Chapter 2, Modern Python Development Environments, so we will use it here as well. Let's start by importing some modules and setting up module variables that we will use on the way:

from collections import Counter
from http import HTTPStatus
from flask import Flask, request, Response
app = Flask(__name__)
storage = Counter()
PIXEL = (
    b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00'
    b'\x00\x00\xff\xff\xff!\xf9\x04\x01\x00'
    b'\x00\x00\x00,\x00\x00\x00\x00\x01\x00'
    b'\x01\x00\x00\x02\x01D\x00;'
)

The app variable is the core object of the Flask framework. It represents a Flask web application. We will use it later to register endpoint routes and also run the application development server.

The storage variable holds a Counter instance. It is a convenient data structure from the Standard Library that allows you to track counters of any hashable values. Our ultimate goal is to store page view statistics in a persistent way, but it will be a lot easier to start off with something simpler. That's why we will initially use this variable as our in-memory storage of page view statistics.

Last but not least, is the PIXEL variable. It holds a byte representation of a 1x1 transparent GIF image. The actual visual appearance of the tracking pixel does not matter and probably will never change. It is also so small that there's no need to bother with loading it from the filesystem. That's why we are inlining it in our module to fit the whole application in a single Python module.
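If you want to convince yourself that these bytes really describe a 1x1 GIF, the header can be inspected with the struct module. This is only a quick sanity check, relying on the fact that a GIF file starts with a 6-byte signature followed by the width and height as little-endian 16-bit integers:

```python
import struct

PIXEL = (
    b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00'
    b'\x00\x00\xff\xff\xff!\xf9\x04\x01\x00'
    b'\x00\x00\x00,\x00\x00\x00\x00\x01\x00'
    b'\x01\x00\x00\x02\x01D\x00;'
)

# Bytes 0-5 hold the format signature and version.
signature = PIXEL[:6]
# Bytes 6-9 hold the logical screen width and height.
width, height = struct.unpack("<HH", PIXEL[6:10])

print(signature, width, height)
```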

Once we're set, we can write code for the /track endpoint handler:

@app.route('/track')
def track():
    try:
        referer = request.headers["Referer"]
    except KeyError:
        return Response(status=HTTPStatus.BAD_REQUEST)
    storage[referer] += 1
    return Response(
        PIXEL, headers={
            "Content-Type": "image/gif",
            "Expires": "Mon, 01 Jan 1990 00:00:00 GMT",
            "Cache-Control": "no-cache, no-store, must-revalidate",
            "Pragma": "no-cache",
        }
    )

We use extra Expires, Cache-Control, and Pragma headers to control the HTTP caching mechanism. We set them so that they disable any form of caching in most web browser implementations. We also do it in a way that should disable caching by potential proxies. Take careful note of the Expires header value, which lies far in the past. In practice, this means that the resource is always considered expired.
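The Expires value can be checked with the Standard Library alone. The sketch below parses the HTTP date and compares it with the current time; the variable names are illustrative:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Parse the HTTP date used in the Expires header.
expires = parsedate_to_datetime("Mon, 01 Jan 1990 00:00:00 GMT")

# Any client comparing this value with "now" sees an expired resource.
already_expired = expires < datetime.now(timezone.utc)

# The date is simply an arbitrary moment in the past,
# twenty years after the Unix epoch (1 January 1970).
seconds_after_epoch = int(expires.timestamp())

print(already_expired, seconds_after_epoch)
```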

Flask request handlers typically start with the @app.route(route) decorator that registers the following handler function for the given HTTP route. Request handlers are also known as views. Here we have registered the track() view as a handler of the /track route endpoint. This is the first occurrence of IoC in our application: we register our own handler implementation within the Flask framework. It is the framework that will call back our handlers on incoming requests that match associated routes.

After the signature, we have simple code for handling the request. We check if the incoming request has the expected Referer header. That's the value which the browser uses to tell what URI the requested resource was included on (for instance, the HTML page we want to track). If there's no such header, we will return an error response with a 400 Bad Request HTTP status code.

If the incoming request has the Referer header, we will increase the counter value in the storage variable. The Counter structure has a dict-like interface and allows you to easily modify counter values for keys that haven't been registered yet. In such a case, it will assume that the initial value for the given key was 0. That way we don't need to check whether a specific Referer value was already seen and that greatly simplifies the code. After increasing the counter value, we return a pixel response that can be finally displayed by the browser.

Note that although the storage variable is defined outside the track() function, it is not yet an example of IoC. That's because whoever calls the track() function can't replace the implementation of the storage. We will try to change that in the next iterations of our application.

The code for the /stats endpoint is even simpler:

@app.route('/stats')
def stats():
    return dict(storage.most_common(10))

In the stats() view, we again take advantage of the convenient interface of the Counter object. It provides the most_common(n) method, which returns up to n most common key-value pairs stored in the structure. We immediately convert that to a dictionary. We don't use the Response class, as Flask by default serializes dictionary return values to JSON and assumes a 200 OK status for the HTTP response.
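The Counter behavior that both views rely on can be tried out without Flask. The following is a small sketch with made-up referer values:

```python
from collections import Counter

storage = Counter()

# Keys that were never seen start from 0, so += 1 works right away.
for referer in ["/a", "/b", "/a", "/c", "/a", "/b"]:
    storage[referer] += 1

# most_common(n) returns (key, count) pairs ordered by count.
top_two = dict(storage.most_common(2))
print(top_two)  # {'/a': 3, '/b': 2}

# Reading a missing key returns 0 and does not create an entry.
missing = storage["/never-seen"]
```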

In order to test our application easily, we finish our script with the simple invocation of the built-in development server:

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8000)

If you store the application in the tracking.py file, you will be able to start the server using the python tracking.py command. It will start listening on port 8000. If you would like to test the application in your own browser, you can extend it with the following endpoint handler:

@app.route('/test')
def test():
    return """
    <html>
    <head></head>
    <body><img src="/track"></body>
    </html>
    """

If you open the address http://localhost:8000/test several times in your web browser and then go to http://localhost:8000/stats, you will see output similar to the following:

{"http://localhost:8000/test":6}

The problem with the current implementation is that it stores request counters in memory. Whenever the application is restarted, the existing counters will be reset and we'll lose important data. In order to keep the data between restarts, we will have to replace our storage implementation.

The options to provide data persistency are many. We could, for instance, use:

  • A simple text file
  • The built-in shelve module
  • A relational database management system (RDBMS) like MySQL, MariaDB, or PostgreSQL
  • An in-memory key-value or data struct storage service like Memcached or Redis

Depending on the context and scale of the workload our application needs to handle, the best solution will be different. If we don't know yet what the best solution is, we can also make the storage pluggable so we can switch storage backends depending on the actual user needs. To do so, we will have to invert the flow of control in our track() and stats() functions.

Good design calls for an explicit definition of the interface of the object through which control is inverted. The interface of the Counter class seems like a good starting point. It is convenient to use. The only problem is that the += operation can be implemented through either the __add__() or __iadd__() special method. We definitely want to avoid such ambiguity. Also, the Counter class has way too many extra methods and we need only two:

  • A method that allows you to increase the counter value by one
  • A method that allows you to retrieve the 10 most often requested keys
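The ambiguity of the += operation can be shown with two toy classes. AddOnly and InPlace are hypothetical names used only for this sketch:

```python
class AddOnly:
    """Supports += only through __add__(), so += rebinds the name."""

    def __init__(self, value=0):
        self.value = value

    def __add__(self, other):
        return AddOnly(self.value + other)


class InPlace:
    """Supports += through __iadd__(), so += mutates the object."""

    def __init__(self, value=0):
        self.value = value

    def __iadd__(self, other):
        self.value += other
        return self


a = a_alias = AddOnly()
a += 1  # falls back to __add__(): a now points to a new object

b = b_alias = InPlace()
b += 1  # uses __iadd__(): the original object is modified

print(a is a_alias, a_alias.value)  # False 0
print(b is b_alias, b_alias.value)  # True 1
```

An explicit increment() method leaves no room for such differences between implementations.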

To keep things simple and readable, we will define our views storage interface as an abstract base class of the following form:

from abc import ABC, abstractmethod
from typing import Dict
class ViewsStorageBackend(ABC):
    @abstractmethod
    def increment(self, key: str): ...
    @abstractmethod
    def most_common(self, n: int) -> Dict[str, int]: ...
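The practical effect of the abstract base class is that a subclass cannot be instantiated until it implements every abstract method. The sketch below repeats the interface so that it runs on its own; IncompleteBackend is a hypothetical class that deliberately omits most_common():

```python
from abc import ABC, abstractmethod
from typing import Dict


class ViewsStorageBackend(ABC):
    @abstractmethod
    def increment(self, key: str): ...

    @abstractmethod
    def most_common(self, n: int) -> Dict[str, int]: ...


class IncompleteBackend(ViewsStorageBackend):
    # most_common() is missing on purpose.
    def increment(self, key: str):
        pass


try:
    IncompleteBackend()
except TypeError as error:
    instantiation_error = error
else:
    instantiation_error = None

print(instantiation_error)
```

Note that only the presence of the methods is checked. Their signatures and return types are not verified at runtime.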

From now on, we can provide various implementations of the views storage backend. The following will be the implementation that adapts the previously used Counter class into the ViewsStorageBackend interface:

from collections import Counter
from typing import Dict
from .tracking_abc import ViewsStorageBackend
class CounterBackend(ViewsStorageBackend):
    def __init__(self):
        self._counter = Counter()
    def increment(self, key: str):
        self._counter[key] += 1
    def most_common(self, n: int) -> Dict[str, int]:
        return dict(self._counter.most_common(n))
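A persistent variant that needs no external service could be based on the built-in shelve module mentioned earlier. The following ShelveBackend is only a sketch and not part of the application built in this chapter; it repeats the interface to stay self-contained and makes no attempt to handle concurrent writers:

```python
import os
import shelve
import tempfile
from abc import ABC, abstractmethod
from typing import Dict


class ViewsStorageBackend(ABC):
    @abstractmethod
    def increment(self, key: str): ...

    @abstractmethod
    def most_common(self, n: int) -> Dict[str, int]: ...


class ShelveBackend(ViewsStorageBackend):
    """Hypothetical backend that keeps counters in a shelve file."""

    def __init__(self, path: str):
        self._path = path

    def increment(self, key: str):
        # Open the shelf for each operation so data is flushed on close.
        with shelve.open(self._path) as db:
            db[key] = db.get(key, 0) + 1

    def most_common(self, n: int) -> Dict[str, int]:
        with shelve.open(self._path) as db:
            items = sorted(db.items(), key=lambda kv: kv[1], reverse=True)
        return dict(items[:n])


# Usage: counters survive across separately opened shelves.
with tempfile.TemporaryDirectory() as directory:
    backend = ShelveBackend(os.path.join(directory, "views"))
    for referer in ["/a", "/b", "/a"]:
        backend.increment(referer)
    top = backend.most_common(1)

print(top)  # {'/a': 2}
```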

If we would like to provide persistency through the Redis in-memory storage service, we could do so by implementing a new storage backend as follows:

from typing import Dict
from redis import Redis
from .tracking_abc import ViewsStorageBackend
class RedisBackend(ViewsStorageBackend):
    def __init__(
        self,
        redis_client: Redis,
        set_name: str
    ):
        self._client = redis_client
        self._set_name = set_name
    def increment(self, key: str):
        self._client.zincrby(self._set_name, 1, key)
    def most_common(self, n: int) -> Dict[str, int]:
        return {
            key.decode(): int(value)
            for key, value in
            self._client.zrange(
                self._set_name, 0, n-1,
                desc=True,
                withscores=True,
            )
        }

Redis is an in-memory data store. This means that by default, data is stored only in memory. Redis will persist data on disk during restart but may lose data in an unexpected crash (for instance, due to a power outage). Still, this is only a default behavior. Redis offers various modes for data persistence, some of which are comparable to other databases. This means Redis is a completely viable storage solution for our simple use case. You can read more about Redis persistence at https://redis.io/topics/persistence.

Both backends have the same interface loosely enforced with an abstract base class. It means instances of both classes can be used interchangeably. The question is, how will we invert control of our track() and stats() functions in a way that will allow us to plug in a different views storage implementation?

Let's recall the signatures of our functions:

@app.route('/stats')
def stats():
   ...
@app.route('/track')
def track():
   ...

In the Flask framework, the app.route() decorator registers a function as a specific route handler. You can think of it as a callback for HTTP request paths. You don't call that function manually anymore and Flask is in full control of the arguments passed to it. But we want to be able to easily replace the storage implementation. One way to do that would be through postponing the handler registration and letting our functions receive an extra storage argument. Consider the following example:

def track(storage: ViewsStorageBackend):
    try:
        referer = request.headers["Referer"]
    except KeyError:
        return Response(status=HTTPStatus.BAD_REQUEST)
    storage.increment(referer)
    return Response(
        PIXEL, headers={
            "Content-Type": "image/gif",
            "Expires": "Mon, 01 Jan 1990 00:00:00 GMT",
            "Cache-Control": "no-cache, no-store, must-revalidate",
            "Pragma": "no-cache",
        }
    )
def stats(storage: ViewsStorageBackend):
    return storage.most_common(10)

Our extra argument is annotated with the ViewsStorageBackend type so the type can be easily verified with an IDE or additional tools. Thanks to this we have inverted control of those functions and also achieved better modularity. Now you can easily swap the storage implementation for any other class with a compatible interface. The extra benefit of IoC is that we can easily unit-test the stats() and track() functions in isolation from storage implementations.
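As a small taste of that testability, the stats() function can be exercised with a stand-in object instead of a real backend. StubBackend is a hypothetical test double that relies on duck typing rather than on the abstract base class:

```python
class StubBackend:
    """Hypothetical test double that returns canned results."""

    def __init__(self, canned):
        self.canned = canned
        self.requested = []  # remembers the n values that were asked for

    def increment(self, key):
        raise AssertionError("stats() should never increment counters")

    def most_common(self, n):
        self.requested.append(n)
        return self.canned


def stats(storage):
    # Same body as the view shown above.
    return storage.most_common(10)


stub = StubBackend({"http://example.com/": 3})
response = stats(stub)

print(response, stub.requested)
```

No Flask application, Redis server, or Counter instance is involved in this check.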

We will discuss the topic of unit-tests together with detailed examples of tests that leverage IoC in Chapter 10, Testing and Quality Automation.

The only part that is missing is actual route registration. We can no longer use the app.route() decorator directly on our functions. That's because Flask won't be able to resolve the storage argument on its own. We can overcome that problem by "pre-injecting" desired storage implementations into handler functions and create new functions that can be easily registered with the app.route() call.

The simple way to do that would be using the partial() function from the functools module. It takes a single function together with a set of arguments and keyword arguments and returns a new function that has selected arguments preconfigured. We can use that approach to prepare various configurations of our service. Here, for instance, is an application configuration that uses Redis as a storage backend:

from functools import partial
if __name__ == '__main__':
    views_storage = RedisBackend(Redis(host="redis"), "my-stats")
    app.route("/track", endpoint="track")(
        partial(track, storage=views_storage))
    app.route("/stats", endpoint="stats")(
        partial(stats, storage=views_storage))
    app.run(host="0.0.0.0", port=8000)
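The behavior of partial() can be observed in isolation. In the sketch below, FixedBackend is a hypothetical stand-in for a real storage backend:

```python
from functools import partial


class FixedBackend:
    """Hypothetical stand-in for a real storage backend."""

    def most_common(self, n):
        return {"http://example.com/": 1}


def stats(storage):
    return storage.most_common(10)


# Pre-inject the storage argument; the result is callable without arguments.
stats_view = partial(stats, storage=FixedBackend())
result = stats_view()

# The partial object keeps references to the wrapped function and its
# preconfigured keyword arguments.
print(stats_view.func is stats, list(stats_view.keywords))
```

A partial object does not normally carry the __name__ attribute of the wrapped function, which is presumably why the configuration above passes the endpoint name explicitly instead of letting Flask derive it from the view function.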

The presented approach can be applied to many other web frameworks as the majority of them have the same route-to-handler structure. It will work especially well for small services with only a handful of endpoints. Unfortunately, it may not scale well in large applications. It is simple to write but definitely not the easiest to read. Seasoned Flask programmers will surely feel that this approach is unnatural and needlessly repetitive. Here, it simply breaks the common convention of writing Flask handler functions.

The ultimate solution would be one that allows you to write and register view functions without the need to manually inject dependent objects. So, for instance:

@app.route('/track')
def track(storage: ViewsStorageBackend):
   ...

In order to do that, we would need the Flask framework to:

  • Recognize extra arguments as dependencies of views.
  • Allow the definition of a default implementation for said dependencies.
  • Automatically resolve dependencies and inject them into views at runtime.

Such a mechanism is referred to as dependency injection, which we mentioned previously. Some web frameworks offer a built-in dependency injection mechanism, but in the Python ecosystem, it is a rather rare occurrence. Fortunately, there are plenty of lightweight dependency injection libraries that can be added on top of any Python framework. We will explore such a possibility in the next section.
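To give a rough idea of how such a mechanism could work, here is a toy container that injects arguments based on type annotations. It is a simplified sketch written for this illustration and does not reflect how any particular library is implemented; Container, ViewsStorage, and InMemoryStorage are hypothetical names:

```python
import inspect
from functools import wraps
from typing import get_type_hints


class Container:
    """Toy container mapping interfaces to ready-made instances."""

    def __init__(self):
        self._instances = {}

    def bind(self, interface, instance):
        self._instances[interface] = instance

    def inject(self, func):
        hints = get_type_hints(func)
        hints.pop("return", None)
        signature = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind_partial(*args, **kwargs)
            for name, annotation in hints.items():
                # Fill in only arguments the caller did not supply.
                if name not in bound.arguments and annotation in self._instances:
                    kwargs[name] = self._instances[annotation]
            return func(*args, **kwargs)
        return wrapper


class ViewsStorage:
    def most_common(self, n):
        raise NotImplementedError


class InMemoryStorage(ViewsStorage):
    def most_common(self, n):
        return {"http://example.com/": 2}


container = Container()
container.bind(ViewsStorage, InMemoryStorage())


@container.inject
def stats(storage: ViewsStorage):
    return storage.most_common(10)


result = stats()  # the storage argument is resolved by the container
```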

Using dependency injection frameworks

When IoC is used at a great scale, it can easily become overwhelming. The example from the previous section was quite simple so it didn't require a lot of setup. Unfortunately, we have sacrificed a bit of readability and expressiveness for better modularity and responsibility isolation. For larger applications, this can be a serious problem.

Dedicated dependency injection libraries come to the rescue by combining a simple way to mark function or object dependencies with a runtime dependency resolution. All of that usually can be achieved with minimal impact on the overall code structure.

There are plenty of dependency injection libraries for Python, so definitely there is no need to build your own from scratch. They are often similar in implementation and functionality, so we will simply pick one and see how it could be applied in our view tracking application.

Our library of choice will be the injector library, which is freely available on PyPI. We pick it for several reasons:

  • Reasonably active and mature: Developed over more than 10 years with releases every few months.
  • Framework support: It has community support for various frameworks including Flask through the flask-injector package.
  • Typing annotation support: It allows writing unobtrusive dependency annotations and leveraging static typing analysis.
  • Simple: injector has a Pythonic API. It makes code easy to read and to reason about.

You can install injector in your environment using pip as follows:

$ pip install injector

You can find more information about injector at https://github.com/alecthomas/injector.

In our example, we will use the flask-injector package as it provides some initial boilerplate to integrate injector with Flask seamlessly. But before we do that, we will first separate our application into several modules that would better simulate a larger application. After all, dependency injection really shines in applications that have multiple components.

We will create the following Python modules:

  • interfaces: This will be the module holding our interfaces. It will contain ViewsStorageBackend from the previous section without any changes.
  • backends: This will be the module holding specific implementations of storage backends. It will contain CounterBackend and RedisBackend from the previous section without any changes.
  • tracking: This will be the module holding the application setup together with view functions.
  • di: This will be the module holding definitions for the injector library, which will allow it to automatically resolve dependencies.

The core of the injector library is a Module class. It defines a so-called dependency injection container—an atomic block of mapping between dependency interfaces and their actual implementation instances. The minimal Module subclass may look as follows:

from injector import Module, provider
class MyModule(Module):
    @provider
    def provide_dependency(self, *args) -> Type:
        return ...

The @provider decorator marks a Module method as a method providing the implementation for a particular Type interface. The creation of some objects may be complex, so injector allows modules to have additional nondecorated helper methods.

The method that provides dependency may also have its own dependencies. They are defined as method arguments with type annotations. This allows for cascading dependency resolution. injector supports composing dependency injection context from multiple modules so there's no need to define all dependencies in a single module.

Using the above template, we can create our first injector module in the di.py file. It will be CounterModule, which provides a CounterBackend implementation for the ViewsStorageBackend interface. The definition will be as follows:

from injector import Module, provider, singleton
from interfaces import ViewsStorageBackend
from backends import CounterBackend
class CounterModule(Module):
    @provider
    @singleton
    def provide_storage(self) -> ViewsStorageBackend:
        return CounterBackend()

CounterBackend doesn't take any arguments, so we don't have to define extra dependencies. The only difference from the general module template is the @singleton decorator. It is an explicit implementation of the singleton design pattern. A singleton is simply a class that can have only a single instance. In this context, it means that every time this dependency is resolved, injector will always return the same object. We need that because CounterBackend stores view counters under the internal _counter attribute. Without the @singleton decorator, every request for the ViewsStorageBackend implementation would return a completely new object and thus we would constantly lose track of view numbers.
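The consequence of a missing singleton scope can be simulated in plain Python. The two provider functions below are hypothetical and only mimic the two resolution strategies; they are not how injector implements scopes:

```python
from collections import Counter


class CounterBackend:
    def __init__(self):
        self._counter = Counter()

    def increment(self, key):
        self._counter[key] += 1

    def most_common(self, n):
        return dict(self._counter.most_common(n))


def provide_fresh():
    # No singleton: every resolution creates a new, empty backend.
    return CounterBackend()


_shared = None


def provide_singleton():
    # Singleton: the first resolution creates the object, later ones reuse it.
    global _shared
    if _shared is None:
        _shared = CounterBackend()
    return _shared


for _ in range(3):
    provide_fresh().increment("/page")
    provide_singleton().increment("/page")

fresh_counts = provide_fresh().most_common(10)
shared_counts = provide_singleton().most_common(10)

print(fresh_counts)   # {}
print(shared_counts)  # {'/page': 3}
```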

The implementation of RedisModule will be only slightly more complex:

from injector import Module, provider, singleton
from redis import Redis
from interfaces import ViewsStorageBackend
from backends import RedisBackend
class RedisModule(Module):
    @provider
    def provide_storage(self, client: Redis) -> ViewsStorageBackend:
        return RedisBackend(client, "my-set")
    @provider
    @singleton
    def provide_redis_client(self) -> Redis:
        return Redis(host="redis")

The code files for this chapter provide a complete docker-compose environment with a preconfigured Redis Docker image so you don't have to install Redis on your own host.

In the RedisModule class, we take advantage of the injector library's ability to resolve cascading dependencies. The RedisBackend constructor requires a Redis client instance so we can treat it as another provide_storage() method argument. injector will recognize the type annotation and automatically match the method that provides the Redis class instance. We could go even further and extract the host argument into a separate configuration dependency. We won't do that for the sake of simplicity.
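The idea of cascading resolution can be sketched with a few lines of plain Python. The Resolver class below is a toy written for this illustration, with FakeClient and Storage as hypothetical stand-ins for the Redis client and the storage backend; it is not the way injector works internally:

```python
from typing import get_type_hints


class Resolver:
    """Toy resolver: providers are registered by their return annotation."""

    def __init__(self):
        self._providers = {}

    def provide(self, func):
        self._providers[get_type_hints(func)["return"]] = func
        return func

    def get(self, interface):
        provider = self._providers[interface]
        hints = get_type_hints(provider)
        hints.pop("return")
        # Resolve the provider's own dependencies first (cascading step).
        kwargs = {name: self.get(dep) for name, dep in hints.items()}
        return provider(**kwargs)


class FakeClient:
    def __init__(self, host):
        self.host = host


class Storage:
    def __init__(self, client, set_name):
        self.client = client
        self.set_name = set_name


resolver = Resolver()


@resolver.provide
def provide_client() -> FakeClient:
    return FakeClient(host="redis")


@resolver.provide
def provide_storage(client: FakeClient) -> Storage:
    return Storage(client, "my-set")


storage = resolver.get(Storage)
print(storage.client.host, storage.set_name)  # redis my-set
```

Asking for Storage triggers the resolution of FakeClient first, because the storage provider declares it as an annotated argument.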

Now we have to tie everything up in the tracking module. We will be relying on injector to resolve dependencies on views. This means that we can finally define track() and stats() handlers with extra storage arguments and register them with the @app.route() decorator as if they were normal Flask views. Updated signatures will be the following:

@app.route('/stats')
def stats(storage: ViewsStorageBackend):
   ...
@app.route('/track')
def track(storage: ViewsStorageBackend):
   ...

What is left is the final configuration of the app that designates which modules should be used to provide interface implementations. If we would like to use RedisBackend, we would finish our tracking module with the following code:

import di
if __name__ == '__main__':
    FlaskInjector(app=app, modules=[di.RedisModule()])
    app.run(host="0.0.0.0", port=8000)

The following is the complete code of the tracking module:

from http import HTTPStatus
from flask import Flask, request, Response
from flask_injector import FlaskInjector
from interfaces import ViewsStorageBackend
import di
app = Flask(__name__)
PIXEL = (
    b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00'
    b'\x00\x00\xff\xff\xff!\xf9\x04\x01\x00'
    b'\x00\x00\x00,\x00\x00\x00\x00\x01\x00'
    b'\x01\x00\x00\x02\x01D\x00;'
)
@app.route('/track')
def track(storage: ViewsStorageBackend):
    try:
        referer = request.headers["Referer"]
    except KeyError:
        return Response(status=HTTPStatus.BAD_REQUEST)
    storage.increment(referer)
    return Response(
        PIXEL, headers={
            "Content-Type": "image/gif",
            "Expires": "Mon, 01 Jan 1990 00:00:00 GMT",
            "Cache-Control": "no-cache, no-store, must-revalidate",
            "Pragma": "no-cache",
        }
    )
@app.route('/stats')
def stats(storage: ViewsStorageBackend):
    return storage.most_common(10)
@app.route("/test")
def test():
    return """
    <html>
    <head></head>
    <body><img src="/track"></body>
    </html>
    """
if __name__ == '__main__':
    FlaskInjector(app=app, modules=[di.RedisModule()])
    app.run(host="0.0.0.0", port=8000)

As you can see, the introduction of the dependency injection mechanism didn't change the core of our application a lot. The preceding code closely resembles the first and simplest iteration, which didn't have the IoC mechanism. At the cost of a few interface and injector module definitions, we've got scaffolding for a modular application that could easily grow into something much bigger. We could, for instance, extend it with additional storage that would serve more analytical purposes or provide a dashboard that allows you to view the data from different angles.

Another advantage of dependency injection is loose coupling. In our example, views never create instances of storage backends nor their underlying service clients (in the case of RedisBackend). They depend on shared interfaces but are independent of implementations. Loose coupling is usually a good foundation for a well-architected application.

It is of course hard to show the utility of IoC and dependency injection in a really concise example like the one we've just seen. That's because these techniques really shine in big applications. Anyway, we will revisit the use case of the pixel tracking application in Chapter 10, Testing and Quality Automation, where we will show that IoC greatly improves the testability of your code.