Book Image

Python Object-Oriented Programming - Fourth Edition

By : Steven F. Lott, Dusty Phillips
2 (2)
Book Image

Python Object-Oriented Programming - Fourth Edition

2 (2)
By: Steven F. Lott, Dusty Phillips

Overview of this book

Object-oriented programming (OOP) is a popular design paradigm in which data and behaviors are encapsulated in such a way that they can be manipulated together. Python Object-Oriented Programming, Fourth Edition dives deep into the various aspects of OOP, Python as an OOP language, common and advanced design patterns, and hands-on data manipulation and testing of more complex OOP systems. These concepts are consolidated by open-ended exercises, as well as a real-world case study at the end of every chapter, newly written for this edition. All example code is now compatible with Python 3.9+ syntax and has been updated with type hints for ease of learning. Steven and Dusty provide a comprehensive, illustrative tour of important OOP concepts, such as inheritance, composition, and polymorphism, and explain how they work together with Python’s classes and data structures to facilitate good design. In addition, the book also features an in-depth look at Python’s exception handling and how functional programming intersects with OOP. Two very powerful automated testing systems, unittest and pytest, are introduced. The final chapter provides a detailed discussion of Python's concurrent programming ecosystem. By the end of the book, you will have a thorough understanding of how to think about and apply object-oriented principles using Python syntax and be able to confidently create robust and reliable programs.
Table of Contents (17 chapters)
15
Other Books You May Enjoy
16
Index

Modules and packages

Now we know how to create classes and instantiate objects. You don't need to write too many classes (or non-object-oriented code, for that matter) before you start to lose track of them. For small programs, we generally put all our classes into one file and add a little script at the end of the file to start them interacting. However, as our projects grow, it can become difficult to find the one class that needs to be edited among the many classes we've defined. This is where modules come in. Modules are Python files, nothing more. The single file in our small program is a module. Two Python files are two modules. If we have two files in the same folder, we can load a class from one module for use in the other module.

The Python module name is the file's stem; the name without the .py suffix. A file named model.py is a module named model. Module files are found by searching a path that includes the local directory and the installed packages.

The import statement is used for importing modules or specific classes or functions from modules. We've already seen an example of this in our Point class in the previous section. We used the import statement to get Python's built-in math module and use its hypot() function in the distance calculation. Let's start with a fresh example.

If we are building an e-commerce system, we will likely be storing a lot of data in a database. We can put all the classes and functions related to database access into a separate file (we'll call it something sensible: database.py). Then, our other modules (for example, customer models, product information, and inventory) can import classes from the database module in order to access the database.

Let's start with a module called database. It's a file, database.py, containing a class called Database. A second module called products is responsible for product-related queries. The classes in the products module need to instantiate the Database class from the database module so that they can execute queries on the product table in the database.

There are several variations on the import statement syntax that can be used to access the Database class. One variant is to import the module as a whole:

>>> import database
>>> db = database.Database("path/to/data")

This version imports the database module, creating a database namespace. Any class or function in the database module can be accessed using the database.<something> notation.

Alternatively, we can import just the one class we need using the from...import syntax:

>>> from database import Database
>>> db = Database("path/to/data")

This version imported only the Database class from the database module. When we have a few items from a few modules, this can be a helpful simplification to avoid using longer, fully qualified names like database.Database. When we import a number of items from a number of different modules, this can be a potential source of confusion when we omit the qualifiers.

If, for some reason, products already has a class called Database, and we don't want the two names to be confused, we can rename the class when used inside the products module:

>>> from database import Database as DB
>>> db = DB("path/to/data")

We can also import multiple items in one statement. If our database module also contains a Query class, we can import both classes using the following code:

from database import Database, Query

We can import all classes and functions from the database module using this syntax:

from database import * 

Don't do this. Most experienced Python programmers will tell you that you should never use this syntax (a few will tell you there are some very specific situations where it is useful, but we can disagree). One way to learn why to avoid this syntax is to use it and try to understand your code two years later. We can save some time and two years of poorly written code with a quick explanation now!

We've got several reasons for avoiding this:

  • When we explicitly import the database class at the top of our file using from database import Database, we can easily see where the Database class comes from. We might use db = Database() 400 lines later in the file, and we can quickly look at the imports to see where that Database class came from. Then, if we need clarification as to how to use the Database class, we can visit the original file (or import the module in the interactive interpreter and use the help(database.Database) command). However, if we use the from database import * syntax, it takes a lot longer to find where that class is located. Code maintenance becomes a nightmare.
  • If there are conflicting names, we're doomed. Let's say we have two modules, both of which provide a class named Database. Using from module_1 import * and from module_2 import * means the second import statement overwrites the Database name created by the first import. If we used import module_1 and import module_2, we'd use the module names as qualifiers to disambiguate module_1.Database from module_2.Database.
  • In addition, most code editors are able to provide extra functionality, such as reliable code completion, the ability to jump to the definition of a class, or inline documentation, if normal imports are used. The import * syntax can hamper their ability to do this reliably.
  • Finally, using the import * syntax can bring unexpected objects into our local namespace. Sure, it will import all the classes and functions defined in the module being imported from, but unless a special __all__ list is provided in the module, this import will also import any classes or modules that were themselves imported into that file!

Every name used in a module should come from a well-specified place, whether it is defined in that module, or explicitly imported from another module. There should be no magic variables that seem to come out of thin air. We should always be able to immediately identify where the names in our current namespace originated. We promise that if you use this evil syntax, you will one day have extremely frustrating moments of where on earth can this class be coming from?

For fun, try typing import this into your interactive interpreter. It prints a nice poem (with a couple of inside jokes) summarizing some of the idioms that Pythonistas tend to practice. Specific to this discussion, note the line "Explicit is better than implicit." Explicitly importing names into your namespace makes your code much easier to navigate than the implicit from module import * syntax.

Organizing modules

As a project grows into a collection of more and more modules, we may find that we want to add another level of abstraction, some kind of nested hierarchy on our modules' levels. However, we can't put modules inside modules; one file can hold only one file after all, and modules are just files.

Files, however, can go in folders, and so can modules. A package is a collection of modules in a folder. The name of the package is the name of the folder. We need to tell Python that a folder is a package to distinguish it from other folders in the directory. To do this, place a (normally empty) file in the folder named __init__.py. If we forget this file, we won't be able to import modules from that folder.

Let's put our modules inside an ecommerce package in our working folder, which will also contain a main.py file to start the program. Let's additionally add another package inside the ecommerce package for various payment options.

We need to exercise some caution in creating deeply nested packages. The general advice in the Python community is "flat is better than nested." In this example, we need to create a nested package because there are some common features to all of the various payment alternatives.

The folder hierarchy will look like this, rooted under a directory in the project folder, commonly named src:

src/
 +-- main.py
 +-- ecommerce/
     +-- __init__.py
     +-- database.py
     +-- products.py
     +-- payments/
     |   +-- __init__.py
     |   +-- common.py
     |   +-- square.py
     |   +-- stripe.py
     +-- contact/
         +-- __init__.py
         +-- email.py

The src directory will be part of an overall project directory. In addition to src, the project will often have directories with names like docs and tests. It's common for the project parent directory to also have configuration files for tools like mypy among others. We'll return to this in Chapter 13, Testing Object-Oriented Programs.

When importing modules or classes between packages, we have to be cautious about the structure of our packages. In Python 3, there are two ways of importing modules: absolute imports and relative imports. We'll look at each of them separately.

Absolute imports

Absolute imports specify the complete path to the module, function, or class we want to import. If we need access to the Product class inside the products module, we could use any of these syntaxes to perform an absolute import:

>>> import ecommerce.products
>>> product = ecommerce.products.Product("name1") 

Or, we could specifically import a single class definition from the module within a package:

>>> from ecommerce.products import Product 
>>> product = Product("name2") 

Or, we could import an entire module from the containing package:

>>> from ecommerce import products 
>>> product = products.Product("name3") 

The import statements use the period operator to separate packages or modules. A package is a namespace that contains module names, much in the way an object is a namespace containing attribute names.

These statements will work from any module. We could instantiate a Product class using this syntax in main.py, in the database module, or in either of the two payment modules. Indeed, assuming the packages are available to Python, it will be able to import them. For example, the packages can also be installed in the Python site-packages folder, or the PYTHONPATH environment variable could be set to tell Python which folders to search for packages and modules it is going to import.

With these choices, which syntax do we choose? It depends on your audience and the application at hand. If there are dozens of classes and functions inside the products module that we want to use, we'd generally import the module name using the from ecommerce import products syntax, and then access the individual classes using products.Product. If we only need one or two classes from the products module, we can import them directly using the from ecommerce.products import Product syntax. It's important to write whatever makes the code easiest for others to read and extend.

Relative imports

When working with related modules inside a deeply nested package, it seems kind of redundant to specify the full path; we know what our parent module is named. This is where relative imports come in. Relative imports identify a class, function, or module as it is positioned relative to the current module. They only make sense inside module files, and, further, they only make sense where there's a complex package structure.

For example, if we are working in the products module and we want to import the Database class from the database module next to it, we could use a relative import:

from .database import Database 

The period in front of database says use the database module inside the current package. In this case, the current package is the package containing the products.py file we are currently editing, that is, the ecommerce package.

If we were editing the stripe module inside the ecommerce.payments package, we would want, for example, to use the database package inside the parent package instead. This is easily done with two periods, as shown here:

from ..database import Database 

We can use more periods to go further up the hierarchy, but at some point, we have to acknowledge that we have too many packages. Of course, we can also go down one side and back up the other. The following would be a valid import from the ecommerce.contact package containing an email module if we wanted to import the send_mail function into our payments.stripe module:

from ..contact.email import send_mail

This import uses two periods indicating the parent of the payments.stripe package, and then uses the normal package.module syntax to go back down into the contact package to name the email module.

Relative imports aren't as useful as they might seem. As mentioned earlier, the Zen of Python (you can read it when you run import this) suggests "flat is better than nested". Python's standard library is relatively flat, with few packages and even fewer nested packages. If you're familiar with Java, the packages are deeply nested, something the Python community likes to avoid. A relative import is needed to solve a specific problem where module names are reused among packages. They can be helpful in a few cases. Needing more than two dots to locate a common parent-of-a-parent package suggests the design should be flattened out.

Packages as a whole

We can import code that appears to come directly from a package, as opposed to a module inside a package. As we'll see, there's a module involved, but it has a special name, so it's hidden. In this example, we have an ecommerce package containing two module files named database.py and products.py. The database module contains a db variable that is accessed from a lot of places. Wouldn't it be convenient if this could be imported as from ecommerce import db instead of from ecommerce.database import db?

Remember the __init__.py file that defines a directory as a package? This file can contain any variable or class declarations we like, and they will be available as part of the package. In our example, if the ecommerce/__init__.py file contained the following line:

from .database import db

We could then access the db attribute from main.py or any other file using the following import:

from ecommerce import db

It might help to think of the ecommerce/__init__.py file as if it were the ecommerce.py file. It lets us view the ecommerce package as having a module protocol as well as a package protocol. This can also be useful if you put all your code in a single module and later decide to break it up into a package of modules. The __init__.py file for the new package can still be the main point of contact for other modules using it, but the code can be internally organized into several different modules or subpackages.

We recommend not putting much code in an __init__.py file, though. Programmers do not expect actual logic to happen in this file, and much like with from x import *, it can trip them up if they are looking for the declaration of a particular piece of code and can't find it until they check __init__.py.

After looking at modules in general, let's dive into what should be inside a module. The rules are flexible (unlike other languages). If you're familiar with Java, you'll see that Python gives you some freedom to bundle things in a way that's meaningful and informative.

Organizing our code in modules

The Python module is an important focus. Every application or web service has at least one module. Even a seemingly "simple" Python script is a module. Inside any one module, we can specify variables, classes, or functions. They can be a handy way to store the global state without namespace conflicts. For example, we have been importing the Database class into various modules and then instantiating it, but it might make more sense to have only one database object globally available from the database module. The database module might look like this:

class Database:
    """The Database Implementation"""
    def __init__(self, connection: Optional[str] = None) -> None:
        """Create a connection to a database."""
        pass
database = Database("path/to/data")

Then we can use any of the import methods we've discussed to access the database object, for example:

from ecommerce.database import database 

A problem with the preceding class is that the database object is created immediately when the module is first imported, which is usually when the program starts up. This isn't always ideal, since connecting to a database can take a while, slowing down startup, or the database connection information may not yet be available because we need to read a configuration file. We could delay creating the database until it is actually needed by calling an initialize_database() function to create a module-level variable:

db: Optional[Database] = None
def initialize_database(connection: Optional[str] = None) -> None:
    global db
    db = Database(connection)

The Optional[Database] type hint signals to the mypy tool that this may be None or it may have an instance of the Database class. The Optional hint is defined in the typing module. This hint can be handy elsewhere in our application to make sure we confirm that the value for the database variable is not None.

The global keyword tells Python that the database variable inside initialize_database() is the module-level variable, outside the function. If we had not specified the variable as global, Python would have created a new local variable that would be discarded when the function exits, leaving the module-level value unchanged.

We need to make one additional change. We need to import the database module as a whole. We can't import the db object from inside the module; it might not have been initialized. We need to be sure database.initialize_database() is called before db will have a meaningful value. If we wanted direct access to the database object, we'd use database.db.

A common alternative is a function that returns the current database object. We could import this function everywhere we needed access to the database:

def get_database(connection: Optional[str] = None) -> Database:
    global db
    if not db:
        db = Database(connection) 
    return db

As these examples illustrate, all module-level code is executed immediately at the time it is imported. The class and def statements create code objects to be executed later when the function is called. This can be a tricky thing for scripts that perform execution, such as the main script in our e-commerce example. Sometimes, we write a program that does something useful, and then later find that we want to import a function or class from that module into a different program. However, as soon as we import it, any code at the module level is immediately executed. If we are not careful, we can end up running the first program when we really only meant to access a couple of functions inside that module.

To solve this, we should always put our startup code in a function (conventionally, called main()) and only execute that function when we know we are running the module as a script, but not when our code is being imported from a different script. We can do this by guarding the call to main inside a conditional statement, demonstrated as follows:

class Point:
    """
    Represents a point in two-dimensional geometric coordinates.
    """
    pass
def main() -> None:
    """
    Does the useful work.
    >>> main()
    p1.calculate_distance(p2)=5.0
    """
    p1 = Point()
    p2 = Point(3, 4)
    print(f"{p1.calculate_distance(p2)=}")
if __name__ == "__main__":
    main()

The Point class (and the main() function) can be reused without worry. We can import the contents of this module without any surprising processing happening. When we run it as a main program, however, it executes the main() function.

This works because every module has a __name__ special variable (remember, Python uses double underscores for special variables, such as a class' __init__ method) that specifies the name of the module when it was imported. When the module is executed directly with python module.py, it is never imported, so the __name__ is arbitrarily set to the "__main__" string.

Make it a policy to wrap all your scripts in an if __name__ == "__main__": test, just in case you write a function that you may want to be imported by other code at some point in the future.

So, methods go in classes, which go in modules, which go in packages. Is that all there is to it?

Actually, no. This is the typical order of things in a Python program, but it's not the only possible layout. Classes can be defined anywhere. They are typically defined at the module level, but they can also be defined inside a function or method, like this:

from typing import Optional
class Formatter:
    def format(self, string: str) -> str:
        pass
def format_string(string: str, formatter: Optional[Formatter] = None) -> str:
    """
    Format a string using the formatter object, which
    is expected to have a format() method that accepts
    a string.
    """
    class DefaultFormatter(Formatter):
        """Format a string in title case."""
        def format(self, string: str) -> str:
            return str(string).title()
    if not formatter:
        formatter = DefaultFormatter()
    return formatter.format(string)

We've defined a Formatter class as an abstraction to explain what a formatter class needs to have. We haven't used the abstract base class (abc) definitions (we'll look at these in detail in Chapter 6, Abstract Base Classes and Operator Overloading). Instead, we've provided the method with no useful body. It has a full suite of type hints, to make sure mypy has a formal definition of our intent.

Within the format_string() function, we created an internal class that is an extension of the Formatter class. This formalizes the expectation that our class inside the function has a specific set of methods. This connection between the definition of the Formatter class, the formatter parameter, and the concrete definition of the DefaultFormatter class assures that we haven't accidentally forgotten something or added something.

We can execute this function like this:

>>> hello_string = "hello world, how are you today?"
>>> print(f" input: {hello_string}")
 input: hello world, how are you today?
>>> print(f"output: {format_string(hello_string)}")
output: Hello World, How Are You Today?

The format_string function accepts a string and optional Formatter object and then applies the formatter to that string. If no Formatter instance is supplied, it creates a formatter of its own as a local class and instantiates it. Since it is created inside the scope of the function, this class cannot be accessed from anywhere outside of that function. Similarly, functions can be defined inside other functions as well; in general, any Python statement can be executed at any time.

These inner classes and functions are occasionally useful for one-off items that don't require or deserve their own scope at the module level, or only make sense inside a single method. However, it is not common to see Python code that frequently uses this technique.

We've seen how to create classes and how to create modules. With these core techniques, we can start thinking about writing useful, helpful software to solve problems. When the application or service gets big, though, we often have boundary issues. We need to be sure that objects respect each other's privacy and avoid confusing entanglements that make complex software into a spaghetti bowl of interrelationships. We'd prefer each class to be a nicely encapsulated ravioli. Let's look at another aspect of organizing our software to create a good design.