Book Image

Secret Recipes of the Python Ninja

Book Image

Secret Recipes of the Python Ninja

Overview of this book

This book covers the unexplored secrets of Python, delve into its depths, and uncover its mysteries. You’ll unearth secrets related to the implementation of the standard library, by looking at how modules actually work. You’ll understand the implementation of collections, decimals, and fraction modules. If you haven’t used decorators, coroutines, and generator functions much before, as you make your way through the recipes, you’ll learn what you’ve been missing out on. We’ll cover internal special methods in detail, so you understand what they are and how they can be used to improve the engineering decisions you make. Next, you’ll explore the CPython interpreter, which is a treasure trove of secret hacks that not many programmers are aware of. We’ll take you through the depths of the PyPy project, where you’ll come across several exciting ways that you can improve speed and concurrency. Finally, we’ll take time to explore the PEPs of the latest versions to discover some interesting hacks.
Table of Contents (17 chapters)
Title Page
Copyright and Credits
Packt Upsell
Foreword
Contributors
Preface
Index

Using and importing modules and namespaces


A key point with modules is that they produce separate namespaces. A namespace (also called a scope) is simply the domain of control that a module, or component of a module, has. Normally, objects within a module are not visible outside that module, that is, attempting to call a variable located in a separate module will produce an error.

Namespaces are also used to segregate objects within the same program. For example, a variable defined within a function is only visible for use while operating within that function. Attempting to call that variable from another function will result in an error. This is why global variables are available; they can be called by any function and interacted with. This is also why global variables are frowned upon as a best practice because of the possibility of modifying a global variable without realizing it, causing a breakage later on in the program.

Scope essentially works inside-out. If a variable is called for use in a function, the Python interpreter will first look within that function for the variable's declaration. If it's not there, Python will move up the stack and look for a globally-defined variable. If not found there, Python will look in the built-in libraries that are always available. If still not found, Python will throw an error. In terms of flow, it looks something like this: local scope -> global scope -> built-in module -> error.

One slight change to the scope discovery process comes when importing modules. Imported modules will be examined for object calls as well, with the caveat that an error will still be generated unless the desired object is explicitly identified via dot-nomenclature.

For example, if you want to generate a random number between 0 and 1,000, you can't just call the randint() function without importing the random library. Once a module is imported, any publicly available classes, methods, functions, and variables can be used by expressly calling them with <module_name> and <object_name>. Following is an example of this:

>>> randint(0, 1000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'randint' is not defined
>>> import random
>>> random.randint(0, 1000)
607

In the preceding example, randint() is first called on its own. Since it is not part of the normal Python built-in functions, the interpreter knows nothing about it, thus throwing an error.

However, after importing the random library that actually contains the various random number generation functions, randint() can then be explicitly called via dot-nomenclature, that is, random.randint(). This tells the Python interpreter to look for randint() within the random library, resulting in the desired result.

To clarify, when importing modules into a program, Python assumes some things about namespaces. If a normal import is performed, that is, import foo, then both the main program and foo maintain their separate namespaces. To use a function within the foo module, you have to expressly identify it using dot-nomenclature: foo.bar().

On the other hand, if part of a module is imported, for example, from foo import bar, then that imported component becomes a part of the main program's namespace. This also happens if all components are imported using a wildcard: from foo import *.

The following example shows these properties in action:

>>> from random import randint
>>> randint(0, 10)
2
>>> randrange(0, 25)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'randrange' is not defined

In the preceding example, the randint() function from the random module is expressly imported by itself; this importation puts randint() within the main program's namespace. This allows randint() to be called without having to clarify it as random.randint(). However, when attempting to do the same thing with the randrange() function, an error occurs because it wasn't imported.

How to do it...

To illustrate scope, we will create nested functions, where a function is defined and then called within an enclosing function:

  1. nested_functions.py includes a nested function, and ends with calling the nested function:
      >>> def first_funct():
      ...    x = 1
      ...    print(x)
      ...    def second_funct():
      ...        x = 2
      ...        print(x)
      ...    second_funct()
      ...
  1. First, call the parent function and checks the results:
      >>> first_funct()
      1
      2
  1. Next, call the nested function directly and notice that an error is received:
      >>> second_funct()
      Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      NameError: name 'second_funct' is not defined
  1. To work with another module, import the desired module:
      >>> import math
  1. Below, we call the sin() function from within the module in the form <module>.<function>:
      >>> math.sin(45)
      0.8509035245341184
  1. Try calling a function, as demonstrated below, without using the dot-nomenclature to specify its library package results in an error:
      >>> sin(45)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
       NameError: name 'sin' is not defined
  1. Alternatively, the example below shows how to import all items from a module using the * wildcard to place the items within the current program's namespace:
      >>> from math import *
      >>> sin(45)
      0.8509035245341184
  1. A common way to run modules as scripts is to simply call the module explicitly from the command line, providing any arguments as necessary. This can be set up by configuring the module to accept command-line arguments, as shown in print_funct.py:
        def print_funct(arg):
            print(arg)
            if __name__ == "__main__":
                import sys
                print_funct(sys.argv[1])
  1. print_mult_args.py shows that, if more than one argument is expected, and the quantity is known, each one can be specified using its respective index values in the arguments list:
        def print_funct(arg1, arg2, arg3):
            print(arg1, arg2, arg3)
        if __name__ == "__main__":
            import sys
            print_funct(sys.argv[1], sys.argv[2], sys.argv[3])
  1. Alternatively, where the function can capture multiple arguments but the quantity is unknown, the *args parameter can be used, as shown below:
      >>> def print_input(*args):
      ...   for val, input in enumerate(args):
      ...       print("{}. {}".format(val, input))
      ...
      >>> print_input("spam", "spam", "eggs", "spam")
      0. spam
      1. spam
      2. eggs
      3. spam

How it works...

The location of a named assignment within the code determines its namespace visibility. In the preceding example, steps 1-3, if you directly call second_funct() immediately after calling first_funct(), you'll get an error stating second_funct() is not defined. This is true, because globally, the second function doesn't exist; it's nested within the first function and can't be seen outside the first function's scope. Everything within the first function is part of its namespace, just as the value for x within the second function can't be called directly but has to use the second_funct() call to get its value.

In the preceding examples, step 4-7, the math module is imported in its entirety, but it keeps its own namespace. Thus, calling math.sin() provides a result, but calling sin() by itself results in an error.

Then, the math module is imported using a wildcard. This tells the Python interpreter to import all the functions into the main namespace, rather than keeping them within the separate math namespace. This time, when sin() is called by itself, it provides the correct answer.

This demonstrates the point that namespaces are important to keep code separated while allowing the use of the same variables and function names. By using dot-nomenclature, the exact object can be called with no fear of name shadowing causing the wrong result to be provided.

In preceding examples, steps 7-10, using sys.argv() allows Python to parse command-line arguments and places them in a list for use. sys.argv([0]) is always the name of the program taking the arguments, so it can be safely ignored. All other arguments are stored in a list and can, therefore, be accessed by their index value.

Using *args tells Python to accept any number of arguments, allowing the program to accept a varying number of input values. An alternative version, **kwargs, does the same thing but with keyword:value pairs.

There's more...

In addition to knowing about namespaces, there are some other important terms to know about when installing and working with modules:

  • https://pypi.python.org/pypi is the primary database for third-party Python packages.
  • pip is the primary installer program for third-party modules and, since Python 3.4, has been included by default with Python binary installations.
  • A virtual Python environment allows packages to be installed for a particular application's development, rather than being installed system-wide.
  • venv has been the primary tool for creating virtual Python environments since Python 3.3. With Python 3.4, it automatically installs pip and setuptools in all virtual environments.
  • The following are common terms for Python files: module, package, library, and distribution. While they have distinct definitions (https://packaging.python.org/glossary/), this book will use them interchangeably at times.

The following is part of dice_roller.py, an example of embedded tests from one of the first Python programs this author wrote when first learning Python:

import random
def randomNumGen(choice):
    if choice == 1: #d6 roll
        die = random.randint(1, 6)
    elif choice == 2: #d10 roll
        die = random.randint(1, 10)
    elif choice == 3: #d100 roll
        die = random.randint(1, 100)
    elif choice == 4: #d4 roll
      die = random.randint(1, 4)
    elif choice == 5: #d8 roll
      die = random.randint(1, 8)
    elif choice == 6: #d12 roll
      die = random.randint(1, 12)
    elif choice == 7: #d20 roll
      die = random.randint(1, 20)
    else: #simple error message
        return "Shouldn't be here. Invalid choice"
    return die
if __name__ == "__main__":
    import sys
    print(randomNumGen(int(sys.argv[1])))

In this example, we are simply creating a random number generator that simulates rolling different polyhedral dice (commonly used in role-playing games). The random library is imported, then the function defining how the dice rolls are generated is created. For each die roll, the integer provided indicates how many sides the die has. With this method, any number of possible values can be simulated with a single integer input.

The key part of this program is at the end. The part if __name__ == "__main__" tells Python that, if the namespace for the module is main, that is, it is the main program and not imported into another program, then the interpreter should run the code below this line. Otherwise, when imported, only the code above this line is available to the main program. (It's also worth noting that this line is necessary for cross-platform compatibility with Windows.)

When this program is called from the command line, the sys library is imported. Then, the first argument provided to the program is read from the command line and passed into the randomNumGen() function as an argument. The result is printed to the screen. Following are some examples of results from this program:

$ python3 dice_roller.py 1
2
$ python3 dice_roller.py 2
10
$ python3 dice_roller.py 3
63
$ python3 dice_roller.py 4
2
$ python3 dice_roller.py 5
5
$ python3 dice_roller.py 6
6
$ python3 dice_roller.py 7
17
$ python3 dice_roller.py 8
Shouldn't be here. Invalid choice

Configuring a module in this manner is an easy way to allow a user to interface directly with the module on a stand-alone basis. It is also a great way to run tests on the script; the tests are only run when the file is called as a stand-alone, otherwise the tests are ignored. dice_roller_tests.py is the full dice-rolling simulator that this author wrote:

import random #randint
def randomNumGen(choice):
    """Get a random number to simulate a d6, d10, or d100 roll."""
    
    if choice == 1: #d6 roll
      die = random.randint(1, 6)
    elif choice == 2: #d10 roll
        die = random.randint(1, 10)
    elif choice == 3: #d100 roll
        die = random.randint(1, 100)
    elif choice == 4: #d4 roll
      die = random.randint(1, 4)
    elif choice == 5: #d8 roll
      die = random.randint(1, 8)
    elif choice == 6: #d12 roll
      die = random.randint(1, 12)
    elif choice == 7: #d20 roll
      die = random.randint(1, 20)
    else: #simple error message
        return "Shouldn't be here. Invalid choice"
    return die
def multiDie(dice_number, die_type):
    """Add die rolls together, e.g. 2d6, 4d10, etc."""
    
#---Initialize variables 
    final_roll = 0
    val = 0
    
    while val < dice_number:
        final_roll += randomNumGen(die_type)
        val += 1
    return final_roll
def test():
    """Test criteria to show script works."""
    
    _1d6 = multiDie(1,1) #1d6
    print("1d6 = ", _1d6, end=' ') 
    _2d6 = multiDie(2,1) #2d6
    print("\n2d6 = ", _2d6, end=' ')
    _3d6 = multiDie(3,1) #3d6
    print("\n3d6 = ", _3d6, end=' ')
    _4d6 = multiDie(4,1) #4d6
    print("\n4d6 = ", _4d6, end=' ')
    _1d10 = multiDie(1,2) #1d10
    print("\n1d10 = ", _1d10, end=' ')
    _2d10 = multiDie(2,2) #2d10
    print("\n2d10 = ", _2d10, end=' ')
    _3d10 = multiDie(2,2) #3d10
    print("\n3d10 = ", _3d10, end=' ')
    _d100 = multiDie(1,3) #d100
    print("\n1d100 = ", _d100, end=' ') 
    
if __name__ == "__main__": #run test() if calling as a separate program
    test()

This program builds on the previous random-dice program by allowing multiple dice to be added together. In addition, the test() function only runs when the program is called by itself to provide a sanity check of the code. The test function would probably be better if it wasn't in a function with the rest of the code, as it is still accessible when the module is imported, as shown below:

>>> import dice_roller_tests.py
>>> dice_roller_tests.test()
1d6 = 1 
2d6 = 8 
3d6 = 10 
4d6 = 12 
1d10 = 5 
2d10 = 8 
3d10 = 6 
1d100 = 26

So, if you have any code you don't want to be accessible when the module is imported, make sure to include it below the line, as it were.