Expert Python Programming - Second Edition

By : Michał Jaworski
Overview of this book

Python is a dynamic programming language, used in a wide range of domains by programmers who find it simple, yet powerful. Even if you find writing Python code easy, writing code that is efficient and easy to maintain and reuse is a challenge. The focus of the book is to familiarize you with common conventions, best practices, useful tools and standards used by Python professionals on a daily basis when working with code. You will begin by getting to know the new features in Python 3.5 and quick tricks for improving productivity. Next, you will learn advanced and useful Python syntax elements brought to this new version. Using advanced object-oriented concepts and mechanisms available in Python, you will learn different approaches to implement metaprogramming. You will learn to choose good names, write packages, and create standalone executables easily. You will also be using some powerful tools such as buildout and virtualenv to release and deploy the code on remote servers for production use. Moving on, you will learn to effectively create Python extensions with C, C++, Cython, and Pyrex. The important factors while writing code such as code management tools, writing clear documentation, and test-driven development are also covered. You will then dive deeper to make your code efficient with general rules of optimization, strategies for finding bottlenecks, and selected tools for application optimization. By the end of the book, you will be an expert in writing efficient and maintainable code.

The main differences between Python 3 and Python 2


It has already been said that Python 3 breaks backwards compatibility with Python 2. Still, it is not a complete redesign. Also, it does not mean that every Python module written for a 2.x release will stop working under Python 3. It is possible to write completely cross-compatible code that will run on both major releases without additional tools or techniques, but usually it is possible only for simple applications.

Why should I care?

Despite my personal opinion on Python 2 compatibility, expressed earlier in this chapter, it is impossible to simply forget about it at this time. There are still some useful packages (such as fabric, mentioned in Chapter 6, Deploying the Code) that are really worth using but are not likely to be ported in the very near future.

Also, sometimes we may be constrained by the organization we work in. The existing legacy code may be so complex that porting it is not economically feasible. So, even if we decide to move on and live only in the Python 3 world from now on, it will be impossible to completely live without Python 2 for some time.

Nowadays, it is very hard to call oneself a professional developer without giving something back to the community, so helping open source developers add Python 3 compatibility to existing packages is a good way to pay off the "moral debt" incurred by using them. This, of course, cannot be done without knowing the differences between Python 2 and Python 3. By the way, this is also a great exercise for those new to Python 3.

The main syntax differences and common pitfalls

The Python documentation is the best reference for differences between every release. Anyway, for readers' convenience, this section summarizes the most important ones. This does not change the fact that the documentation is mandatory reading for those not familiar with Python 3 yet (see https://docs.python.org/3.0/whatsnew/3.0.html).

The breaking changes introduced by Python 3 can generally be divided into a few groups:

  • Syntax changes, wherein some syntax elements were removed/changed and other elements were added

  • Changes in the standard library

  • Changes in datatypes and collections

Syntax changes

Syntax changes that make it difficult for the existing code to run are the easiest to spot: they will cause the code to not run at all. Python 3 code that features new syntax elements will fail to run on Python 2 and vice versa. The elements that were removed make Python 2 code visibly incompatible with Python 3. Running code that has such issues will immediately cause the interpreter to fail with a SyntaxError exception. Here is an example of a broken script that has exactly two statements, neither of which will be executed due to the syntax error:

print("hello world")
print "goodbye python2"

Its actual result when run on Python 3 is as follows:

$ python3 script.py
  File "script.py", line 2
    print "goodbye python2"
                         ^
SyntaxError: Missing parentheses in call to 'print'
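The same behavior can be reproduced without a separate script file. The following is only a minimal sketch, assuming it is run on Python 3; it passes the two-line source to the built-in compile() function, and the variable names are illustrative:

```python
# Source of the broken two-statement script shown above.
source = 'print("hello world")\nprint "goodbye python2"\n'

try:
    # The whole source is compiled before any statement is executed,
    # so even the valid first line never runs.
    code = compile(source, "script.py", "exec")
except SyntaxError as error:
    failed_line = error.lineno
    print("SyntaxError on line", failed_line)
else:
    failed_line = None
    exec(code)
```

Because the error is raised at compile time, "hello world" is never printed, which matches the interpreter output above.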

The list of such differences is a bit long and, from time to time, any new Python 3.x release may add new elements of syntax that will raise such errors on earlier releases of Python (even on the same 3.x branch). The most important of them are covered in Chapter 2, Syntax Best Practices – below the Class Level, and Chapter 3, Syntax Best Practices – above the Class Level, so there is no need to list all of them here.

The list of things dropped or changed from Python 2.7 is shorter, so here are the most important ones:

  • print is no longer a statement but a function instead, so the parentheses are now obligatory.

  • Catching exceptions changed from except exc, var to except exc as var.

  • The <> comparison operator has been removed in favor of !=.

  • from module import * (https://docs.python.org/3.0/reference/simple_stmts.html#import) is now allowed only on a module level, no longer inside the functions.

  • from .[module] import name is now the only accepted syntax for relative imports. All imports not starting with the dot character are interpreted as absolute imports.

  • The sorted() function and the list's sort() method no longer accept the cmp argument. The key argument should be used instead.

  • Division expressions on integers such as 1/2 return floats. The truncating behavior is achieved through the // operator like 1//2. The good thing is that this can be used with floats too, so 5.0//2.0 == 2.0.
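A few of the items above can be seen side by side in a short Python 3 snippet. This is an illustrative sketch rather than an exhaustive list; the sample data and the compare_lowercase function are made up for the example:

```python
from functools import cmp_to_key

# True division always returns a float; floor division truncates.
assert 1 / 2 == 0.5
assert 1 // 2 == 0
assert 5.0 // 2.0 == 2.0

words = ["banana", "Apple", "cherry"]

# The cmp argument is gone; a key function replaces it.
by_key = sorted(words, key=str.lower)


# A legacy comparison function can still be reused through
# functools.cmp_to_key.
def compare_lowercase(a, b):
    a, b = a.lower(), b.lower()
    return (a > b) - (a < b)


by_cmp = sorted(words, key=cmp_to_key(compare_lowercase))

# The "as" form is the only accepted way to bind a caught exception.
try:
    int("not a number")
except ValueError as error:
    message = str(error)

# != is the only inequality operator; <> is a syntax error.
assert by_key != words
```

Wrapping an old comparison function with cmp_to_key is usually a stopgap; a plain key function tends to be both shorter and faster.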

Changes in the standard library

Breaking changes in the standard library are the second easiest to catch after syntax changes. Each subsequent version of Python adds, deprecates, improves, or completely removes standard library modules. Such a process was also regular in the older versions of Python (1.x and 2.x), so it does not come as a shock in Python 3. In most cases, using a module that was removed or reorganized (like urlparse being moved to urllib.parse) will raise an exception at import time, as soon as the import statement is executed. This makes such issues easy to catch. Still, in order to be sure that all such issues are covered, full test code coverage is essential. In some cases (for example, when using lazily loaded modules), the issues that are usually noticed at import time will not appear until the affected functions are actually called. This is why it is so important to make sure that every line of code is actually executed by the test suite.

Tip

Lazily loaded modules

A lazily loaded module is a module that is not loaded on import time. In Python, import statements can be included inside of functions so import will happen on a function call and not on import time. In some cases, such loading of modules may be a reasonable choice but in most cases, it is a workaround for poorly designed module structures (for example, to avoid circular imports) and should be generally avoided. For sure, there is no justifiable reason to lazily load standard library modules.
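The difference between the two loading styles can be sketched as follows. The function names are hypothetical, and the lazy import of a standard library module is shown purely for illustration, not as a recommendation:

```python
# Module-level import with a fallback: a missing module is reported
# as soon as this file is imported.
try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse  # Python 2 location


def hostname(url):
    """Return the host part of a URL using the eagerly imported function."""
    return urlparse(url).netloc


def lazy_hostname(url):
    """Return the same result, importing only when the function is called.

    An ImportError raised here would stay hidden until the first call,
    which is why tests need to execute this line.
    """
    from urllib.parse import urlparse as lazy_urlparse
    return lazy_urlparse(url).netloc
```

On Python 2, importing this module would succeed thanks to the fallback, but lazy_hostname() would fail only when called, which is exactly the kind of issue that incomplete test coverage lets through.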

Changes in datatypes and collections

Changes in how Python represents datatypes and collections require the most effort when the developer tries to maintain compatibility or simply port existing code to Python 3. While incompatible syntax or standard library changes are easily noticeable and the easiest to fix, changes in collections and types are either nonobvious or require a lot of repetitive work. The list of such changes is long and, again, the official documentation is the best reference.

Still, this section must cover the change in how string literals are treated in Python 3 because it seems to be the most controversial and discussed change in Python 3, despite being a very good thing that now makes things more explicit.

All string literals are now Unicode and bytes literals require a b or B prefix. In Python 3.0 through 3.2, the u prefix (like u"foo") was not accepted and raised a syntax error. Dropping that prefix was the main reason for all the controversy. It made it really hard to create code that was compatible with different branches of Python, because version 2.x relied on this prefix in order to create Unicode literals. The prefix was brought back in Python 3.3 to ease the integration process, although without any syntactic meaning.
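A short sketch of how the two literal types behave, assuming Python 3.3 or newer (the sample values are arbitrary):

```python
text = "caf\u00e9"       # str: a sequence of Unicode code points
data = b"raw bytes"      # bytes: requires the b (or B) prefix
legacy = u"foo"          # accepted again since Python 3.3, same as "foo"

assert type(text) is str
assert type(data) is bytes
assert legacy == "foo" and type(legacy) is str

# Conversion between the two types is always explicit.
encoded = text.encode("utf-8")
assert isinstance(encoded, bytes)
assert encoded.decode("utf-8") == text

# Mixing the two types raises TypeError instead of converting silently.
try:
    text + data
except TypeError:
    mixed_allowed = False
else:
    mixed_allowed = True
```

The explicit encode/decode step is what makes the boundary between text and binary data visible in Python 3 code.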

The popular tools and techniques used for maintaining cross-version compatibility

Maintaining compatibility between versions of Python is a challenge. It may add a lot of additional work depending on the size of the project but is definitely doable and worth doing. For packages that are meant to be reused in many environments, it is an absolute must have. Open source packages without well-defined and tested compatibility bounds are very unlikely to become popular, but also, closed third-party code that never leaves the company network can greatly benefit from being tested in different environments.

It should be noted here that while this part focuses mainly on compatibility between various versions of Python, the same approaches apply to maintaining compatibility with external dependencies like different package versions, binary libraries, systems, or external services.

The whole process can be divided into three main areas, ordered by importance:

  • Defining and documenting target compatibility bounds and how they will be managed

  • Testing in every environment and with every dependency version declared as compatible

  • Implementing actual compatibility code

Declaration of what is considered compatible is the most important part of the whole process because it gives the users of the code (developers) the ability to have expectations and make assumptions on how it works and how it can change in the future. Our code can be used as a dependency in different projects that may also strive to manage compatibility, so the ability to reason how it behaves is crucial.

While this book tries to always give a few choices rather than an absolute recommendation on specific options, here is one of the few exceptions. The best way so far to define how compatibility may change in the future is a proper approach to version numbers using Semantic Versioning (http://semver.org/), or semver for short. It describes a broadly accepted standard for marking the scope of change in code by a version specifier consisting of only three numbers. It also gives some advice on how to handle deprecation policies. Here is an excerpt from its summary:

Given a version number MAJOR.MINOR.PATCH, increment:

  • A MAJOR version when you make incompatible API changes

  • A MINOR version when you add functionality in a backwards-compatible manner

  • A PATCH version when you make backwards-compatible bug fixes

Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
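To make the rule concrete, here is a simplified sketch of how a tool could reason about such version numbers. Both helper functions are hypothetical, and they deliberately ignore pre-release labels, build metadata, and the special treatment of 0.y.z versions described by the specification:

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def is_compatible_upgrade(current, candidate):
    """Return True if candidate should be a drop-in upgrade for current.

    Under semver, only a MAJOR bump signals incompatible API changes,
    so the upgrade is assumed safe when MAJOR is unchanged and the
    candidate is not older than the current version.
    """
    current_v = parse_version(current)
    candidate_v = parse_version(candidate)
    return candidate_v[0] == current_v[0] and candidate_v >= current_v


print(is_compatible_upgrade("1.4.2", "1.5.0"))
print(is_compatible_upgrade("1.4.2", "2.0.0"))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.10.0" would sort before "1.9.3".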

When it comes to testing, the sad truth is that to be sure that code is compatible with every declared dependency version and in every environment (here, the Python version), it must be tested in every combination of these. This, of course, may not be possible when the project has a lot of dependencies because the number of combinations grows rapidly with every new dependency and every new supported version. So, typically some trade-off needs to be made so that running full compatibility tests does not take ages. A selection of tools that help with testing in so-called matrices is presented in Chapter 10, Test-Driven Development, which discusses testing in general.
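How quickly the number of combinations grows can be illustrated with itertools.product. The package names and version numbers below are made up for the example and do not describe any real compatibility declaration:

```python
from itertools import product

# Hypothetical compatibility bounds declared by a project.
python_versions = ["2.7", "3.4", "3.5"]
dependency_versions = {
    "requests": ["2.7", "2.8", "2.9"],
    "six": ["1.9", "1.10"],
}

# One axis per Python version list and per dependency (sorted by name).
axes = [python_versions] + [
    dependency_versions[name] for name in sorted(dependency_versions)
]

# Every element is one environment that would need a full test run.
matrix = list(product(*axes))
print(len(matrix), "environments to test")
```

Adding a third dependency with three supported versions would triple the size of this matrix, which is why some trade-off is usually unavoidable.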

Note

The benefit of using projects that follow semver is that usually what needs to be tested are only major releases because minor and patch releases are guaranteed not to include backwards incompatible changes. This is only true if such projects can be trusted not to break such a contract. Unfortunately, mistakes happen to everyone and backward incompatible changes happen in a lot of projects, even on patch versions. Still, since semver declares strict compatibility on minor and patch version changes, breaking it is considered a bug, so it may be fixed in patch release.

Implementation of the compatibility layer is last and also least important if bounds of that compatibility are well-defined and rigorously tested. Still there are some tools and techniques that every programmer interested in such a topic should know.

The most basic is Python's __future__ module. It backports some features from newer Python releases to older ones and takes the form of an import statement:

from __future__ import <feature>

Features provided by future statements are syntax-related elements that cannot be easily handled by different means. This statement affects only the module where it was used. Here is an example of a Python 2.7 interactive session that brings Unicode literals from Python 3.0:

Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> type("foo")  # old literals
<type 'str'>
>>> from __future__ import unicode_literals
>>> type("foo")  # now is unicode
<type 'unicode'>

Here is a list of all the available __future__ statement options that developers concerned with 2/3 compatibility should know:

  • division: This adds a Python 3 division operator (PEP 238)

  • absolute_import: This makes every form of import statement not starting with a dot character interpreted as an absolute import (PEP 328)

  • print_function: This changes the print statement into a function call, so parentheses around the print arguments become mandatory (PEP 3105)

  • unicode_literals: This makes every string literal interpreted as Unicode literals (PEP 3112)

The list of __future__ statement options is very short and covers only a few syntax features. Other things that have changed, like the metaclass syntax (an advanced feature covered in Chapter 3, Syntax Best Practices – above the Class Level), are a lot harder to maintain. Reliable handling of the multiple standard library reorganizations also cannot be solved by future statements. Happily, there are some tools that aim to provide a consistent layer of ready-to-use compatibility. The most commonly known is Six (https://pypi.python.org/pypi/six/), which provides the whole common 2/3 compatibility boilerplate as a single module. The other promising but slightly less popular tool is the future module (http://python-future.org/).
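To see why helpers for the metaclass syntax are useful, consider that Python 2 uses a __metaclass__ class attribute while Python 3 uses a metaclass keyword argument, and neither form works on the other release. One dependency-free workaround, shown here only as a sketch with hypothetical class names, is to call the metaclass directly to create a base class:

```python
class RegistryMeta(type):
    """Hypothetical metaclass that records the name of every class it creates."""
    registry = []

    def __init__(cls, name, bases, namespace):
        super(RegistryMeta, cls).__init__(name, bases, namespace)
        RegistryMeta.registry.append(name)


# Calling the metaclass like type(name, bases, namespace) uses no
# version-specific syntax, so it parses on both Python 2 and Python 3.
Base = RegistryMeta("Base", (object,), {})


class Plugin(Base):
    """Subclasses of Base inherit the metaclass automatically."""


print(type(Plugin).__name__, RegistryMeta.registry)
```

Six offers ready-made helpers for the same purpose (with_metaclass and add_metaclass), so hand-written code like this mostly makes sense in packages that want to avoid the extra dependency.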

In some situations, developers may not want to include additional dependencies in some small packages. A common practice is the additional module that gathers all the compatibility code, usually named compat.py. Here is an example of such a compat module taken from the python-gmaps project (https://github.com/swistakm/python-gmaps):

# -*- coding: utf-8 -*-
import sys

if sys.version_info < (3, 0, 0):
    import urlparse  # noqa

    def is_string(s):
        return isinstance(s, basestring)

else:
    from urllib import parse as urlparse  # noqa

    def is_string(s):
        return isinstance(s, str)

Such a compat.py module is popular even in projects that depend on Six for 2/3 compatibility because it is a very convenient way to store code that handles compatibility with different versions of packages used as dependencies.

Tip

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Expert-Python-Programming_Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!