Book Image

Expert Python Programming - Second Edition

By : Michał Jaworski
Book Image

Expert Python Programming - Second Edition

By: Michał Jaworski

Overview of this book

Python is a dynamic programming language, used in a wide range of domains by programmers who find it simple, yet powerful. Even if you find writing Python code easy, writing code that is efficient and easy to maintain and reuse is a challenge. The focus of the book is to familiarize you with common conventions, best practices, useful tools and standards used by python professionals on a daily basis when working with code. You will begin with knowing new features in Python 3.5 and quick tricks for improving productivity. Next, you will learn advanced and useful python syntax elements brought to this new version. Using advanced object-oriented concepts and mechanisms available in python, you will learn different approaches to implement metaprogramming. You will learn to choose good names, write packages, and create standalone executables easily. You will also be using some powerful tools such as buildout and vitualenv to release and deploy the code on remote servers for production use. Moving on, you will learn to effectively create Python extensions with C, C++, cython, and pyrex. The important factors while writing code such as code management tools, writing clear documentation, and test-driven development are also covered. You will now dive deeper to make your code efficient with general rules of optimization, strategies for finding bottlenecks, and selected tools for application optimization. By the end of the book, you will be an expert in writing efficient and maintainable code.
Table of Contents (21 chapters)
Expert Python Programming Second Edition
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Preface
Index

Application-level isolation of Python environments


Nowadays, a lot of operating systems come with Python as a standard component. Most Linux distributions and Unix-based systems such as FreeBSD, NetBSD, OpenBSD, or OS X come with Python are either installed by default or available through system package repositories. Many of them even use it as part of some core components—Python powers the installers of Ubuntu (Ubiquity), Red Hat Linux (Anaconda), and Fedora (Anaconda again).

Due to this fact, a lot of packages from PyPI are also available as native packages managed by the system's package management tools such as apt-get (Debian, Ubuntu), rpm (Red Hat Linux), or emerge (Gentoo). Although it should be remembered that the list of available libraries is very limited and they are mostly outdated when compared to PyPI. This is the reason why pip should always be used to obtain new packages in the latest version as a recommendation of PyPA (Python Packaging Authority). Although it is an independent package starting from version 2.7.9 and 3.4 of CPython, it is bundled with every new release by default. Installing the new package is as simple as this:

pip install <package-name>

Among other features, pip allows forcing specific versions of packages (using the pip install package-name==version syntax) and upgrading to the latest version available (using the ––upgrade switch). The full usage description for most of the command-line tools presented in the book can be easily obtained simply by running the command with the -h or --help switch, but here is an example session that demonstrates the most commonly used options:

$ pip show pip
---
Metadata-Version: 2.0
Name: pip
Version: 7.1.2
Summary: The PyPA recommended tool for installing Python packages.
Home-page: https://pip.pypa.io/
Author: The pip developers
Author-email: [email protected]
License: MIT
Location: /usr/lib/python2.7/site-packages
Requires:

$ pip install 'pip<7.0.0'
Collecting pip<7.0.0
  Downloading pip-6.1.1-py2.py3-none-any.whl (1.1MB)
    100% |████████████████████████████████| 1.1MB 242kB/s
Installing collected packages: pip
  Found existing installation: pip 7.1.2
    Uninstalling pip-7.1.2:
      Successfully uninstalled pip-7.1.2
Successfully installed pip-6.1.1
You are using pip version 6.1.1, however version 7.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

$ pip install --upgrade pip
You are using pip version 6.1.1, however version 7.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting pip
  Using cached pip-7.1.2-py2.py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 6.1.1
    Uninstalling pip-6.1.1:
      Successfully uninstalled pip-6.1.1
Successfully installed pip-7.1.2

In some cases, pip may not be available by default. From Python 3.4 version onwards (and also Python 2.7.9), it can always be bootstrapped using the ensurepip module:

$ python -m ensurepip
Ignoring indexes: https://pypi.python.org/simple
Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python2.7/site-packages
Collecting pip
Installing collected packages: pip
Successfully installed pip-6.1.1

The most up-to-date information on how to install pip for older Python versions is available on the project's documentation page at https://pip.pypa.io/en/stable/installing/.

Why isolation?

pip may be used to install system-wide packages. On Unix-based and Linux systems, this will require super user privileges, so the actual invocation will be:

sudo pip install <package-name>

Note that this is not required on Windows since it does not provide the Python interpreter by default, and Python on Windows is usually installed manually by the user without super user privileges.

Anyway, installing system-wide packages directly from PyPI is not recommended and should be avoided. This may seem like a contradiction with the previous statement that using pip is a PyPA recommendation, but there are some serious reasons for that. As explained earlier, Python is very often an important part of many packages available through operating system package repositories and may power a lot of important services. System distribution maintainers put a lot of effort in selecting the correct versions of packages to match various package dependencies. Very often, Python packages that are available from system's package repositories contain custom patches or are kept outdated only to ensure compatibility with some other system components. Forcing an update of such a package using pip to a version that breaks some backwards compatibility might break some crucial system services.

Doing such things only on the local computer for development purposes is also not a good excuse. Recklessly using pip that way is almost always asking for trouble and will eventually lead to issues that are very hard to debug. This does not mean that installing packages from PyPI globally is a strictly forbidden thing, but it should always be done consciously and while knowing the related risks.

Fortunately, there is an easy solution to this problem—environment isolation. There are various tools that allow the isolation of the Python runtime environment at different levels of system abstraction. The main idea is to isolate project dependencies from packages required by different projects and/or system services. The benefits of this approach are:

  • It solves the "Project X depends on version 1.x but Project Y needs 4.x" dilemma. The developer can work on multiple projects with different dependencies that may even collide without the risk of affecting each other.

  • The project is no longer constrained by versions of packages that are provided in his system distribution repositories.

  • There is no risk of breaking other system services that depend on certain package versions because new package versions are only available inside such an environment.

  • A list of packages that are project dependencies can be easily "frozen", so it is very easy to reproduce them.

The easiest and most lightweight approach to isolation is to use application-level virtual environments. They focus only on isolating the Python interpreter and packages available in it. They are very easy to set up and are very often just enough to ensure proper isolation during the development of small projects and packages.

Unfortunately, in some cases, this may not be enough to ensure enough consistency and reproducibility. For such cases, system-level isolation is a good addition to the workflow and some available solutions to that are explained later in this chapter.

Popular solutions

There are several ways to isolate Python at runtime. The simplest and most obvious, although hardest to maintain, is to manually change PATH and PYTHONPATH environment variables and/or move Python binary to a different place to affect the way it discovers available packages and change it to a custom place where we want to store our project's dependencies. Fortunately, there are several tools available that help in maintaining virtual environments and how installed packages are stored in the system. These are mainly: virtualenv, venv, and buildout. What they do under the hood is in fact the same as what we would do manually. The actual strategy depends on the specific tool implementation, but generally, they are more convenient to use and can provide additional benefits.

virtualenv

Virtualenv is by far the most popular tool in this list. Its name simply stands for Virtual Environment. It's not a part of the standard Python distribution, so it needs to be obtained using pip. It is one of the packages that is worth installing system-wide (using sudo on Linux and Unix-based systems).

Once it is installed, a new virtual environment is created using the following command:

virtualenv ENV

Here, ENV should be replaced by the desired name for the new environment. This will create a new ENV directory in the current working directory path. It will contain a few new directories inside:

  • bin/: This is where the new Python executable and scripts/executables provided by other packages are stored.

  • lib/ and include/: These directories contain the supporting library files for the new Python inside the virtual environment. The new packages will be installed in ENV/lib/pythonX.Y/site-packages/.

Once the new environment is created, it needs to be activated in the current shell session using Unix's source command:

source ENV/bin/activate

This changes the state of the current shell sessions by affecting its environment variables. In order to make the user aware that he has activated the virtual environment, it will change the shell prompt by appending the (ENV) string at its beginning. Here is an example session that creates a new environment and activates it to illustrate this:

$ virtualenv example
New python executable in example/bin/python
Installing setuptools, pip, wheel...done.
$ source example/bin/activate
(example)$ deactivate
$ 

The important thing to note about virtualenv is that it depends completely on its state stored on a filesystem. It does not provide any additional abilities to track what packages should be installed in it. These virtual environments are not portable and should not be moved to another system/machine. This means that the new virtual environment needs to be created from scratch for each new application deployment. Because of that, there is a good practice used by virtualenv users to store all project dependencies in the requirements.txt file (this is the naming convention), as shown in the following code:

# lines followed by hash (#) are treated as a comments

# strict version names are best for reproducibility
eventlet==0.17.4
graceful==0.1.1

# for projects that are well tested with different
# dependency versions the relative version specifiers 
# are acceptable too
falcon>=0.3.0,<0.5.0

# packages without versions should be avoided unless
# latest release is always required/desired
pytz

With such files, all dependencies can be easily installed using pip because it accepts the requirements file as its output:

pip install -r requirements.txt

What needs to be remembered is that the requirements file is not always the ideal solution because it does not define the exact list of dependencies, only those that are to be installed. So, the whole project can work without problems in a development environment but will fail to start in others if the requirements file is outdated and does not reflect actual state of environment. There is, of course, the pip freeze command that prints all packages in the current environment but it should not be used blindly—it will output everything, even packages that are not used in the project but installed only for testing. The other tool mentioned in the book, buildout, addresses this issue, so it may be a better choice for some development teams.

Note

For Windows users, virtualenv under Windows uses a different naming for its internal structure of directories. You need to use Scripts/, Libs/, and Include/ instead of bin/, lib/, include/, to better match development conventions on that operating system. The commands used for activating/deactivating the environment are also different; you need to use ENV/Scripts/activate.bat and ENV/Scripts/deactivate.bat instead of using source on activate and deactivate scripts.

venv

Virtual environments shortly became well established and a popular tool within the community. Starting from Python 3.3, creating virtual environments is supported by standard library. The usage is almost the same as with Virtualenv, although command-line options have quite a different naming convention. The new venv module provides a pyvenv script for creating a new virtual environment:

pyvenv ENV

Here, ENV should be replaced by the desired name for the new environment. Also, new environments can now be created directly from Python code because all functionality is exposed from the built-in venv module. The other usage and implementation details, like the structure of the environment directory and activate/deactivate scripts are mostly the same as in Virtualenv, so migration to this solution should be easy and painless.

For developers using newer versions of Python, it is recommended to use venv instead of Virtualenv. For Python 3.3, switching to venv may require more effort because in this version, it does not install setuptools and pip by default in the new environment, so the users need to install them manually. Fortunately, it has changed in Python 3.4, and also due to the customizability of venv, it is possible to override its behavior. The details are explained in the Python documentation (refer to https://docs.python.org/3.5/library/venv.html), but some users might find it too tricky and will stay with Virtualenv for that specific version of Python.

buildout

Buildout is a powerful tool for bootstrapping and the deployment of applications written in Python. Some of its advanced features will also be explained later in the book. For a long time, it was also used as a tool to create isolated Python environments. Because Buildout requires a declarative configuration that must be changed every time there is a change in dependencies, instead of relying on the environment state, these environments were easier to reproduce and manage.

Unfortunately, this has changed. The buildout package since version 2.0.0 no longer tries to provide any level of isolation from system Python installation. Isolation handling is left to other tools such as Virtualenv, so it is still possible to have isolated Buildouts, but things become a bit more complicated. A Buildout must be initialized inside an isolated environment in order to be really isolated.

This has a major drawback as compared to the previous versions of Buildout, since it depends on other solutions for isolation. The developer working on this code can no longer be sure whether the dependencies description is complete because some packages can be installed by bypassing the declarative configuration. This issue can of course be solved using proper testing and release procedures, but it adds some more complexity to the whole workflow.

To summarize, Buildout is no longer a solution that provides environment isolation but its declarative configuration can improve maintainability and the reproducibility of virtual environments.

Which one to choose?

There is no best solution that will fit every use case. What is good in one organization may not fit the workflow of other teams. Also, every application has different needs. Small projects can easily depend on sole virtualenv or venv but bigger ones may require additional help of buildout to perform more complex assembly.

What was not described in detail earlier is that previous versions of Buildout (buildout<2.0.0) allowed the assembly of projects in an isolated environment with similar results as provided by Virtualenv. Unfortunately, 1.x branch of this project is no longer maintained, so using it for that purpose is discouraged.

I would recommend to use venv module instead of Virtualenv whenever it is possible. So, this should be the default choice for projects targeting Python versions 3.4 and higher. Using venv in Python 3.3 may be a little inconvenient due to a lack of built-in support for setuptools and pip. For projects targeting a wider spectrum of Python run times (including alternative interpreters and 2.x branch), it seems that Virtualenv is the best choice.