Python High Performance Programming

By: Dr. Gabriele Lanaro

Overview of this book

Python is a programming language with a vibrant community, known for its simplicity, code readability, and expressiveness. The massive selection of third-party libraries makes it suitable for a wide range of applications. This also allows programmers to express concepts in fewer lines of code than would be possible in similar languages. The availability of high-quality, numerically focused tools has made Python an excellent choice for high performance computing. The speed of applications comes down to how well the code is written. Poorly written code means poorly performing applications, which means unsatisfied customers. This book is an example-oriented guide to the techniques used to dramatically improve the performance of your Python programs. It teaches optimization techniques using pure Python tricks, high performance libraries, and Python-C integration, and it includes a section on how to write and run parallel code. You will learn state-of-the-art techniques by applying them to practical examples. The book also guides you through different profiling tools that help you identify performance issues in your program. You will learn how to speed up your numerical code using NumPy and Cython, and you will be introduced to parallel programming so that you can take advantage of modern multi-core processors.

Writing tests and benchmarks

Now that we have a working simulator, we can start measuring its performance and tuning our code so that the simulator can handle as many particles as possible. The first step in this process is to write a test and a benchmark.

We need a test that checks whether the results produced by the simulation are correct or not. In the optimization process we will rewrite the code to try different solutions; by doing so we may easily introduce bugs. Maintaining a solid test suite is essential to avoid wasting time on broken code.

Our test will take three particles and let the system evolve for 0.1 time units. We then compare our results, up to a certain precision, with those from a reference implementation:

def test():
    particles = [Particle( 0.3,  0.5, +1),
                 Particle( 0.0, -0.5, -1),
                 Particle(-0.1, -0.4, +3)]

    simulator = ParticleSimulator(particles)

    simulator.evolve(0.1)

    p0, p1, p2 = particles

    def fequal(a, b):
        return abs(a - b) < 1e-5

    assert fequal(p0.x, 0.2102698450356825)
    assert fequal(p0.y, 0.5438635787296997)

    assert fequal(p1.x, -0.0993347660567358)
    assert fequal(p1.y, -0.4900342888538049)

    assert fequal(p2.x,  0.1913585038252641)
    assert fequal(p2.y, -0.3652272210744360)

if __name__ == '__main__':
    test()
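
The tolerance check performed by fequal can also be written with the standard library. The following is a minimal sketch, assuming Python 3.5 or later; close_enough is a hypothetical helper name and is not part of the simulator code:

```python
import math

def close_enough(a, b, tol=1e-5):
    """Return True if a and b differ by at most tol (absolute tolerance)."""
    # rel_tol=0.0 disables the relative check, so only abs_tol matters.
    # This closely mirrors abs(a - b) < 1e-5 (isclose uses <= rather than <).
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tol)

print(close_enough(0.2102698450356825, 0.21027))  # differs by about 1.5e-7
print(close_enough(0.2102698450356825, 0.2104))   # differs by about 1.3e-4
```

An absolute tolerance is used here because some of the expected coordinates are close to zero, where a purely relative tolerance would be very strict.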

We also want to write a benchmark that can measure the performance of our application. This will provide an indication of how much we have improved over the previous implementation.

In our benchmark we instantiate 1000 Particle objects with random coordinates and angular velocity, and pass them to a ParticleSimulator instance. We then let the system evolve for 0.1 time units:

from random import uniform

def benchmark():
    particles = [Particle(uniform(-1.0, 1.0),
                          uniform(-1.0, 1.0),
                          uniform(-1.0, 1.0))
                 for i in range(1000)]
    
    simulator = ParticleSimulator(particles)
    simulator.evolve(0.1)

if __name__ == '__main__':
    benchmark()
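
Because the benchmark draws its inputs at random, timings from different runs are measured on different particle sets. One way to make runs comparable is to seed the generator. The following is a minimal sketch under that assumption; make_particles is a hypothetical helper, and it returns plain tuples instead of Particle objects so that the snippet runs on its own:

```python
from random import Random

def make_particles(n, seed=42):
    """Return n (x, y, angular_velocity) tuples drawn from a seeded generator."""
    # A private Random instance leaves the global random state untouched.
    rng = Random(seed)
    return [(rng.uniform(-1.0, 1.0),
             rng.uniform(-1.0, 1.0),
             rng.uniform(-1.0, 1.0))
            for _ in range(n)]

# Two calls with the same seed produce identical inputs.
first = make_particles(1000)
second = make_particles(1000)
print(first == second)
```

In simul.py, each tuple could then be unpacked into the Particle constructor, for example Particle(*values).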

Timing your benchmark

You can easily measure the execution time of any process from the command line by using the Unix time command:

$ time python simul.py
real    0m1.051s
user    0m1.022s
sys    0m0.028s

Note

The time command is not available on Windows, but can be found in the Cygwin shell, which you can download from the official website http://www.cygwin.com/.

By default, time shows three metrics:

real: The actual time elapsed from the start to the end of the process, as if it was measured by a human with a stopwatch
user: The cumulative CPU time spent, across all the CPUs, executing the program's own code (user mode)
sys: The cumulative CPU time spent, across all the CPUs, in the kernel on behalf of the program, for system-related tasks such as memory allocation

Notice that sometimes user + sys might be greater than real, as multiple processors may work in parallel.
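
The difference between elapsed time and CPU time can also be observed from inside Python. The following is a minimal sketch, not part of the simulator; measure, busy, and idle are hypothetical names. It uses time.perf_counter as a rough counterpart of real and time.process_time as a rough counterpart of user + sys:

```python
import time

def measure(func):
    """Run func once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # wall-clock timer, comparable to "real"
    cpu_start = time.process_time()   # user + sys CPU time of this process
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def busy():
    # CPU-bound work: wall and CPU time should be close to each other.
    sum(i * i for i in range(200_000))

def idle():
    # Sleeping consumes wall time but almost no CPU time.
    time.sleep(0.2)

for name, func in [("busy", busy), ("idle", idle)]:
    wall, cpu = measure(func)
    print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

For the idle function, the wall time is dominated by waiting, which is why real alone can be a misleading indicator of CPU performance.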

Tip

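time also offers several formatting options; for an overview you can explore its manual (by using the man time command). If you want a summary of all the metrics available, you can use the -v option of GNU time, typically invoked as /usr/bin/time -v, because the time keyword built into shells such as bash does not accept it.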

The Unix time command is a good way to benchmark your program. To achieve a more accurate measurement, the benchmark should run long enough (on the order of seconds) so that the setup and tear-down of the process become small compared to the execution time. The user metric is suitable as a monitor for CPU performance, because the real metric also includes the time spent in other processes or waiting for I/O operations.

Another useful tool for timing Python code is the timeit module. This module runs a snippet of code in a loop n times and measures the total time taken. It then repeats this operation r times (the default value of r is 3 in the versions used in this book, and 5 in the timeit module from Python 3.7 onward) and takes the best of those runs. Because of this procedure, timeit is well suited to accurately timing small statements in isolation.
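
The loop-and-repeat procedure can be reproduced by hand. The following is a minimal sketch; the statement being timed and the values of n and r are arbitrary choices for illustration:

```python
import timeit

stmt = "sum(range(1000))"
n = 1000  # loops per repetition
r = 3     # repetitions

# Each entry is the total time (in seconds) for n executions of stmt.
totals = timeit.repeat(stmt, number=n, repeat=r)

# Take the fastest repetition and divide by n to get the time per loop.
best_per_loop = min(totals) / n
print(f"{n} loops, best of {r}: {best_per_loop * 1e6:.2f} usec per loop")
```

The minimum is used rather than the mean because slower repetitions are usually caused by interference from other processes, not by the code being measured.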

The timeit module can be used as a Python module, from the command line, or from IPython.

IPython is a Python shell designed for interactive usage. It boasts tab completion and many utilities to time, profile, and debug your code. We will make use of this shell to try out snippets throughout the book. The IPython shell accepts magic commands (statements that start with a % symbol) that enhance the shell with special behaviors. Commands that start with %% are called cell magics, and these commands can be applied to multi-line snippets (called cells).

IPython is available on most Linux distributions and is included in Anaconda. You can follow the installation instructions in the official documentation at:

http://ipython.org/install.html

Tip

You can use IPython as a regular Python shell (ipython) but it is also available in a Qt-based version (ipython qtconsole) and as a powerful browser-based interface (ipython notebook).

In the IPython and command line interfaces, you can specify the number of loops and repetitions with the options -n and -r; otherwise they are determined automatically. When invoking timeit from the command line, you can also provide setup code (with the -s option) that runs once before the statement is executed in a loop.

In the following code we show how to use timeit from IPython, from the command line and as a Python module:

# IPython Interface
$ ipython
In [1]: from simul import benchmark
In [2]: %timeit benchmark()
1 loops, best of 3: 782 ms per loop

# Command Line Interface
$ python -m timeit -s 'from simul import benchmark' 'benchmark()'
10 loops, best of 3: 826 msec per loop

# Python Interface
# put this code into the simul.py script

import timeit

result = timeit.timeit('benchmark()',
                       setup='from __main__ import benchmark',
                       number=10)
# result is the total time (in seconds) to run the statement 10 times

result = timeit.repeat('benchmark()',
                       setup='from __main__ import benchmark',
                       number=10, repeat=3)
# result is a list containing the total time of each repetition (repeat=3 in this case)

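Notice that while the command line and IPython interfaces automatically determine a reasonable value for n, the Python interface does not. Unless you pass the number argument explicitly, timeit.timeit and timeit.repeat run the statement 1,000,000 times per repetition, which is far too many for a benchmark that takes close to a second per run.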
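
If you would like the Python interface to pick the loop count for you, the Timer class offers an autorange method. The following is a minimal sketch, assuming Python 3.6 or later; the statement being timed is an arbitrary stand-in for benchmark():

```python
import timeit

timer = timeit.Timer("sorted(data)",
                     setup="data = list(range(1000, 0, -1))")

# autorange() increases the loop count until a run takes at least
# 0.2 seconds, then returns (number_of_loops, total_time_in_seconds).
number, total = timer.autorange()
print(f"{number} loops, {total / number * 1e6:.1f} usec per loop")
```

The returned loop count can then be passed as the number argument to timer.repeat to collect several repetitions.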
