Evaluating the performance of a parallel program

The development of parallel programming created the need for performance metrics that tell us whether its use is worthwhile. Indeed, the focus of parallel computing is to solve large problems in a relatively short time. The factors contributing to this objective include the type of hardware used, the degree of parallelism of the problem, and the parallel programming model adopted. To facilitate the analysis, a few basic concepts were introduced that compare the parallel algorithm with the original sequential one.

Performance is evaluated by analyzing and quantifying the number of threads and/or the number of processes used. To do this, let's introduce a few performance indexes:

  • Speedup
  • Efficiency
  • Scaling

The limitations of parallel computation are described by Amdahl's law, while Gustafson's law is used to evaluate the degree of efficiency achieved by parallelizing a sequential algorithm.


Speedup

The speedup is the measure that quantifies the benefit of solving a problem in parallel. It is defined as the ratio of the time taken to solve a problem on a single processing element (Ts) to the time required to solve the same problem on p identical processing elements (Tp).

We denote speedup as follows:

S = Ts / Tp

If S = p, we have a linear speedup; that is, the speed of execution increases in proportion to the number of processors. Of course, this is an ideal case. The speedup is absolute when Ts is the execution time of the best sequential algorithm, and relative when Ts is the execution time of the parallel algorithm run on a single processor.

Let's recap these conditions:

  • S = p is a linear or ideal speedup.
  • S < p is a real speedup.
  • S > p is a superlinear speedup.
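To make this concrete, the following sketch estimates the relative speedup of a CPU-bound task by timing a sequential run against a run on a pool of worker processes. The task (count_primes), the problem size N, and the pool size p are illustrative assumptions, not measurements from this book:

import time
from multiprocessing import Pool

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division (deliberately CPU-bound)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == '__main__':
    N, p = 200_000, 4            # illustrative problem size and pool size

    # Sequential run on a single processing element: Ts
    start = time.perf_counter()
    seq_result = count_primes((0, N))
    Ts = time.perf_counter() - start

    # Parallel run: p worker processes over p equal chunks: Tp
    chunks = [(i * N // p, (i + 1) * N // p) for i in range(p)]
    start = time.perf_counter()
    with Pool(p) as pool:
        par_result = sum(pool.map(count_primes, chunks))
    Tp = time.perf_counter() - start

    assert seq_result == par_result
    print(f"Ts={Ts:.2f}s  Tp={Tp:.2f}s  speedup S = Ts/Tp = {Ts / Tp:.2f}")

Because Ts here is the execution time of the same algorithm on a single processor, the value printed is a relative speedup in the sense described above.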


Efficiency

In an ideal world, a parallel system with p processing elements would give us a speedup equal to p. However, this is very rarely achieved. Usually, some time is wasted in idling or communicating. Efficiency measures the fraction of the execution time that a processing element spends doing useful work.

We denote it by E and can define it as follows:

E = S / p = Ts / (p × Tp)

Algorithms with linear speedup have E = 1; in the other cases, the value of E is less than 1. The three cases are identified as follows:

  • When E = 1, it is a linear case.
  • When E < 1, it is a real case.
  • When E << 1, it is a problem that is parallelizable with low efficiency.
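As a quick worked example (the timings below are made-up values, purely for illustration), efficiency can be computed directly from the measured times and the number of processing elements:

def efficiency(Ts, Tp, p):
    """E = S / p = Ts / (p * Tp): fraction of time spent on useful work."""
    return Ts / (p * Tp)

# Hypothetical measurements: 12 s sequentially, 4 s on 4 processors
Ts, Tp, p = 12.0, 4.0, 4
S = Ts / Tp                   # speedup S = 3
E = efficiency(Ts, Tp, p)     # efficiency E = 0.75 -> a "real" case (E < 1)
print(f"S = {S:.2f}, E = {E:.2f}")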


Scaling

Scaling is defined as the ability to remain efficient on a parallel machine as resources grow. It relates the computing power (speed of execution) to the number of processors. Ideally, by increasing the size of the problem and, at the same time, the number of processors, there will be no loss in terms of performance.

A scalable system, depending on how these factors are increased, may maintain the same efficiency or even improve it.

Amdahl's law

Amdahl's law is widely used in the design of processors and parallel algorithms. It states that the maximum speedup that can be achieved is limited by the serial component of the program:

S(p) = 1 / ((1 – P) + P / p)

Here, 1 – P denotes the serial (not parallelized) component of the program and p is the number of processors. As p grows, the speedup tends to the limit 1 / (1 – P).

This means that, for example, if 90% of a program's code can be made parallel but 10% must remain serial, then the maximum achievable speedup is 10, even with an infinite number of processors.
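The following small sketch evaluates Amdahl's formula for a few illustrative parallel fractions and processor counts, showing how the serial component caps the achievable speedup:

def amdahl_speedup(P, p):
    """Amdahl's law: S(p) = 1 / ((1 - P) + P / p), with P the parallel fraction."""
    return 1.0 / ((1.0 - P) + P / p)

for P in (0.5, 0.9, 0.99):
    for p in (2, 8, 1024):
        print(f"P={P:.2f}  p={p:5d}  S={amdahl_speedup(P, p):6.2f}")
    # the limit as p grows without bound is 1 / (1 - P)
    print(f"P={P:.2f}  p=  inf  S={1 / (1 - P):6.2f}")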

Gustafson's law

Gustafson's law states the following:

S(P) = P – α(P – 1)

Here, the following applies:

  • P is the number of processors.
  • S is the speedup factor.
  • α is the non-parallelizable fraction of any parallel process.
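The following sketch evaluates Gustafson's formula for a few illustrative values of P and α:

def gustafson_speedup(P, alpha):
    """Gustafson's law: S(P) = P - alpha * (P - 1), scaled speedup on P processors."""
    return P - alpha * (P - 1)

# alpha is the non-parallelizable fraction; the values below are illustrative
for P in (4, 64, 1024):
    for alpha in (0.1, 0.01):
        print(f"P={P:5d}  alpha={alpha:.2f}  S={gustafson_speedup(P, alpha):8.2f}")

Unlike Amdahl's law, the scaled speedup keeps growing nearly linearly with P, because the workload is assumed to grow with the number of processors.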

Gustafson's law is in contrast to Amdahl's law, which, as we described, assumes that the overall workload of a program does not change with respect to the number of processors.

In fact, Gustafson's law suggests that programmers first fix the time allowed for solving a problem in parallel and then size the problem based on that time. Therefore, the faster the parallel system is, the larger the problems that can be solved in the same period of time.

The effect of Gustafson's law was to direct the objectives of computing research towards selecting or reformulating problems so that the solution of a larger problem would still be possible in the same amount of time. Furthermore, this law redefines the concept of efficiency as the need to reduce at least the sequential part of a program, even as the workload increases.