Book Image

Python Parallel Programming Cookbook - Second Edition

By : Giancarlo Zaccone
Book Image

Python Parallel Programming Cookbook - Second Edition

By: Giancarlo Zaccone

Overview of this book

<p>Nowadays, it has become extremely important for programmers to understand the link between the software and the parallel nature of their hardware so that their programs run efficiently on computer architectures. Applications based on parallel programming are fast, robust, and easily scalable. </p><p> </p><p>This updated edition features cutting-edge techniques for building effective concurrent applications in Python 3.7. The book introduces parallel programming architectures and covers the fundamental recipes for thread-based and process-based parallelism. You'll learn about mutex, semaphores, locks, queues exploiting the threading, and multiprocessing modules, all of which are basic tools to build parallel applications. Recipes on MPI programming will help you to synchronize processes using the fundamental message passing techniques with mpi4py. Furthermore, you'll get to grips with asynchronous programming and how to use the power of the GPU with PyCUDA and PyOpenCL frameworks. Finally, you'll explore how to design distributed computing systems with Celery and architect Python apps on the cloud using PythonAnywhere, Docker, and serverless applications. </p><p> </p><p>By the end of this book, you will be confident in building concurrent and high-performing applications in Python.</p>
Table of Contents (16 chapters)
Title Page

Flynn's taxonomy

Flynn's taxonomy is a system for classifying computer architectures. It is based on two main concepts:

  • Instruction flow: A system with n CPU has n program counters and, therefore, instructions flows. This corresponds to a program counter.
  • Data flow: A program that calculates a function on a list of data has a data flow. The program that calculates the same function on several different lists of data has more data flows. This is made up of a set of operands.

As the instruction and data flows are independent, there are four categories of parallel machines: Single Instruction Single Data (SISD), Single Instruction Multiple Data (SIMD), Multiple Instruction Single Data (MISD), and Multiple Instruction Multiple Data (MIMD):

Flynn's taxonomy

Single Instruction Single Data (SISD)

The SISD computing system is like the von Neumann machine, which is a uniprocessor machine. As you can see in Flynn's taxonomy diagram, it executes a single instruction that operates on a single data stream. In SISD, machine instructions are processed sequentially.

In a clock cycle, the CPU executes the following operations:

  • Fetch: The CPU fetches the data and instructions from a memory area, which is called a register.
  • Decode: The CPU decodes the instructions.
  • Execute: The instruction is carried out on the data. The result of the operation is stored in another register.

Once the execution stage is complete, the CPU sets itself to begin another CPU cycle:

The fetch, decode, and execute cycle

The algorithms that run on this type of computer are sequential (or serial) since they do not contain any parallelism. An example of a SISD computer is a hardware system with a single CPU.

The main elements of these architectures (namely, von Neumann architectures) are as follows:

  • Central memory unit: This is used to store both instructions and program data.
  • CPU: This is used to get the instruction and/or data from the memory unit, which decodes the instructions and sequentially implements them.
  • The I/O system: This refers to the input and output data of the program.

Conventional single-processor computers are classified as SISD systems:

The SISD architecture schema


The following diagram specifically shows which areas of a CPU are used in the stages of fetch, decode, and execute:

CPU components in the fetch-decode-execute phase

Multiple Instruction Single Data (MISD)

In this model, n processors, each with their own control unit, share a single memory unit. In each clock cycle, the data received from the memory is processed by all processors simultaneously, each in accordance with the instructions received from its control unit.

In this case, the parallelism (instruction-level parallelism) is obtained by performing several operations on the same piece of data. The types of problems that can be solved efficiently in these architectures are rather special, such as data encryption. For this reason, the MISD computer has not found space in the commercial sector. MISD computers are more of an intellectual exercise than a practical configuration.

Single Instruction Multiple Data (SIMD)

A SIMD computer consists of n identical processors, each with their own local memory, where it is possible to store data. All processors work under the control of a single instruction stream. In addition to this, there are n data streams, one for each processor. The processors work simultaneously on each step and execute the same instructions, but on different data elements. This is an example of data-level parallelism.

The SIMD architectures are much more versatile than MISD architectures. Numerous problems covering a wide range of applications can be solved by parallel algorithms on SIMD computers. Another interesting feature is that the algorithms for these computers are relatively easy to design, analyze, and implement. The limitation is that only the problems that can be divided into a number of subproblems (which are all identical, each of which will then be solved simultaneously through the same set of instructions) can be addressed with the SIMD computer.

With the supercomputer developed according to this paradigm, we must mention the Connection Machine (Thinking Machine, 1985) and MPP (NASA, 1983).

As we will see in Chapter 6Distributed Python, and Chapter 7, Cloud Computing, the advent of modern graphics cards (GPUs), built with many SIMD-embedded units, has led to the more widespread use of this computational paradigm.

Multiple Instruction Multiple Data (MIMD)

This class of parallel computers is the most general and most powerful class, according to Flynn's classification. This contains n processors, n instruction streams, and n data streams. Each processor has its own control unit and local memory, which makes MIMD architectures more computationally powerful than SIMD architectures.

Each processor operates under the control of a flow of instructions issued by its own control unit. Therefore, the processors can potentially run different programs with different data, which allows them to solve subproblems that are different and can be a part of a single larger problem. In MIMD, the architecture is achieved with the help of the parallelism level with threads and/or processes. This also means that the processors usually operate asynchronously. 

Nowadays, this architecture is applied to many PCs, supercomputers, and computer networks. However, there is a counter that you need to consider: asynchronous algorithms are difficult to design, analyze, and implement:

The SIMD architecture (A) and the MIMD architecture (B)

Flynn's taxonomy can be extended by considering that SIMD machines can be divided into two subgroups:

  • Numerical supercomputers
  • Vectorial machines

On the other hand, MIMD can be divided into machines that have a shared memory and those that have a distributed memory.

Indeed the next section focuses on this last aspect of the organization of the memory of MIMD machines.