Book Image

Python Parallel Programming Cookbook

By : Giancarlo Zaccone
Book Image

Python Parallel Programming Cookbook

By: Giancarlo Zaccone

Overview of this book

This book will teach you parallel programming techniques using examples in Python and will help you explore the many ways in which you can write code that allows more than one process to happen at once. Starting with introducing you to the world of parallel computing, it moves on to cover the fundamentals in Python. This is followed by exploring the thread-based parallelism model using the Python threading module by synchronizing threads and using locks, mutex, semaphores queues, GIL, and the thread pool. Next you will be taught about process-based parallelism where you will synchronize processes using message passing along with learning about the performance of MPI Python Modules. You will then go on to learn the asynchronous parallel programming model using the Python asyncio module along with handling exceptions. Moving on, you will discover distributed computing with Python, and learn how to install a broker, use Celery Python Module, and create a worker. You will understand anche Pycsp, the Scoop framework, and disk modules in Python. Further on, you will learnGPU programming withPython using the PyCUDA module along with evaluating performance limitations.
Table of Contents (13 chapters)
Python Parallel Programming Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Using MapReduce with Disco


Disco is a Python module based on the MapReduce framework introduced by Google, which allows the management of large distributed data in computer clusters. The applications written using Disco can be performed in the economic cluster of machines with a very short learning curve. In fact, the technical difficulties related to the processes that are distributed as load balancing, job scheduling, and the communications protocol are completely managed by Disco and hidden from the developer.

The typical applications of this module are essentially as follows:

  • Web indexing

  • URL access counter

  • Distributed sort

    The MapReduce schema

The MapReduce algorithm implemented in Disco is as follows:

  • Map: The master node takes the input data, breaks it into smaller subtasks, and distributes the work to the slave nodes. The single map node produces the intermediate result of the map() function in the form of pairs [key, value] stored on a distributed file whose location is given to the master...