Distributed Computing with Python

Overview of this book

CPU-intensive data processing tasks have become crucial given the complexity of today's big data applications. Reducing CPU utilization per process is important for improving the overall speed of applications. This book teaches you how to run computations in parallel by distributing them across multiple processors in a single machine, improving the overall performance of a big data processing task. We cover synchronous and asynchronous models, shared memory and file systems, communication between processes, synchronization, and more.

The tools


In Chapter 3, Parallelism in Python, we looked at a few standard library modules that can be used to introduce (single-node) parallelism in our applications. We experimented with both the threading and multiprocessing modules directly and via the higher-level concurrent.futures module.

We saw that, for non-distributed parallel applications, Python offers a robust foundation. The preceding three modules are complete and ship with every modern Python distribution. They have no external dependencies, which makes them quite appealing.
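As a brief reminder of what this looks like in practice, here is a minimal sketch that uses only concurrent.futures from the standard library. The names square and parallel_map are hypothetical helpers made up for this illustration, not code from the earlier chapters, and the default of four workers is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Toy workload (hypothetical); stands in for any one-argument function."""
    return n * n


def parallel_map(func, items, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Apply func to every item using a worker pool.

    executor_cls is assumed to be one of the two executor classes in
    concurrent.futures. Executor.map yields results in input order,
    regardless of which worker finishes first.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


if __name__ == "__main__":
    print(parallel_map(square, range(5)))
```

Passing ProcessPoolExecutor as executor_cls would switch the same call from threads to processes, which is usually the better fit for CPU-bound work. In that case, the function being mapped has to be importable at module level, and on platforms that spawn new interpreters the call should sit under the `if __name__ == "__main__":` guard.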

We explored a few third-party Python modules for simple distributed computing in Chapter 4, Distributed Applications – with Celery. These included Celery, Python-RQ, and Pyro. We saw how to use them in our code and, above all, how simple it is to get up and running with each of them.

They all require some infrastructure, such as message brokers, databases, or name servers, so they may not be suitable in every context...