Distributed Computing with Python

Overview of this book

CPU-intensive data processing has become central to today's complex big data applications, and reducing the per-process CPU load is key to improving overall application speed. This book teaches you how to execute computations in parallel by distributing them across multiple processors, whether within a single machine or across a cluster, thus improving the overall performance of big data processing tasks. It covers synchronous and asynchronous models, shared memory and file systems, communication between processes, synchronization, and more.

Summary


We saw how to run our Python code on an HPC cluster using a job scheduler such as HTCondor or PBS.
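As a reminder of what that looks like in practice, the following is a minimal sketch of an HTCondor submit description file; the file names (my_script.py, job.submit, and the output files) are hypothetical placeholders rather than examples from the chapter, and my_script.py is assumed to carry a Python shebang line and execute permission:

    # job.submit -- a minimal HTCondor submit description (hypothetical file names)
    universe   = vanilla
    executable = my_script.py
    output     = my_script.out
    error      = my_script.err
    log        = my_script.log
    queue

The job would then be handed to the scheduler with condor_submit job.submit. The PBS equivalent is a shell script containing #PBS directives, submitted with qsub.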

Many aspects could not be covered in this chapter due to space constraints. The most notable is probably MPI (Message Passing Interface), the main interprocess communication library standard for HPC jobs. Python has several MPI bindings, the most commonly used of which is mpi4py, available at http://pythonhosted.org/mpi4py/ and on the Python Package Index (https://pypi.python.org/pypi/mpi4py/).
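To give a flavor of the library, here is a minimal mpi4py sketch (assuming mpi4py and an MPI implementation are installed); each process simply reports its rank:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD        # communicator spanning all MPI processes
    rank = comm.Get_rank()       # this process's ID within the communicator
    size = comm.Get_size()       # total number of processes

    print("Hello from rank %d of %d" % (rank, size))

Such a script would typically be launched with something like mpiexec -n 4 python hello_mpi.py (the script name is again a placeholder), or wrapped in a scheduler job as shown earlier.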

Another topic that did not fit in the chapter is running distributed task queues on an HPC cluster. For these types of applications, one could submit a series of jobs to the cluster: one job would start the message broker, some other jobs would start the workers, and one last job would start the application itself. Particular care must be taken to connect the workers and the application to the broker, since the broker will end up running on a machine that is not known at submission time; one simple approach is sketched below.
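Assuming all nodes see a shared filesystem (as is typical of HPC clusters), the broker job can publish the name of the host it landed on in an agreed-upon file, and the worker and application jobs can poll for that file before connecting. The file path and helper functions below are hypothetical illustrations, not code from the chapter:

    import os
    import socket
    import time

    # Agreed-upon location on the shared filesystem (hypothetical path).
    BROKER_FILE = os.path.expanduser("~/run/broker_host.txt")

    def publish_broker_host():
        """Called by the broker job once the broker is up: record its host name."""
        os.makedirs(os.path.dirname(BROKER_FILE), exist_ok=True)
        with open(BROKER_FILE, "w") as f:
            f.write(socket.gethostname())

    def wait_for_broker_host(timeout=300, poll=5):
        """Called by worker/application jobs: poll until the broker host is known."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            if os.path.exists(BROKER_FILE):
                with open(BROKER_FILE) as f:
                    return f.read().strip()
            time.sleep(poll)
        raise RuntimeError("Broker host was not published within the timeout")

The worker and application jobs would call wait_for_broker_host() at startup and use the returned host name to build the broker URL for whatever queueing system is in use.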