Distributed Computing with Python
Overview of this book

CPU-intensive data processing tasks have become crucial given the complexity of today's big data applications. Reducing the CPU load per process is important for improving the overall speed of these applications. This book will teach you how to execute computations in parallel by distributing them across multiple processors in a single machine, thus improving the overall performance of a big data processing task. We will cover synchronous and asynchronous models, shared memory and file systems, communication between processes, synchronization, and more.

Installing Celery


Celery (http://www.celeryproject.org) is the first third-party library that we encounter in this book, since so far, we have only looked at modules and packages in the Python standard library. Celery is a distributed task queue, meaning that it is a queue-based system like some of the ones that we built in the previous chapters. It is also distributed, which means that worker processes, as well as the queues holding results and work requests, typically run on different machines.
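Before installing Celery, it may help to recall the pattern it generalizes. The following is a minimal single-machine sketch of a queue-based system in the spirit of the ones built in the previous chapters: workers pull work requests from one queue and push results onto another. Threads stand in here for Celery's worker processes, and all names are illustrative, not Celery's API.

```python
# Sketch of the worker/queue pattern that Celery distributes across
# machines. Threads stand in for worker processes; names are illustrative.
import queue
import threading

def worker(tasks, results):
    # Pull work requests until the shutdown sentinel (None) arrives.
    for n in iter(tasks.get, None):
        results.put(n * n)

tasks = queue.Queue()     # queue holding work requests
results = queue.Queue()   # queue holding results

workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(2)]
for w in workers:
    w.start()

for n in range(10):       # enqueue ten work requests
    tasks.put(n)
for _ in workers:
    tasks.put(None)       # one sentinel per worker

squares = sorted(results.get() for _ in range(10))
for w in workers:
    w.join()
print(squares)            # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

With Celery, the two queues live in an external broker and result backend, and the workers typically run on different machines; the overall shape of the system, however, is the same.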

Let's start by installing Celery and its dependencies. On each machine, we first install virtualenvwrapper, which we will then use to create a virtual environment (let's call it book so that we know it is related to the examples in this book), as shown in the following command (assuming a Unix environment):

$ pip install virtualenvwrapper

If the preceding command fails with a permission denied error, then you can use sudo to install virtualenvwrapper as a super-user, as shown in the following command:

$ sudo pip install virtualenvwrapper...