Parallel Programming with Python

Using PP to make a distributed Web crawler


Now that we have run the code in parallel using PP to dispatch local processes, it is time to verify that the code also runs in a distributed way. For this, we will use the following three machines:

  • Iceman-Thinkad-X220: Ubuntu 13.10

  • Iceman-Q47OC-500P4C: Ubuntu 12.04 LTS

  • Asgard-desktop: Elementary OS

The idea is to use PP to dispatch the executions to the three machines listed above. For this, we will again use the Web crawler case study. In web_crawler_pp_cluster.py, for each URL in input_list, we dispatch a local or remote process for execution, and at the end of each execution a callback function groups the URL with the first three links found on its page.
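The dispatch-and-callback pattern described above can be sketched with the standard library alone. This is not the code of web_crawler_pp_cluster.py: it swaps PP for concurrent.futures.ThreadPoolExecutor so that it runs on a single machine without a cluster, and it replaces real HTTP requests with a small in-memory table of pages. The names FAKE_PAGES, LINK_RE, crawl_task, aggregate_results, and result_dict are assumptions made for illustration only.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for fetched pages, so the sketch needs no network.
FAKE_PAGES = {
    "http://example.com/a": (
        '<a href="http://a/1">1</a><a href="http://a/2">2</a>'
        '<a href="http://a/3">3</a><a href="http://a/4">4</a>'
    ),
    "http://example.com/b": '<a href="http://b/1">1</a>',
}

# Simplified link pattern (an assumption, not the book's expression).
LINK_RE = re.compile(r'href="(https?://[^"]+)"')

input_list = list(FAKE_PAGES)  # URLs to crawl
result_dict = {}               # URL -> first three links found


def crawl_task(url):
    """Worker: return the URL and the first three links on its page."""
    html = FAKE_PAGES[url]
    return url, LINK_RE.findall(html)[:3]


def aggregate_results(future):
    """Callback: group each URL with its links once its task finishes."""
    url, links = future.result()
    result_dict[url] = links


# Dispatch one task per URL; the callback fires as each task completes.
# Leaving the with-block waits for all tasks (and their callbacks).
with ThreadPoolExecutor(max_workers=2) as executor:
    for url in input_list:
        executor.submit(crawl_task, url).add_done_callback(aggregate_results)

for url, links in sorted(result_dict.items()):
    print(url, links)
```

In this sketch the callback receives a Future object and has to call result() on it; that is a property of concurrent.futures, not necessarily of PP, so the callback signature in the book's code may differ.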

Let us analyze the code step by step to understand how to arrive at a solution to this problem. First, we will import the necessary modules and define the data structures to be used. As in the previous section, we will create an input_list and a dictionary...