Parallel Programming with Python

By: Jan Palach, Jan Palach V Cruz da Silva

Using PP to make a distributed Web crawler

Now that we have executed code in parallel using PP to dispatch local processes, it is time to verify that the code can also be executed in a distributed way. For this, we will use the following three machines:

  • Iceman-Thinkad-X220: Ubuntu 13.10

  • Iceman-Q47OC-500P4C: Ubuntu 12.04 LTS

  • Asgard-desktop: Elementary OS

The idea is to dispatch the executions to the three machines listed using PP. For this, we will use the Web crawler as a case study. In the code, for each URL given in input_list, we will dispatch a local or remote process for execution, and at the end of each execution, a callback function will group each URL with the first three links found in it.
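The dispatch-and-callback pattern just described can be sketched without PP. The following is a minimal illustration only, and it assumes Python's standard concurrent.futures module as a stand-in for PP's job server, so everything runs on a single machine. The names FAKE_PAGES, LINK_RE, fetch_links, aggregate_results, and result_dict are hypothetical and chosen for this sketch, and the pages are hard-coded strings so that no network access is needed. PP's own submit call accepts a callback argument that plays the role of aggregate_results here.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical offline stand-in for real HTTP responses (assumption for this sketch).
FAKE_PAGES = {
    "http://example.com/a": '<a href="http://example.com/1">1</a>'
                            '<a href="http://example.com/2">2</a>'
                            '<a href="http://example.com/3">3</a>'
                            '<a href="http://example.com/4">4</a>',
    "http://example.com/b": '<a href="http://example.com/x">x</a>',
}

# Simple pattern for absolute links; a real crawler would need a proper HTML parser.
LINK_RE = re.compile(r'href="(https?://[^"]+)"')

input_list = list(FAKE_PAGES)   # URLs to crawl
result_dict = {}                # maps each URL to its first three links


def fetch_links(url):
    """Worker task: return the URL together with the first three links found."""
    html = FAKE_PAGES[url]
    return url, LINK_RE.findall(html)[:3]


def aggregate_results(future):
    """Callback: runs when a task finishes and groups the URL with its links."""
    url, links = future.result()
    result_dict[url] = links


# ThreadPoolExecutor stands in for the PP job server (not the PP API itself).
with ThreadPoolExecutor(max_workers=2) as executor:
    for url in input_list:
        executor.submit(fetch_links, url).add_done_callback(aggregate_results)

# Leaving the with block waits for all tasks, so result_dict is complete here.
for url, links in result_dict.items():
    print(url, links)
```

The worker returns plain data and the callback is the only place that touches the shared dictionary, which keeps the task itself free of shared state and therefore easy to move to another process or machine.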

Let us analyze the code step by step to understand how to arrive at a solution to this problem. First, we will import the necessary modules and define the data structures to be used. As in the previous section, we will create an input_list and a dictionary...