Now that we have executed the code in parallel using PP to dispatch local processes, it is time to verify that it also executes in a distributed way. For this, we will use the following three different machines:
The idea is to dispatch the executions to the three machines listed using PP. For this, we will make use of a web crawler case study. In the code of web_crawler_pp_cluster.py, for each URL given in input_list, we will dispatch a local or remote process for execution; at the end of each execution, a callback function will group the URLs and their first three links found.
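Before walking through the chapter's code, the crawling side of the problem can be sketched as follows. This is a minimal illustration, not the book's implementation: the link extraction uses the standard-library html.parser, and the pp dispatch at the end is shown only as a hedged, commented outline, with placeholder host names and a hypothetical input_list.

```python
# A sketch of the crawler's building blocks. The pp dispatch section at
# the bottom is commented out and uses PLACEHOLDER host names/URLs.
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def first_links(html, limit=3):
    """Return the first `limit` links found in an HTML document."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links[:limit]


def crawl(url):
    """Download `url` and return it paired with its first three links."""
    import urllib.request
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    return url, first_links(html)


# Hedged outline of the pp dispatch (hypothetical hosts and input_list):
#
# import pp
# result_dict = {}
#
# def aggregate(result):            # callback: groups URL -> first links
#     url, links = result
#     result_dict[url] = links
#
# job_server = pp.Server(ppservers=("192.168.25.21", "192.168.25.22"))
# for url in input_list:
#     job_server.submit(crawl, (url,),
#                       depfuncs=(first_links, LinkCollector),
#                       modules=("urllib.request", "html.parser"),
#                       callback=aggregate)
# job_server.wait()
```

Note that pp needs the helper functions and modules passed explicitly via depfuncs and modules, since the job may run in a remote interpreter that has not imported them.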
Let us analyze the code step by step to understand how to arrive at a solution to this problem. First, we will import the necessary modules and define the data structures to be used. As in the previous section, we will create an input_list and a dictionary...