Parallel Programming with Python

By: Jan Palach, Jan Palach V Cruz da Silva

Crawling the Web using the concurrent.futures module

This section reuses our code to implement the parallel Web crawler. Here, we will use a very useful Python resource, ThreadPoolExecutor, which is provided by the concurrent.futures module. The previous example used threads in a fairly primitive form, and at one point we had to create and initialize more than one thread manually. In larger programs, this kind of bookkeeping quickly becomes difficult to manage. Thread pools exist for such cases. A thread pool is simply a structure that keeps a set of previously created threads ready to be used by a process. Its goal is to reuse threads and thus avoid creating new ones unnecessarily, which is costly.
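To make the thread pool idea concrete, here is a minimal sketch. It is not the book's crawler code. The names fetch_length and run_pool are hypothetical, and the task simply measures a string instead of fetching a page, so the example runs without network access. It shows a fixed number of worker threads being reused for a larger number of tasks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_length(text):
    """Hypothetical stand-in for a crawler task.

    Returns the name of the worker thread that ran the task and the
    length of the input, instead of downloading anything.
    """
    return threading.current_thread().name, len(text)


def run_pool(items, max_workers=3):
    """Run fetch_length over items using a pool of at most max_workers threads."""
    results = {}
    worker_names = set()
    # The pool creates its threads on demand and reuses them across tasks;
    # leaving the with block waits for all submitted tasks to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_item = {
            executor.submit(fetch_length, item): item for item in items
        }
        # as_completed yields each future as soon as its task is done,
        # which may differ from the submission order.
        for future in as_completed(future_to_item):
            name, length = future.result()
            results[future_to_item[future]] = length
            worker_names.add(name)
    return results, worker_names


if __name__ == "__main__":
    lengths, workers = run_pool(["a", "bb", "ccc", "dddd", "eeeee"])
    print(lengths)
    print("distinct worker threads used:", len(workers))
```

No thread is created or joined by hand here. The executor owns the threads, and the number of distinct workers never exceeds max_workers, however many tasks are submitted.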

As mentioned in the previous chapter, our algorithm executes its tasks in stages, and these tasks depend on each other. Here, we will...