Parallel Programming with Python

Crawling the Web using the concurrent.futures module


In this section, we will put our code to use by implementing the parallel Web crawler. In this scheme, we will use a very useful Python resource, ThreadPoolExecutor, which is provided by the concurrent.futures module. The previous example, parallel_fibonacci.py, used threads in a quite primitive form, and at one point we had to create and initialize more than one thread manually. In larger programs, this kind of situation is very difficult to manage. In such cases, we can rely on a thread pool: a structure that keeps several previously created threads available for use by a certain process. It aims to reuse threads, thus avoiding the unnecessary and costly creation of new ones.
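The thread reuse described above can be illustrated with a minimal sketch. This is not the crawler itself: the `task` and `run_pool` functions and the choice of three workers are hypothetical, introduced only for illustration. The sketch submits ten small tasks to a ThreadPoolExecutor and records which worker thread ran each one, so we can see that a handful of threads serves all the tasks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


def task(n):
    # Hypothetical stand-in for real work, such as fetching a page.
    # Returns the result together with the name of the thread that ran it.
    return n * n, threading.current_thread().name


def run_pool(numbers, max_workers=3):
    # Hypothetical helper: runs task() for every number on a shared pool.
    results = {}
    thread_names = set()
    # The pool creates at most max_workers threads and reuses them.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the input that produced it.
        futures = {executor.submit(task, n): n for n in numbers}
        # as_completed yields futures as they finish, in no fixed order.
        for future in as_completed(futures):
            square, name = future.result()
            results[futures[future]] = square
            thread_names.add(name)
    return results, thread_names


if __name__ == "__main__":
    results, thread_names = run_pool(range(10))
    print(sorted(results.items()))
    print(len(thread_names), "worker thread(s) handled 10 tasks")
```

Note that we never create or join a thread by hand here. The `with` block waits for all submitted tasks to finish and then shuts the pool down, and the number of distinct thread names never exceeds `max_workers`.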

As mentioned in the previous chapter, we will have an algorithm that executes its tasks in stages, and these tasks depend on each other. Here, we will...