Parallel Programming with Python

Crawling the Web using ProcessPoolExecutor

Just as the concurrent.futures module offers ThreadPoolExecutor, which facilitates the creation and management of multiple threads, it offers ProcessPoolExecutor to do the same for processes. The ProcessPoolExecutor class, which is also part of the concurrent.futures package, is used to implement our parallel Web crawler. To implement this case study, we have created a Python module named process_pool_executor_web_crawler.py.
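As a rough illustration of how ProcessPoolExecutor is typically driven, the following minimal sketch submits one task per page and collects the results as they complete. It is not the code of process_pool_executor_web_crawler.py: the names extract_links, crawl_all, PAGES, and HTML_LINK_REGEX are assumptions made for this example, and the pages are held in memory so that the sketch runs without network access.

```python
import re
from concurrent.futures import ProcessPoolExecutor, as_completed

# Hypothetical link pattern and in-memory pages (assumptions for this sketch).
HTML_LINK_REGEX = re.compile(r'<a\s+[^>]*href="([^"]+)"')

PAGES = {
    "http://example.com/a": '<a href="http://example.com/b">b</a>',
    "http://example.com/b": '<a href="http://example.com/a">a</a>'
                            '<a href="http://example.com/c">c</a>',
}


def extract_links(url, html):
    """Return the URL together with every href found in its HTML."""
    return url, HTML_LINK_REGEX.findall(html)


def crawl_all(pages, max_workers=2):
    """Run extract_links for each page in a pool of worker processes."""
    results = {}
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # submit() returns a Future for each scheduled call.
        futures = [executor.submit(extract_links, url, html)
                   for url, html in pages.items()]
        # as_completed() yields futures in the order they finish.
        for future in as_completed(futures):
            url, links = future.result()
            results[url] = links
    return results


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # on platforms that start processes by re-importing the module.
    for url, links in sorted(crawl_all(PAGES).items()):
        print(url, "->", links)
```

The task function is defined at module level because the arguments and the function reference are pickled and sent to the worker processes.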

The code begins with the imports familiar from the previous examples, such as requests, Manager from the multiprocessing module, and so on. As for the task definitions, little has changed compared to the thread-based example from the previous chapter, except that we now pass the data to be manipulated through function arguments; refer to the following signatures:

The group_urls_task function is defined as follows:

def group_urls_task(urls, result_dict, html_link_regex)
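To make the idea of passing shared data through function arguments concrete, here is a small sketch under stated assumptions. It does not reproduce the body of group_urls_task; the task record_links_task, the helper run, and the default pattern are hypothetical names invented for this illustration. A dictionary created by a multiprocessing Manager is handed to each task as an argument, so that every worker process writes into the same shared structure.

```python
import re
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager


def record_links_task(url, html, result_dict, html_link_regex):
    """Hypothetical task: store the links found in html under its URL."""
    result_dict[url] = html_link_regex.findall(html)
    return url


def run(pages, pattern=r'href="([^"]+)"'):
    """Process in-memory pages in parallel and return a plain dict."""
    html_link_regex = re.compile(pattern)
    with Manager() as manager:
        # The proxy returned by manager.dict() can be pickled and
        # passed to worker processes as an ordinary argument.
        result_dict = manager.dict()
        with ProcessPoolExecutor(max_workers=2) as executor:
            futures = [
                executor.submit(record_links_task, url, html,
                                result_dict, html_link_regex)
                for url, html in pages.items()
            ]
            for future in futures:
                future.result()  # re-raise any error from a worker
        # Copy the contents out before the manager shuts down.
        return dict(result_dict)


if __name__ == "__main__":
    sample = {"http://example.com": '<a href="http://example.com/x">x</a>'}
    print(run(sample))
```

Because the task receives everything it needs as arguments, it can be exercised with an ordinary dict in place of the Manager proxy, which is convenient for testing it without starting any processes.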

The crawl_task function is defined as...