Learning Concurrency in Python

By: Elliot Forbes

Overview of this book

Python is a very high-level, general-purpose language that is used heavily in fields such as data science and research, and it is one of the top choices for general-purpose programming around the world. It features a large number of powerful high- and low-level libraries and frameworks that complement its delightful syntax and enable Python programmers to create. This book introduces some of the most popular of these libraries and frameworks and goes in depth into how you can use them in your own highly concurrent, high-performance Python programs. We'll cover the fundamental concepts of concurrency that you need in order to write your own concurrent and parallel software systems in Python. The book will guide you down the path to mastering Python concurrency, giving you the necessary hardware and theoretical knowledge. We'll cover topics such as debugging and exception handling, as well as some of the most popular libraries and frameworks for creating event-driven and reactive systems. By the end of the book, you'll have learned techniques for writing efficient concurrent systems that follow best practices.

Concurrent image download

One excellent example of the benefits of multithreading is the use of multiple threads to download multiple images or files. This is one of the best use cases for multithreading because I/O is blocking: a thread that is waiting on the network does no useful work, so other threads can run in the meantime.
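To make the effect of blocking I/O concrete without touching the network, here is a minimal sketch in which time.sleep stands in for a blocking download. The names fake_download, run_sequential, and run_threaded are made up for this illustration and are not part of the book's code.

```python
import threading
import time


def fake_download(delay):
    # Assumption: time.sleep stands in for a blocking network call.
    time.sleep(delay)


def run_sequential(count, delay):
    # Each call blocks until it finishes, so the waits add up.
    start = time.perf_counter()
    for _ in range(count):
        fake_download(delay)
    return time.perf_counter() - start


def run_threaded(count, delay):
    # Every thread blocks at the same time, so the waits overlap.
    start = time.perf_counter()
    threads = [
        threading.Thread(target=fake_download, args=(delay,))
        for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: {:.2f}s".format(run_sequential(5, 0.2)))
    print("threaded:   {:.2f}s".format(run_threaded(5, 0.2)))
```

The sequential run should take roughly count times the delay, while the threaded run should take roughly one delay, because all of the threads wait at once.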

To highlight the performance gains, we are going to retrieve 10 different images from http://lorempixel.com/400/200/sports, which is a free API that delivers a different image every time you hit that link. We'll then store these 10 different images within a temp folder so that we can view/use them later on.

All the code used in these examples can be found in my GitHub repository here: https://github.com/elliotforbes/Concurrency-With-Python.

Sequential download

First, we should have some form of a baseline against which we can measure the performance gains. To do this, we'll write a quick program that will download these 10 images sequentially, as follows:

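import urllib.request
import os

def downloadImage(imagePath, fileName):
    # fetch the image at imagePath and save it locally as fileName
    print("Downloading Image from ", imagePath)
    urllib.request.urlretrieve(imagePath, fileName)

def main():
    # urlretrieve fails if the target directory is missing,
    # so create it first
    os.makedirs("temp", exist_ok=True)
    for i in range(10):
        imageName = "temp/image-" + str(i) + ".jpg"
        downloadImage("http://lorempixel.com/400/200/sports", imageName)

if __name__ == '__main__':
    main()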

Breaking it down

In the preceding code, we begin by importing urllib.request, which we use to perform the HTTP requests for the images that we want. We then define a new function called downloadImage, which takes two parameters, imagePath and fileName. imagePath is the URL of the image that we wish to download. fileName is the name of the local file in which we wish to save the image.

In the main function, we start a for loop. On each iteration, we build an imageName from the temp/ directory, the string form of the current iteration number (str(i)), and the .jpg file extension. We then call the downloadImage function, passing in the lorempixel URL, which serves a random image, together with our newly generated imageName.

Upon running this script, you should see your temp directory sequentially fill up with 10 distinct images.
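The sequential script does not report how long it took, so to use it as a baseline you need to time it yourself. One possible way is a small helper like the one sketched below. The names time_call and slow_task are hypothetical; slow_task merely stands in for the script's main function so that the sketch runs without network access.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func and return a (result, elapsed_seconds) tuple."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def slow_task():
    # Hypothetical stand-in for main() of the sequential script.
    time.sleep(0.05)
    return "done"


if __name__ == "__main__":
    _, elapsed = time_call(slow_task)
    print("Total Execution Time {:.3f}".format(elapsed))
```

In the real script, you would pass main to time_call in place of slow_task.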

Concurrent download

Now that we have our baseline, it's time to write a quick program that downloads all the images that we require concurrently. We'll be going over creating and starting threads in future chapters, so don't worry if you struggle to understand the code. The key point is to see the potential performance gains to be had by writing concurrent programs:

import threading
import urllib.request
import time
import os

def downloadImage(imagePath, fileName):
    print("Downloading Image from ", imagePath)
    urllib.request.urlretrieve(imagePath, fileName)
    print("Completed Download")

def executeThread(i):
    # build the file name for this thread and download into it
    imageName = "temp/image-" + str(i) + ".jpg"
    downloadImage("http://lorempixel.com/400/200/sports", imageName)

def main():
    # make sure the target directory exists before any thread writes to it
    os.makedirs("temp", exist_ok=True)
    t0 = time.time()
    # create a list which will store a reference to
    # all of our threads
    threads = []
    # create 10 threads, append them to our list of threads
    # and start them off
    for i in range(10):
        thread = threading.Thread(target=executeThread, args=(i,))
        threads.append(thread)
        thread.start()

    # ensure that all the threads in our list have completed
    # their execution before we log the total time to complete
    for i in threads:
        i.join()
    # calculate the total execution time
    t1 = time.time()
    totalTime = t1 - t0
    print("Total Execution Time {}".format(totalTime))

if __name__ == '__main__':
    main()

Breaking it down

In the first line of our newly modified program, you should see that we are now importing the threading module; this will enable us to create our first multithreaded application. We then move the filename generation and the call to downloadImage into a new function of our own, executeThread.

Within the main function, we first create an empty list of threads, and then iterate 10 times. On each iteration, we create a new thread object, append it to our list of threads, and then start that thread.

Finally, we iterate through our list of threads with for i in threads, and call the join method on each of them. This ensures that we do not proceed with the execution of our remaining code until all of our threads have finished downloading their images.
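The effect of join can be seen in isolation with a small sketch that needs no network access. Here, run_and_join and worker are names made up for this illustration, and time.sleep stands in for a download.

```python
import threading
import time


def run_and_join(count):
    results = []

    def worker(i):
        # Assumption: sleeping stands in for a slow download.
        time.sleep(0.05)
        # list.append is safe to call from several threads in CPython.
        results.append(i)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(count)
    ]
    for thread in threads:
        thread.start()

    # The workers are most likely still sleeping at this point.
    finished_before_join = len(results)

    # join blocks until the given thread has finished.
    for thread in threads:
        thread.join()

    # Every worker has appended its result by now.
    finished_after_join = len(results)
    return finished_before_join, finished_after_join


if __name__ == "__main__":
    before, after = run_and_join(10)
    print("finished before join:", before)
    print("finished after join: ", after)
```

Only the count taken after the join calls is guaranteed to equal the number of threads; the count taken before them depends on timing.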

If you execute this on your machine, you should see that it almost instantaneously starts the download of all 10 images. As each download finishes, the script prints Completed Download, and you should see the temp folder being populated with these images.

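Both of the preceding scripts perform exactly the same task using the same urllib.request library. The sequential script does not print its running time, but if you time it in the same way as the concurrent one, with two time.time() calls around the work in main, you should see that the concurrent script fetches all 10 images in a fraction of the time. The improvement can approach an order of magnitude, although the exact figure depends on your network connection.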
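As an aside, the same start-then-join pattern can also be written with the standard library's concurrent.futures.ThreadPoolExecutor. The following is a minimal sketch under stated assumptions rather than the author's code: download_image, download_all, and the fetch parameter are introduced here for illustration, and fetch exists only so that the network call can be swapped out when experimenting.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

IMAGE_URL = "http://lorempixel.com/400/200/sports"


def download_image(file_name, fetch=urllib.request.urlretrieve):
    # fetch is called as fetch(url, file_name), like urlretrieve.
    fetch(IMAGE_URL, file_name)
    return file_name


def download_all(file_names, max_workers=10, fetch=urllib.request.urlretrieve):
    # Leaving the with block waits for all submitted work to finish,
    # which plays the role of the explicit join loop.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(download_image, name, fetch) for name in file_names
        ]
        # result() re-raises any exception raised inside the worker.
        return [future.result() for future in futures]
```

Calling download_all(["temp/image-" + str(i) + ".jpg" for i in range(10)]) would attempt the same 10 downloads, assuming the temp directory already exists.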
