Book Image

Python High Performance, Second Edition - Second Edition

By : Dr. Gabriele Lanaro
Book Image

Python High Performance, Second Edition - Second Edition

By: Dr. Gabriele Lanaro

Overview of this book

Python is a versatile language that has found applications in many industries. The clean syntax, rich standard library, and vast selection of third-party libraries make Python a wildly popular language. Python High Performance is a practical guide that shows how to leverage the power of both native and third-party Python libraries to build robust applications. The book explains how to use various profilers to find performance bottlenecks and apply the correct algorithm to fix them. The reader will learn how to effectively use NumPy and Cython to speed up numerical code. The book explains concepts of concurrent programming and how to implement robust and responsive applications using Reactive programming. Readers will learn how to write code for parallel architectures using Tensorflow and Theano, and use a cluster of computers for large-scale computations using technologies such as Dask and PySpark. By the end of the book, readers will have learned to achieve performance and scale from their Python applications.
Table of Contents (10 chapters)

Dask

Dask is a project of Continuum Analytics (the same company that's responsible for Numba and the conda package manager) and a pure Python library for parallel and distributed computation. It excels at performing data analysis tasks and is very well integrated in the Python ecosystem.

Dask was initially conceived as a package for bigger-than-memory calculations on a single machine. Recently, with the Dask Distributed project, its code has been adapted to execute tasks on a cluster with excellent performance and fault-tolerance capabilities. It supports MapReduce-style tasks as well as complex numerical algorithms.

Directed Acyclic Graphs

The idea behind Dask is quite similar to what we already saw in the last chapter with Theano and Tensorflow. We can use...