Book Image

Distributed Computing with Python

Book Image

Distributed Computing with Python

Overview of this book

CPU-intensive data processing tasks have become crucial considering the complexity of the various big data applications that are used today. Reducing the CPU utilization per process is very important to improve the overall speed of applications. This book will teach you how to perform parallel execution of computations by distributing them across multiple processors in a single machine, thus improving the overall performance of a big data processing task. We will cover synchronous and asynchronous models, shared memory and file systems, communication between various processes, synchronization, and more.
Table of Contents (15 chapters)
Distributed Computing with Python
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
Index

Your typical HPC cluster


HPC systems come in all shapes and sizes; however, they tend to share a few common characteristics. They tend to be homogeneous, with a large number of identical, rack-mounted computers located in the same room and connected with very fast networking. Sometimes, especially between upgrades, an HPC cluster might be split into two processor architectures. In those cases, particular care needs to be taken in scheduling our code if those differences have important performance implications.

Most of the computers in a cluster (called compute nodes) run exactly the same operating system and the same set of software packages and are dedicated to exclusively run computations, as the name implies. Users are not typically allowed to use these machines directly.

A smaller number of nodes are special in that they are usually not as powerful as the compute nodes but do allow users to log in. They are called service nodes (or login nodes or head nodes) and are dedicated to run user...