High Performance Computing


At this point, we have to leave consumer computing aside for a while. As computing hardware became more affordable, the need for most people to have programs run as efficiently as possible diminished. Other criteria entered the picture: graphical interfaces, multitasking, interactivity, and so on. Usability became more important than raw speed.

This, however, was not true for everybody. There remained a small (but devoted) group of users and programmers for whom efficiency was not just the most important thing. It was the only thing. These groups hung out in nuclear labs and intelligence agencies and had money to spend on exotic hardware and highly skilled coders. Thus was born High Performance Computing (HPC).

True to the nature of HPC, its implementations have been chosen with efficiency in mind: HPC systems are highly parallel, batch-oriented, and run Fortran. It is so important to the users of HPC systems that their programs run quickly that they have ignored any and all advances in the field that did not result in faster programs.

The HPC learning curve

This was a satisfactory relationship for some time. The types of problems of interest to the HPC community (complicated physical modeling and advanced mathematics) had little overlap with the rest of computer science. HPC was a niche with a very high barrier to entry. After all, there were just not that many massively parallel computers to go around.

In a sense, then, programming HPC systems was an island. On the island, there were ongoing research programs centered on important HPC-centric questions. Tools were built, skills were honed, and a community of practice grew to the point that approaching HPC from the outside could be daunting. Advances occurred outside of HPC as well, but those inside it had their own concerns.

As time passed, the HPC island drifted further and further from mainstream computing. New areas opened up: web computing, mobile computing, agile methods, and many others. HPC took what it needed from these areas, but nothing really affected it. Until something finally did…

Cloudy with a chance of parallelism (or Amazon's computer is bigger than yours)

Amazon had a problem. During the Christmas season, it used a lot of computing power; for the rest of the year, those computers would sit idle. If there were some way to let people rent time on the idle machines, Amazon could make money. The result was an API that allowed people to store data on those machines (the Amazon Simple Storage Service, or S3) and an API that allowed people to run programs on the same machines (the Amazon Elastic Compute Cloud, or EC2). Together, these made up the start of the Amazon Cloud.
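
As a rough illustration, a minimal sketch using the third-party boto3 library shows what those two services look like from Python; the bucket name, object key, and AMI ID below are placeholders, and configured AWS credentials are assumed:

    import boto3

    # S3: store and retrieve data on Amazon's machines
    s3 = boto3.client("s3")
    s3.put_object(Bucket="example-bucket", Key="results.txt", Body=b"hello, cloud")
    obj = s3.get_object(Bucket="example-bucket", Key="results.txt")
    print(obj["Body"].read())

    # EC2: rent a (virtual) machine to run programs on
    ec2 = boto3.client("ec2")
    ec2.run_instances(ImageId="ami-12345678",   # placeholder AMI ID
                      InstanceType="t2.micro",
                      MinCount=1, MaxCount=1)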

While not the first system to rent out excess capacity (CompuServe started off the same way several decades earlier), Amazon Cloud was the first large-scale system that provided the general public paid access to virtually unlimited storage and computing power.

It is not clear whether anybody realized what this meant at first. There are a lot of uses of clouds—overflow capacity, mass data storage, and redundancy, among others—that have a wide appeal. For our purposes, the cloud meant one thing: now everybody has access to a supercomputer. HPC will never be the same again.

HPC and parallelism

The current relationship between HPC and highly parallel architectures is relatively new. It was only in the 1990s that HPC left the realm of very fast single-processor machines for massively parallel architectures. In one sense, this was unfortunate, as the old Cray machines were aesthetic marvels:

The image (a Cray-2) is in the public domain: https://commons.wikimedia.org/wiki/File:Cray2.jpeg

It was largely inevitable, however, as single-processor systems were bumping up against physical limitations involving transistor density and cooling.

The change in architecture did not bring with it a change in the problems to be solved. To this end, the generic supercomputer physical architecture evolved toward:

  • Commodity processors—not custom-fast but top-of-the-line and homogeneous

  • Commodity RAM—ditto

  • High-end hard drives—lots of smaller, low-latency models (now turning into solid state drives)

  • Super-fast interconnection networks

Moving from single to multiple processors brought issues with locality. Every time a program running on one processor needed data from another processor (or disk), processing could come to a halt as the data was being retrieved. The physical architecture of the supercomputer is meant to minimize the latency associated with non-local data access.
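
To see why locality matters in practice, consider a minimal sketch using ipyparallel (assuming a local cluster has been started with ipcluster start -n 4): it contrasts summing an array that already sits in local memory with shipping the same array to remote engines and summing it there. The data transfer, not the arithmetic, dominates the remote timing.

    import time
    import numpy as np
    import ipyparallel as ipp

    rc = ipp.Client()     # connect to the running cluster
    dview = rc[:]         # a direct view over all engines

    data = np.random.rand(5 * 10**6)

    t0 = time.perf_counter()
    local_sum = data.sum()                      # local access: no data movement
    print("local sum:  %.4f s" % (time.perf_counter() - t0))

    t0 = time.perf_counter()
    dview.push({"data": data}, block=True)      # ship the array to every engine
    dview.execute("partial = data.sum()", block=True)
    partials = dview.pull("partial", block=True)
    print("remote sum: %.4f s" % (time.perf_counter() - t0))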

Given the position of HPC centers as early adopters of parallel architectures, "parallel programming" came to be largely synonymous with "HPC programming." This is largely a historical accident, and new paradigms have opened up parallel computing to constituencies outside of the HPC world. As such, this book will use the two terms interchangeably.

We now turn to one of the new paradigms, cloud computing, and discuss its similarities to, and differences from, standard HPC.

Clouds and HPC

There are some differences between a "real" supercomputer and what most clouds offer. In particular, a cloud's physical architecture will contain:

  • Commodity processors—not necessarily fast, but they make up for it in sheer numbers

  • Commodity RAM—ditto

  • Commodity hard drives—smaller, but larger in aggregate

  • Slow(er) interconnection networks

In addition, clouds are generally heterogeneous and easily scaled. While an initial cloud is likely to have many subsystems with the same processor, RAM, hard drives, and so on, over time new subsystems will be added, with newer (or at least different) technology. The loose coupling of cloud systems encourages this sort of organic growth.

Differences in architecture mean that some algorithms run well on supercomputers while others favor clouds. A lot of software that runs on supercomputers will not run on clouds, period (and vice versa). This is not always just a matter of recompiling for a new target platform or using different libraries; the underlying algorithm may simply not be suited to a particular paradigm.

If speed is imperative and you have the budget, there is still no substitute for a special-purpose HPC system. If cost, ease of access, redundancy, and massive parallelism are desired, a cloud fits the bill.

That is not to say the two worlds (HPC and cloud) are completely distinct. Despite these architectural differences, it is worth noting that an Amazon EC2 C3 instance cluster was ranked 134th on the TOP500 list of the fastest HPC systems as of June 2015. Even on HPC's own terms, cloud computers offer respectable performance.

The core audience for this book then consists of members of both of these groups:

  • Python programmers looking to expand into HPC/parallel-style programming

  • HPC/parallel programmers looking to employ Python

Each group has the skills the other wants. HPC programmers understand scientific computing, efficiency, and parallelism. Python programmers are skilled in interactivity, usability, correctness, powerful development tools, ease of debugging, and other capabilities that mainstream computing values. New technology means that future systems will need to incorporate elements from both skill sets.